Welcome to the Data Generator

The purpose of this program is to simulate data for hypothetical experiments. The program is written in the Python programming language and therefore requires some syntax-specific input, so be sure to read the instructions carefully. Click HERE to open the Data Generator.

How to use this program for univariate or factorial designs:
This program was designed to model specific hypothetical patterns of cell and marginal means rather than outcomes of null-hypothesis significance testing. That means (pun intended) that it is a good idea to map out exactly what you expect the data to look like before trying to generate them. For example, let’s say my experiment has been designed to determine whether the effects of medication dosage on measures of reaction time and impulsivity differ for men and women. If there are three levels of medication dosage and two levels of gender, my setup might look like this:

Between-participant IVs:
Medication Dosage (‘MD’) with three levels: ‘15mg’, ‘30mg’, and ‘60mg’
Gender (‘Gender’) with two levels: ‘male’ and ‘female’

Within-participant IVs:
NONE
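Because both IVs are between-participant, every participant falls into exactly one combination of dosage and gender. As a quick check on the design, the sketch below enumerates those cells in Python. The dictionary layout and the names `between_ivs` and `cells` are illustrative only; they are not the Data Generator's input format.

```python
from itertools import product

# Hypothetical description of the between-participant IVs
# (not the Data Generator's own input syntax).
between_ivs = {
    "MD": ["15mg", "30mg", "60mg"],
    "Gender": ["male", "female"],
}

# Every combination of levels is one cell of the 3 x 2 design.
cells = list(product(*between_ivs.values()))

for cell in cells:
    print(dict(zip(between_ivs.keys(), cell)))

print(len(cells), "cells")  # 3 levels x 2 levels = 6 cells
```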

Dependent Variables:
Reaction Time (‘RT’) – continuous values, mean=1200ms, sd=100ms
Impulsivity (‘IMP’) – interval values, mean=25, sd=5

Main Effects in Reaction Time
Gender –> Male 100ms slower (i.e., longer) than Female
MD –> No main effect

Main Effects in Impulsivity
Gender –> No main effect
MD –> Participants taking 15mg will be 5 units more impulsive than those taking 30mg; the 30mg and 60mg groups will not differ in impulsivity.
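Stated this way, the MD main effect pins down the three MD marginal means once an overall mean is chosen. A minimal worked sketch, assuming equally sized dosage groups so that the IMP grand mean of 25 is the simple average of the three marginal means (the variable names are illustrative):

```python
from fractions import Fraction

GRAND_MEAN = Fraction(25)     # overall IMP mean from the DV specification
DIFF_15_VS_30 = Fraction(5)   # 15mg is 5 units more impulsive than 30mg

# Let x be the shared marginal mean of the 30mg and 60mg groups.
# With equal group sizes: ((x + 5) + x + x) / 3 = 25  =>  x = (3*25 - 5) / 3
x = (3 * GRAND_MEAN - DIFF_15_VS_30) / 3
md_means = {"15mg": x + DIFF_15_VS_30, "30mg": x, "60mg": x}

for level, mean in md_means.items():
    print(level, round(float(mean), 2))
# 15mg 28.33
# 30mg 23.33
# 60mg 23.33
```

Exact fractions are used so the checks on the marginal means hold without floating-point rounding.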

Gender X MD Interaction Effects in RT
Reaction time in males will decrease by 100ms with each increase in dosage, but there will be no change in RT for females.

Gender X MD Interaction Effects in IMP
In female participants, impulsivity will be 10 units lower at the 30mg and 60mg dosages than at 15mg, but impulsivity in men will be unaffected by dosage.
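Combining the IMP main effects with this interaction yields a full table of cell means. The sketch below, again assuming equal cell sizes and a grand mean of 25, solves for the one unknown (the female 15mg mean) and then checks that the marginal means reproduce the stated main effects: no Gender difference, 15mg five units above 30mg, and 30mg equal to 60mg. The names are illustrative, not part of the Data Generator.

```python
from fractions import Fraction

GRAND_MEAN = Fraction(25)
FEMALE_DROP = Fraction(10)  # females at 30mg and 60mg are 10 units below females at 15mg
levels = ["15mg", "30mg", "60mg"]

# No Gender main effect: both gender marginal means equal the grand mean.
# Men are unaffected by dosage, so every male cell sits at that value.
male = {lvl: GRAND_MEAN for lvl in levels}

# Female cells are a, a - 10, a - 10 with mean 25  =>  a = (3*25 + 2*10) / 3
a = (3 * GRAND_MEAN + 2 * FEMALE_DROP) / 3
female = {"15mg": a, "30mg": a - FEMALE_DROP, "60mg": a - FEMALE_DROP}

# MD marginal means: average of the two gender cells at each dosage.
md_marginals = {lvl: (male[lvl] + female[lvl]) / 2 for lvl in levels}

for lvl in levels:
    print(lvl, "male", round(float(male[lvl]), 2),
          "female", round(float(female[lvl]), 2),
          "MD marginal", round(float(md_marginals[lvl]), 2))
```

The female cells come out at about 31.67, 21.67, and 21.67, and the MD marginal means at about 28.33, 23.33, and 23.33, so this pattern of cell means is consistent with both IMP main effects described above.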

Correlations Between Dependent Variables
Because higher impulsivity should be linked with faster (i.e., shorter) responses, a negative correlation (e.g., r = -.4) is expected between RT and IMP.
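One standard way to simulate two normally distributed DVs with a chosen correlation is to mix two independent standard-normal draws. The sketch below uses only Python's standard library and ignores the IV effects, so it shows just the RT/IMP correlation. The function names `simulate_rt_imp` and `pearson`, the seed, the sample size, and rounding IMP to whole numbers (to mimic an interval-scale score) are assumptions for illustration; this is not necessarily how the Data Generator itself builds correlated DVs.

```python
import math
import random

def simulate_rt_imp(n, r=-0.4, seed=1):
    """Draw n (RT, IMP) pairs whose population correlation is r."""
    rng = random.Random(seed)
    rt, imp = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        # z_imp has unit variance and correlation r with z1.
        z_imp = r * z1 + math.sqrt(1 - r**2) * z2
        rt.append(1200 + 100 * z1)          # RT: mean 1200ms, sd 100ms
        imp.append(round(25 + 5 * z_imp))   # IMP: mean 25, sd 5, whole units
    return rt, imp

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rt, imp = simulate_rt_imp(5000)
print(round(pearson(rt, imp), 2))  # close to -0.4; exact value depends on the seed
```

Rounding IMP adds a small amount of noise, which attenuates the correlation only negligibly at sd=5.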

Notice that the expected outcomes have been described in terms of changes across levels of the IVs (expressed in DV units), not in terms of whether a result will be statistically significant. Whether a result is significant will be determined in large part by the chosen standard deviations of the DVs (along with the sample size). For example, here I have designated a standard deviation in RT of 100ms. Because I also designated that males will be (on average) 100ms slower, a full standard deviation, that difference will likely turn out to be statistically significant when the simulated data are analyzed.
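The link between the chosen SD and the likely significance can be made concrete with a standardized effect size. The sketch below computes Cohen's d for the male vs. female RT difference and an approximate power for a two-sided, two-sample comparison at α = .05, using the normal approximation to the t-test (so it slightly overstates power for small samples). The function name `approx_power` and the group sizes are hypothetical, and the SD is assumed to be the within-group SD; `n_per_group` counts all participants of one gender, pooled across dosages.

```python
from statistics import NormalDist

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    diff: expected difference between group means (DV units)
    sd: within-group standard deviation (DV units)
    n_per_group: participants in each of the two groups
    """
    d = diff / sd                                 # Cohen's d
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    noncentrality = d * (n_per_group / 2) ** 0.5
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(noncentrality - z_crit)

# 100ms Gender difference with sd = 100ms (d = 1) vs. sd = 200ms (d = 0.5)
for sd in (100, 200):
    for n in (10, 30):
        print(f"sd={sd}ms n={n}/group power~{approx_power(100, sd, n):.2f}")
```

Doubling the SD to 200ms halves d to 0.5, and at the same sample size the approximate power drops sharply, which is why the chosen SD matters as much as the chosen mean difference.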

It is important to think about your experiment in this way because, once you define your variables in the Data Generator, you will be asked a series of questions about those variables and the expected effects.