Math III · S-IC.5

Using Randomized Experiments and Simulations to Compare Treatments

Randomized experiments help students decide whether an observed treatment difference is likely real or could plausibly be due to chance assignment.

Concept: Statistics and Probability
Domain: Making Inferences and Justifying Conclusions
Read time: 5 minutes

What this learning objective is really asking you to learn

This objective asks students to use randomized-experiment data and simulations to compare treatments and judge significance. A randomized experiment assigns subjects to treatment groups by chance, then compares outcomes. Random assignment helps make the groups similar before treatment so that differences afterward can more credibly be attributed to the treatment.

For example, suppose 100 students are randomly assigned to use Study Method A or Study Method B. After two weeks, Method A students improve by an average of 8 points, while Method B students improve by an average of 3 points. The observed difference is 5 points. Is that difference meaningful evidence that Method A is better, or could a difference that large happen just from random assignment?

Simulation helps answer that question. Under a “no treatment effect” model, the labels A and B would not matter, so the observed outcomes can be randomly shuffled into two groups many times. For each shuffle, compute the difference in group means. This creates a distribution of the differences expected from chance assignment alone. If the observed difference falls far in the tail of that distribution, it would rarely arise by chance alone, and it is called statistically significant evidence of a treatment effect.
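The shuffling idea can be sketched in a few lines of Python. This is only an illustration: the function name shuffled_differences and the improvement scores are made up for the example (8 students per method instead of 50, chosen so the group means are 8 and 3), and the number of shuffles is an arbitrary choice.

```python
import random
from statistics import mean

def shuffled_differences(group_a, group_b, n_shuffles=2000, seed=0):
    """Return mean(A) - mean(B) for many random relabelings of the pooled outcomes."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # under no effect, any relabeling is equally likely
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

# Hypothetical improvement scores (not from a real study).
method_a = [9, 7, 10, 6, 8, 11, 5, 8]   # mean 8
method_b = [3, 4, 2, 5, 1, 4, 3, 2]     # mean 3

observed = mean(method_a) - mean(method_b)
diffs = shuffled_differences(method_a, method_b)

# Share of shuffles whose difference is at least as large as the observed one.
tail = sum(d >= observed for d in diffs) / len(diffs)
print("observed difference:", observed)
print("tail proportion:", tail)
```

A small tail proportion means that chance relabeling rarely produces a difference as large as the one observed.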

The objective is not about memorizing a formal hypothesis-test procedure. It is about understanding the logic:

  • random assignment creates comparable groups;
  • chance still creates some difference between groups;
  • simulation shows how large differences typically are under no effect;
  • an unusually large observed difference is evidence that the treatment may matter.

Students should also understand that statistical significance is not the same as practical importance. A tiny effect can be statistically significant in a huge study, while a large effect in a small study may not be statistically conclusive.

Why students should learn this math

Students should learn this because treatment comparisons are everywhere. Does a medicine work better than a placebo? Does a tutoring program improve scores? Does a new app interface increase completion? Does a fertilizer increase crop yield? Does a training method improve performance? Does a public policy reduce accidents?

A randomized experiment is one of the strongest tools for causal evidence. If subjects are randomly assigned, then known and unknown confounding variables tend to be balanced across treatment groups. This makes it more plausible that outcome differences are caused by the treatment.

But random assignment does not mean groups will be perfectly identical. Some differences happen by chance. Simulation helps students separate ordinary random imbalance from evidence of a real effect.
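A short simulation can make this balance visible. The sketch below assumes a hypothetical background variable (a prior test score for 40 subjects) and a made-up helper named assignment_imbalance. It repeats the random split many times and looks at the gap in mean prior score between the two groups.

```python
import random
from statistics import mean

def assignment_imbalance(prior_scores, rng):
    """Randomly split subjects into two equal groups; return the gap in mean prior score."""
    shuffled = prior_scores[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return mean(shuffled[:half]) - mean(shuffled[half:])

rng = random.Random(1)

# Hypothetical pre-treatment scores for 40 subjects (an assumed background variable).
prior_scores = [rng.gauss(70, 10) for _ in range(40)]

# Repeat the random assignment 1,000 times and record the imbalance each time.
gaps = [assignment_imbalance(prior_scores, rng) for _ in range(1000)]

average_gap = mean(gaps)                   # close to 0: no systematic tilt
typical_size = mean(abs(g) for g in gaps)  # but single assignments are not perfectly balanced
print(round(average_gap, 2), round(typical_size, 2))
```

Across many assignments the gaps average out near zero, yet any single assignment usually shows some imbalance. That leftover imbalance is the chance variation a simulation has to account for.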

This objective also helps students interpret scientific and medical claims. A study may report a difference between treatment and control groups. The key question is: was the difference large relative to what random assignment alone might produce? That is the meaning of statistical significance.

The “why” is that experiments produce evidence, but evidence must be judged against chance variation. Simulation makes that judgment visible.

The historical machinery: randomized experiments and significance

Randomized experiments became central to modern science because they address confounding. In medicine, agriculture, psychology, education, and product testing, random assignment helps isolate treatment effects. R.A. Fisher and others developed formal methods for using randomization to judge whether observed differences were likely due to chance.

Simulation and randomization tests are intuitive versions of this logic. If treatment labels are randomly assigned and the treatment has no effect, then rearranging the labels should produce differences like the ones chance alone would create. Comparing the observed difference to this randomization distribution shows how strong the evidence is.

With computers, randomization simulations became practical and easy to visualize. Students can now learn inference by seeing the distribution of chance differences rather than starting with abstract formulas.

The historical lesson is that significance is about comparing observed effects to a chance model.

Where this fits in the big map of mathematics

This objective follows the objectives on margins of error and study design. It focuses specifically on experiments and treatment comparisons.

It connects to random assignment from Objective 181.

It connects to simulation from Objective 180.

It connects to causation. Randomized experiments can support causal conclusions more strongly than observational studies.

It connects to probability because random assignment produces a distribution of possible group differences.

It connects to report evaluation in Objective 184.

The big-map role is experimental evidence. Students learn how randomized data can support treatment comparisons.

How to execute the skill technically

A simulation-based treatment comparison process:

  1. Identify the treatments.
  2. Identify the response variable.
  3. Compute the observed difference in outcomes.
  4. State the no-effect model.
  5. Simulate random assignment many times under no effect.
  6. Compute simulated differences.
  7. Compare the observed difference to the simulation distribution.
  8. Decide whether the observed difference is typical or surprising.
  9. Interpret in context.

Example: 20 plants are randomly assigned to Fertilizer A or Fertilizer B, 10 to each. Mean growth is 14 cm for A and 10 cm for B, so the observed difference is 4 cm.

Simulation under no effect: pool all 20 growth values, randomly assign 10 to A and 10 to B many times, and compute the difference in means each time.

If differences of 4 cm or more occur in only 1% of simulations, the result is statistically significant evidence that Fertilizer A produces greater growth. If such differences occur in 25% of simulations, the result is not very surprising under random assignment.
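The nine steps can be followed in code for the fertilizer example. The individual growth values below are invented for illustration (the text gives only the means); they were chosen so that the group means come out to 14 cm and 10 cm. The seed and the number of shuffles are also arbitrary choices.

```python
import random

def mean(values):
    return sum(values) / len(values)

# Steps 1-2: the treatments are Fertilizer A and B; the response is growth in cm.
# Hypothetical growth values chosen so the group means are 14 cm and 10 cm.
growth_a = [15, 13, 16, 12, 14, 17, 11, 14, 15, 13]
growth_b = [9, 11, 8, 12, 10, 7, 13, 10, 9, 11]

# Step 3: observed difference in mean growth.
observed = mean(growth_a) - mean(growth_b)

# Steps 4-6: under no effect the labels are arbitrary, so reshuffle them many times.
rng = random.Random(42)
pooled = growth_a + growth_b
n_shuffles = 5000
simulated = []
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    simulated.append(mean(pooled[:10]) - mean(pooled[10:]))

# Steps 7-8: how often does chance alone match or exceed the observed difference?
tail = sum(d >= observed for d in simulated) / n_shuffles

# Step 9: report in context.
print(f"Observed difference: {observed} cm")
print(f"Share of shuffles with a difference at least that large: {tail:.3f}")
```

With real data the tail share could come out small or large. The interpretation in context depends on which one happens.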

Practical versus statistical significance

Suppose a large study finds that a new website design increases average time on page by 0.2 seconds, and the difference is statistically significant. Is that practically important? Maybe not. Statistical significance means a difference this large would be unlikely under the no-effect model, so chance alone is a poor explanation for it. Practical significance asks whether the effect is large enough to matter.

Students should learn both questions:

  • Is the difference likely real?
  • Is the difference important?
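The two questions can give different answers, as the following sketch suggests. Everything in it is an assumption made for illustration: the simulated time-on-page data, the group size of 2,000 visitors per design, the 0.05 cutoff for "unusual," and the 2-second threshold for a gain that would matter.

```python
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(7)
n = 2000  # assumed number of visitors per design

# Hypothetical time-on-page data in seconds: the new design adds only about 0.2 s.
old_design = [rng.gauss(30.0, 1.0) for _ in range(n)]
new_design = [rng.gauss(30.2, 1.0) for _ in range(n)]
observed = mean(new_design) - mean(old_design)

# Question 1: is the difference likely real? Compare with shuffled labels.
pooled = old_design + new_design
n_shuffles = 200
extreme = 0
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
        extreme += 1
tail = extreme / n_shuffles
statistically_significant = tail < 0.05  # assumed cutoff

# Question 2: is the difference important? Compare with a context-based threshold.
smallest_useful_gain = 2.0  # assumed: gains under 2 seconds do not matter here
practically_important = observed >= smallest_useful_gain

print(f"observed gain: {observed:.2f} s, tail proportion: {tail:.3f}")
print("statistically significant:", statistically_significant)
print("practically important:", practically_important)
```

In a study this large, even a fraction of a second stands out clearly from chance, but whether it matters is a judgment about context, not about the simulation.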

More detail: randomization distribution

A randomization distribution shows what treatment differences would look like if treatment labels did not matter. To build it, keep the observed outcomes fixed, randomly shuffle treatment labels many times, and compute the treatment difference each time. This produces the distribution of differences expected from random assignment alone.

The observed difference is then compared to this distribution. If the observed difference is near the center, it is typical under no effect. If it is far out in the tail, it is surprising and may be evidence of a treatment effect.

This is simulation-based significance. It makes the logic of inference visible.
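For a very small experiment, the randomization distribution can be listed exactly instead of simulated. The sketch below swaps random shuffling for full enumeration with itertools.combinations. The eight outcome values and the helper diff_for are hypothetical.

```python
from itertools import combinations

# Hypothetical outcomes for 8 subjects; the first four received treatment A.
outcomes = [12, 15, 11, 14, 9, 10, 13, 8]
treated_idx = (0, 1, 2, 3)

def diff_for(indices):
    """Mean of the subjects labeled A minus mean of the subjects labeled B."""
    a = [outcomes[i] for i in indices]
    b = [outcomes[i] for i in range(len(outcomes)) if i not in indices]
    return sum(a) / len(a) - sum(b) / len(b)

observed = diff_for(treated_idx)

# Every possible way to give the A label to 4 of the 8 subjects (70 ways).
all_diffs = sorted(diff_for(c) for c in combinations(range(len(outcomes)), 4))

center = sum(all_diffs) / len(all_diffs)
tail = sum(d >= observed for d in all_diffs) / len(all_diffs)

print("number of relabelings:", len(all_diffs))
print("center of distribution:", round(center, 6))
print("observed difference:", observed)
print("tail proportion:", round(tail, 3))
```

For this made-up data, 4 of the 70 relabelings reach the observed difference of 3, a tail proportion of about 0.057. Enumeration only works for small groups. With larger groups the number of relabelings grows too quickly, which is why random shuffling is used instead.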

Example: two teaching methods

Twenty students are randomly assigned to Method A or Method B. The average improvement is 12 points for A and 7 points for B, so the observed difference is 5 points. To test whether this is surprising, simulate many random reassignments of the 20 improvement scores into two groups of 10. Compute the difference in means each time.

If differences of 5 or more occur in 40 out of 1,000 simulations, the simulated tail proportion is 0.04. That suggests the observed result would be fairly unusual if there were no treatment effect. This is evidence that Method A may be better.

If differences of 5 or more occur in 300 out of 1,000 simulations, the observed difference is not unusual under random assignment. The data do not provide strong evidence.
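The arithmetic behind both cases is a single division: the count of simulated differences at least as large as the observed one, divided by the number of simulations. The helper name tail_proportion is made up for this sketch.

```python
def tail_proportion(extreme_count, total_simulations):
    """Share of simulated differences at least as extreme as the observed one."""
    return extreme_count / total_simulations

first_case = tail_proportion(40, 1000)    # 0.04: fairly unusual under no effect
second_case = tail_proportion(300, 1000)  # 0.3: not unusual under no effect

print(first_case, second_case)
```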

Causation requires design

If the groups were not randomly assigned, a simulation of random assignment may not match the actual study design. Students must not use experimental inference language for observational data. A treatment comparison supports causation only when the design supports causation.

Problem Library

Problems in the App From This Objective

153 problems across 12 archetypes in the app.

parse experimental design.
12 problems Warmup Practice Mixed Review Assessment
Problem 1

Identify treatment, control, and response variable in experiment patients randomly receive new drug or placebo; pain score recorded.

Problem 2

Identify treatment, control, and response variable in experiment classes use new curriculum or standard curriculum; test gain measured.

Problem 3

Identify treatment, control, and response variable in experiment product A vs product B assigned; satisfaction recorded.

Problem 4

Identify treatment, control, and response variable in experiment volunteers randomly assigned to exercise program or no exercise; blood pressure measured.

Problem 5

Identify treatment, control, and response variable in experiment students taught with interactive software or traditional lectures; exam scores compared.

Problem 6

Identify treatment, control, and response variable in experiment plants treated with fertilizer A, fertilizer B, or no fertilizer; yield measured.

Problem 7

Identify treatment, control, and response variable in experiment website visitors see layout A or layout B; click-through rate analyzed.

Problem 8

Identify treatment, control, and response variable in experiment participants consume high-protein diet or standard diet; muscle mass change observed.

Problem 9

Identify treatment, control, and response variable in experiment children attend preschool with play-based learning or academic focus; social skills assessed.

Problem 10

Identify treatment, control, and response variable in experiment two different engine designs tested; fuel consumption recorded.

Problem 11

Identify treatment, control, and response variable in experiment customers receive discount code via email or text message; redemption rate tracked.

Problem 12

Identify treatment, control, and response variable in experiment forest plots exposed to different levels of air pollution; tree growth measured.

compare means or proportions.
12 problems Warmup Practice Mixed Review Assessment
Problem 13

Compute difference in treatment outcomes for treatment mean 18, control mean 14.

Problem 14

Compute difference in treatment outcomes for treatment success 62 percent, control success 49 percent.

Problem 15

Compute difference in treatment outcomes for control mean 22, treatment mean 19.

Problem 16

Compute difference in treatment outcomes for treatment group average score 85, control group average score 78.

Problem 17

Compute difference in treatment outcomes for control group average weight loss 5 kg, treatment group average weight loss 3 kg.

Problem 18

Compute difference in treatment outcomes for control group recovery rate 70%, treatment group recovery rate 85%.

Problem 19

Compute difference in treatment outcomes for treatment group side effect rate 25%, control group side effect rate 30%.

Problem 20

Compute difference in treatment outcomes for average response time for treated patients 16 minutes, for untreated patients 12 minutes.

Problem 21

Compute difference in treatment outcomes for treatment group average yield 45 bushels, control group average yield 50 bushels.

Problem 22

Compute difference in treatment outcomes for proportion of successful outcomes in treatment group 0.75, in control group 0.60.

Problem 23

Compute difference in treatment outcomes for proportion of adverse events in treatment group 0.10, in control group 0.18.

Problem 24

Compute difference in treatment outcomes for treatment group average blood pressure reduction 15 mmHg, control group average blood pressure reduction 5 mmHg.

shuffle labels under no-treatment-effect model.
15 problems Warmup Practice Mixed Review Assessment
Problem 25

Design randomization simulation for treatment comparison 10 outcomes, 5 treatment labels, compare means.

Problem 26

Design randomization simulation for treatment comparison 20 binary outcomes randomly assigned 10 per group.

Problem 27

Design randomization simulation for treatment comparison observed treatment effect under no-effect model.

Problem 28

Design randomization simulation for treatment comparison 100 test scores, 50 assigned to a new teaching method, compare mean scores.

Problem 29

Design randomization simulation for treatment comparison 30 patients, 15 received a new drug, 15 received placebo, observe recovery (binary).

Problem 30

Design randomization simulation for treatment comparison two groups of 8 students each, comparing total points scored on a task.

Problem 31

Design randomization simulation for treatment comparison A/B test with 200 users, 100 in each group, conversion rate comparison.

Problem 32

Design randomization simulation for treatment comparison 12 coin flips, 6 with a 'special' coin, compare proportion of heads.

Problem 33

Design randomization simulation for treatment comparison 24 plant growth measurements, 12 treated with fertilizer, compare medians.

Problem 34

Design randomization simulation for treatment comparison 2 groups of 15 each, comparing average reaction times.

Problem 35

Design randomization simulation for treatment comparison Survey of 500 people, 250 exposed to an ad, compare 'purchase intent' proportions.

Problem 36

Design randomization simulation for treatment comparison comparison of 2 treatments on 40 subjects, 20 per treatment, measuring a continuous outcome.

Problem 37

Design randomization simulation for treatment comparison evaluating the p-value for an observed difference in means between two groups of 10.

Problem 38

Design randomization simulation for treatment comparison 30 observations, 10 in treatment group, 20 in control group, comparing means.

Problem 39

Design randomization simulation for treatment comparison 25 attempts, 10 using a new method, compare success rates.

compare observed difference to simulated differences.
12 problems Warmup Practice Mixed Review Assessment
Problem 40

Interpret randomization distribution simulated differences centered near 0, observed 5 in far right tail.

Problem 41

Interpret randomization distribution observed difference near center of simulated differences.

Problem 42

Interpret randomization distribution wide simulation spread.

Problem 43

Interpret randomization distribution simulated differences centered at 0, observed -4 in far left tail.

Problem 44

Interpret randomization distribution observed difference of 0.2, which is very close to the center of the simulated differences around 0.

Problem 45

Interpret randomization distribution simulated differences centered at 0, observed 3.5 in the right tail, with only 1% of simulated differences greater than 3.5.

Problem 46

Interpret randomization distribution simulated differences have a wide spread from -10 to 10, observed difference is 0.5.

Problem 47

Interpret randomization distribution simulated differences are tightly clustered around 0, observed difference is 1.5 in the right tail.

Problem 48

Interpret randomization distribution observed difference of 1.2, which falls within the middle 80% of simulated differences centered at 0.

Problem 49

Interpret randomization distribution simulated differences centered at 0, observed -2.8 in the left 5% tail.

Problem 50

Interpret randomization distribution simulated differences show a range of outcomes due to random assignment, observed difference is 0.1.

Problem 51

Interpret randomization distribution simulated differences centered at 0, observed 2.1 in the upper 10% tail.

count simulated outcomes at least as extreme.
15 problems Warmup Practice Mixed Review Assessment
Problem 52

Estimate p-value from simulation results 12 of 1000 simulated differences at least as large.

Problem 53

Estimate p-value from simulation results 38 of 500 at least as extreme two-sided.

Problem 54

Estimate p-value from simulation results 0 of 200 at least as extreme.

Problem 55

Estimate p-value from simulation results 50 of 1000 simulated differences at least as large.

Problem 56

Estimate p-value from simulation results 15 of 500 at least as extreme two-sided.

Problem 57

Estimate p-value from simulation results 200 of 2000 at least as extreme.

Problem 58

Estimate p-value from simulation results 7 of 100 simulated differences at least as small.

Problem 59

Estimate p-value from simulation results 1 of 1000 at least as extreme.

Problem 60

Estimate p-value from simulation results 250 of 5000 at least as extreme two-sided.

Problem 61

Estimate p-value from simulation results 3 of 200 at least as extreme.

Problem 62

Estimate p-value from simulation results 100 of 1000 at least as large.

Problem 63

Estimate p-value from simulation results 45 of 900 at least as extreme.

Problem 64

Estimate p-value from simulation results 90 of 3000 at least as extreme two-sided.

Problem 65

Estimate p-value from simulation results 6 of 400 at least as extreme.

Problem 66

Estimate p-value from simulation results 120 of 6000 at least as extreme.

judge unusualness under random assignment.
12 problems Warmup Practice Mixed Review Assessment
Problem 67

Decide informal statistical significance from p-value 0.01.

Problem 68

Decide informal statistical significance from p-value 0.28.

Problem 69

Decide informal statistical significance from observed statistic in extreme simulation tail.

Problem 70

Decide informal statistical significance from p-value 0.005.

Problem 71

Decide informal statistical significance from p-value 0.15.

Problem 72

Decide informal statistical significance from observed difference falls in the most extreme 2% of the randomization distribution.

Problem 73

Decide informal statistical significance from observed difference is near the center of the randomization distribution.

Problem 74

Decide informal statistical significance from p-value 0.05.

Problem 75

Decide informal statistical significance from p-value 0.06.

Problem 76

Decide informal statistical significance from p-value less than 0.001.

Problem 77

Decide informal statistical significance from observed statistic is within the middle 80% of simulated outcomes.

Problem 78

Decide informal statistical significance from observed test statistic is in the upper 1% tail of the null distribution.

connect simulation result to treatment comparison.
15 problems Warmup Practice Mixed Review Assessment
Problem 79

Explain significance in context new method mean gain 6 points higher, p≈0.02.

Problem 80

Explain significance in context drug success 8 percentage points higher, p≈0.35.

Problem 81

Explain significance in context treatment lower outcome with small p-value.

Problem 82

Explain significance in context new fertilizer increased yield by 15 kg/hectare, p=0.001.

Problem 83

Explain significance in context study group improved test scores by 2 points, p=0.15.

Problem 84

Explain significance in context new drug reduced recovery time by 3 days, p<0.01.

Problem 85

Explain significance in context diet program led to 0.5 kg weight loss, p=0.45.

Problem 86

Explain significance in context new marketing campaign increased customer conversion rate by 5%, p=0.03.

Problem 87

Explain significance in context software update decreased error rate by 1%, p=0.28.

Problem 88

Explain significance in context experimental group scored 10 points higher on average, p=0.005.

Problem 89

Explain significance in context control group's pain score was 1 point lower, p=0.60.

Problem 90

Explain significance in context new training reduced task completion time by 2 minutes, p=0.008.

Problem 91

Explain significance in context modified process took 30 seconds longer, p=0.18.

Problem 92

Explain significance in context treatment group showed a notable difference, p=0.04.

Problem 93

Explain significance in context no significant difference observed, p=0.55.

compare effect size and context.
12 problems Warmup Practice Mixed Review Assessment
Problem 94

Distinguish statistical significance from practical importance for huge study finds 0.1 point score increase with p<0.001.

Problem 95

Distinguish statistical significance from practical importance for small study finds 8 point increase but p=0.12.

Problem 96

Distinguish statistical significance from practical importance for large effect and small p-value.

Problem 97

Distinguish statistical significance from practical importance for a study of 10,000 patients shows a new drug reduces blood pressure by 10 mmHg with p<0.001.

Problem 98

Distinguish statistical significance from practical importance for a pilot study of 20 patients finds a new therapy improves mood by 5 points with p=0.25.

Problem 99

Distinguish statistical significance from practical importance for a massive survey of 100,000 people finds a 0.05 point difference in happiness scores with p=0.06.

Problem 100

Distinguish statistical significance from practical importance for a clinical trial with 30 participants shows a new treatment cures 80% of cases with p=0.04.

Problem 101

Distinguish statistical significance from practical importance for a meta-analysis of 50 studies reveals a new teaching method increases test scores by 0.02 standard deviations with p<0.0001.

Problem 102

Distinguish statistical significance from practical importance for a study of 200 students finds a new curriculum improves grades by 3% with p=0.08.

Problem 103

Distinguish statistical significance from practical importance for a randomized controlled trial with 5,000 participants shows no difference between two diets with p=0.78.

Problem 104

Distinguish statistical significance from practical importance for a proof-of-concept experiment with 15 samples shows a new catalyst increases reaction yield by 50% with p<0.01.

Problem 105

Distinguish statistical significance from practical importance for a large-scale intervention study with 2,000 participants finds a 5% reduction in disease incidence with p=0.07.

connect randomization to balanced groups.
12 problems Warmup Practice Mixed Review Assessment
Problem 106

Identify role of random assignment in causal claim subjects randomly assigned to tutoring or control.

Problem 107

Identify role of random assignment in causal claim patients choose drug or placebo.

Problem 108

Identify role of random assignment in causal claim random labels in experiment.

Problem 109

Identify role of random assignment in causal claim students randomly assigned to a new teaching method or a traditional one.

Problem 110

Identify role of random assignment in causal claim farmers randomly assigned to use a new pesticide or a placebo.

Problem 111

Identify role of random assignment in causal claim volunteers randomly assigned to a meditation program or a control activity.

Problem 112

Identify role of random assignment in causal claim randomly assigning different types of packaging to products sold in stores.

Problem 113

Identify role of random assignment in causal claim randomly assigning different exercise routines to participants in a fitness study.

Problem 114

Identify role of random assignment in causal claim randomly assigning two groups of mice to different diets.

Problem 115

Identify role of random assignment in causal claim randomly assigning different website layouts to visitors.

Problem 116

Identify role of random assignment in causal claim randomly assigning different types of feedback to students on their essays.

Problem 117

Identify role of random assignment in causal claim randomly assigning different types of music to workers in a factory.

consider sample size, blinding, attrition, generalization.
12 problems Warmup Practice Mixed Review Assessment
Problem 118

Evaluate limitations of randomized experiment small randomized trial with 12 subjects.

Problem 119

Evaluate limitations of randomized experiment many subjects drop out unevenly.

Problem 120

Evaluate limitations of randomized experiment volunteer participants randomly assigned.

Problem 121

Evaluate limitations of randomized experiment no blinding and subjective response.

Problem 122

Evaluate limitations of randomized experiment randomized trial involving only male participants.

Problem 123

Evaluate limitations of randomized experiment study where only participants are blinded, but researchers know treatment assignments.

Problem 124

Evaluate limitations of randomized experiment randomized trial with 20 subjects per group.

Problem 125

Evaluate limitations of randomized experiment participants in the control group drop out at a higher rate.

Problem 126

Evaluate limitations of randomized experiment randomized experiment conducted in a highly controlled laboratory setting.

Problem 127

Evaluate limitations of randomized experiment randomized trial of a surgical procedure where blinding is impossible.

Problem 128

Evaluate limitations of randomized experiment randomized study with a very diverse but small sample.

Problem 129

Evaluate limitations of randomized experiment randomized experiment using a convenience sample from a specific community.

interpret p-values/effect sizes.
12 problems Warmup Practice Mixed Review Assessment
Problem 130

Compare two treatment-effect claims Treatment A effect 5 with p=0.01; B effect 6 with p=0.20.

Problem 131

Compare two treatment-effect claims A small effect p<0.001, B larger effect p=0.04.

Problem 132

Compare two treatment-effect claims one claim from randomized experiment, one from observational study.

Problem 133

Compare two treatment-effect claims Treatment X effect 10 with p=0.05, Treatment Y effect 2 with p=0.04.

Problem 134

Compare two treatment-effect claims Treatment C effect 8 with p=0.001, Treatment D effect 9 with p=0.50.

Problem 135

Compare two treatment-effect claims Treatment E effect 12 with p<0.0001, Treatment F effect 10 with p=0.0002.

Problem 136

Compare two treatment-effect claims Treatment G 95% CI [3, 7], Treatment H 95% CI [-1, 10].

Problem 137

Compare two treatment-effect claims Treatment I effect 7 with p=0.03 from a small pilot study (n=30), Treatment J effect 6 with p=0.08 from a large well-designed trial (n=500).

Problem 138

Compare two treatment-effect claims Treatment K effect -3 with p=0.005, Treatment L effect 2 with p=0.30.

Problem 139

Compare two treatment-effect claims Treatment M effect 15 with p=0.15, Treatment N effect 2 with p=0.10.

Problem 140

Compare two treatment-effect claims Treatment P effect 4 with p=0.02 from a double-blind study, Treatment Q effect 5 with p=0.01 from a non-blinded study.

Problem 141

Compare two treatment-effect claims Treatment R effect 10 with p=0.001, Treatment S effect -2 with p=0.002.

catch causation, p-value, practical-significance, and randomization mistakes.
12 problems Warmup Practice Mixed Review Assessment
Problem 142

Correct significance-testing interpretation error p=0.03 means 3 percent chance no treatment effect is true.

Problem 143

Correct significance-testing interpretation error not significant means treatments are equal.

Problem 144

Correct significance-testing interpretation error randomized experiment result generalizes to all adults automatically.

Problem 145

Correct significance-testing interpretation error statistically significant means practically important.

Problem 146

Correct significance-testing interpretation error A very small p-value (e.g., p < 0.001) indicates a very large and important effect.

Problem 147

Correct significance-testing interpretation error Because the study found a statistically significant association between diet and heart disease, we can conclude that the diet causes heart disease.

Problem 148

Correct significance-testing interpretation error If a study has a p-value of 0.01, it means that if the experiment were repeated, there's a 99% chance of getting a significant result again.

Problem 149

Correct significance-testing interpretation error Since the participants were randomly selected, we can infer that the treatment caused the observed outcome.

Problem 150

Correct significance-testing interpretation error A p-value of 0.20 means the null hypothesis is true.

Problem 151

Correct significance-testing interpretation error With a very large sample size, any statistically significant result is also practically important.

Problem 152

Correct significance-testing interpretation error A 95% confidence interval for the mean means there is a 95% chance that the true population mean falls within this specific interval.

Problem 153

Correct significance-testing interpretation error If we set alpha to 0.05, it means there's a 5% chance of making a Type II error.
