Statistics - Designing Studies and Sampling
Understand the difference between census and sampling, how to design experiments using randomization and blocking, and the contrast between experimental and observational studies.
Summary
Data Collection and Sampling
Why Sampling Matters: Census vs. Sample
In statistics, we often face a practical problem: we want to understand a large population, but examining every member is impossible, too expensive, or too time-consuming. This is where sampling becomes essential.
A census is a measurement of an entire population. A sample is a subset of the population selected for measurement. When a full census is impractical, as it usually is for large populations, statisticians collect data from carefully chosen samples. The key challenge is ensuring that conclusions drawn from the sample actually hold for the entire population.
Representative Sampling: The Bridge from Sample to Population
The core idea of representative sampling is straightforward but crucial: the sample should reflect the characteristics of the population so that we can safely extend our findings from the sample back to the population.
When a sample is representative, statistical conclusions based on the sample become reliable inferences about the entire population. This is why the method of selecting the sample matters so much: a poorly chosen sample can systematically misrepresent the population and lead to false conclusions.
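To see why the selection method matters, here is a minimal Python sketch comparing a simple random sample with a biased "convenience" sample drawn from the same population. The population of ages, the sample size of 200, and the rule that the convenience sample reaches only the youngest people are all hypothetical choices made for illustration.

```python
import random
import statistics

# Hypothetical population of 10,000 adult ages (illustrative, not real data).
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(10_000)]

def simple_random_sample(pop, n, rng):
    """Simple random sample: every member has the same chance of being chosen."""
    return rng.sample(pop, n)

def convenience_sample(pop, n):
    """Hypothetical biased scheme: only the n youngest members are reached."""
    return sorted(pop)[:n]

true_mean = statistics.mean(population)
srs_mean = statistics.mean(simple_random_sample(population, 200, rng))
biased_mean = statistics.mean(convenience_sample(population, 200))

print(f"population mean:      {true_mean:.1f}")
print(f"simple random sample: {srs_mean:.1f}")
print(f"convenience sample:   {biased_mean:.1f}")
```

The random sample's mean typically lands close to the population mean, while the convenience sample misses it badly, because its error comes from the selection rule itself rather than from chance.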
Sampling theory, which is part of probability theory, studies how sample statistics behave. It tells us how much a statistic such as the sample mean should vary from one sample to the next when we repeatedly sample from the same population. This variability is described by the statistic's sampling distribution.
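A sampling distribution can be approximated by simulation: draw many samples from the same population, compute the statistic for each, and look at how the results vary. The sketch below does this for the sample mean; the exponential population, the sample sizes 10 and 100, and the 2,000 repetitions are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical skewed population: 5,000 values from an exponential distribution with mean 10.
population = [rng.expovariate(1 / 10) for _ in range(5_000)]

def sample_means(pop, n, repeats, rng):
    """Draw `repeats` samples of size n (without replacement) and return their means."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(repeats)]

spread = {}  # sample size -> standard deviation of the simulated sample means
for n in (10, 100):
    means = sample_means(population, n, 2_000, rng)
    spread[n] = statistics.stdev(means)
    print(f"n = {n:>3}: average of sample means = {statistics.mean(means):.2f}, "
          f"SD of sample means = {spread[n]:.2f}")
```

In both cases the sample means cluster around the population mean, while their standard deviation shrinks roughly in proportion to 1/√n, so larger samples give less variable estimates.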
Sampling Theory vs. Statistical Inference: Two Directions of Reasoning
This is a key conceptual distinction that often confuses students, so let's be very clear about it.
Classical probability theory works forward from the population to the sample. It asks: "If I know the population has certain properties, what can I predict about samples I might draw?" For example: "If a coin is fair (50% heads), what's the probability of getting 7 or more heads in 10 flips?"
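The forward calculation in the coin example can be done exactly with the binomial distribution. This Python sketch assumes each flip is independent with heads probability p (0.5 for a fair coin); `prob_at_least` is a hypothetical helper name.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): reasoning forward from a known population."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Fair coin, 10 flips: P(7 or more heads) = (120 + 45 + 10 + 1) / 1024
print(prob_at_least(7, 10))  # 0.171875
```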
Statistical inference works backward from the sample to the population. It asks: "If I observe certain properties in my sample, what can I conclude about the population?" For example: "I flipped a coin 10 times and got 8 heads. Should I conclude the coin is biased?"
These are complementary approaches. Sampling theory (probability) provides the foundation for inference: it tells us what samples should look like if the population has the assumed properties, which helps us recognize when our observed sample would be unusual under those assumptions.
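Running the reasoning backward for the 8-heads example: one common approach (a significance test, assumed here rather than prescribed by the text) asks how likely a result at least as extreme would be if the coin were fair. The helper `prob_at_least` is hypothetical and assumes independent flips.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

observed_heads, flips = 8, 10

# How often would a fair coin give 8 or more heads?
p_one_sided = prob_at_least(observed_heads, flips)  # 56/1024 = 0.0546875

# Also count results just as extreme in the other direction (2 or fewer heads);
# the fair-coin distribution is symmetric, so this doubles the one-sided value.
p_two_sided = 2 * p_one_sided  # 112/1024 = 0.109375

print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

A two-sided p-value of about 0.11 means a fair coin would produce a result this lopsided roughly 11% of the time. Under the conventional 5% threshold (again an assumption), that is not strong enough evidence to conclude the coin is biased.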
Experimental vs. Observational Studies: Two Different Research Approaches
Not all data collection is the same. The way you collect data fundamentally determines what questions you can answer.
Experimental studies involve actively manipulating the system being studied. The basic structure is: measure baseline conditions → apply treatment/manipulation → measure again. By deliberately causing change, we can assess what effect the manipulation had. For instance, researchers might divide students into groups, teach one group using a new method while teaching the other using traditional methods, then compare test scores.
Observational studies involve no manipulation. Researchers simply gather data about existing conditions to investigate correlations between predictors (variables you think might influence outcomes) and responses (the outcomes you care about). For example, researchers might survey people about their exercise habits and health outcomes, but they don't assign people to exercise or not—they just observe and record patterns.
This distinction matters because it affects what conclusions you can draw. An experiment in which you control and randomly assign the treatments can reveal cause-and-effect relationships. An observational study can reveal correlations, but cannot by itself establish causation, because hidden factors (confounders) might influence both variables.
Design and Conduct of Experiments
Planning an Experiment: Before You Begin
Before you run an experiment, careful planning is essential. Several decisions affect whether your experiment will give you reliable answers:
Determining the number of replicates is critical. How many times should you repeat each treatment? This depends on three things:
Preliminary estimates of treatment effects: how large a difference are you expecting?
Your alternative hypothesis: what effect, of what size and direction, do you want the experiment to be able to detect?
Experimental variability: how much "noise" or random variation exists in your measurements?
If you expect huge treatment effects with low variability, you need fewer replicates. If you expect subtle effects with high variability, you need many more observations to detect the signal through the noise.
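One common way to turn these three ingredients into a number is the normal-approximation formula for comparing two group means: n = 2(z_a + z_b)^2 σ^2 / δ^2 replicates per group, where δ is the effect to detect, σ is the noise standard deviation, and z_a and z_b are standard normal quantiles for the significance level and the desired power. The sketch assumes equal group sizes, a known σ, and a two-sided test; the defaults alpha = 0.05 and power = 0.80 are conventional choices, not values from the text, and `replicates_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist

def replicates_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided, two-sample comparison of means.

    delta: smallest treatment effect worth detecting (from the alternative hypothesis)
    sigma: standard deviation of the experimental noise
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.80
    return ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

for delta in (1.0, 0.5, 0.25):
    print(f"effect {delta} (noise SD 1.0): {replicates_per_group(delta, sigma=1.0)} per group")
```

Halving the detectable effect (or doubling the noise) roughly quadruples the required replicates, which is the trade-off described above. Because it ignores uncertainty in estimating σ, this approximation slightly understates what an exact t-based calculation would give.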
Beyond the numbers, you must also consider subject selection (who or what will you study?) and ethical concerns (is it safe and fair to impose your treatment?).
Experimental Design Principles: Creating Fair Tests
Two fundamental principles guide good experimental design: blocking and randomized assignment. Both address a central threat to experimental validity: confounding variables.
Blocking: Controlling What You Can
A confounding variable is a factor that varies along with your treatment but isn't part of what you're trying to test. It clouds your ability to see the true effect of your treatment.
Blocking is a technique to reduce confounding. The idea: group subjects into blocks based on variables you know will affect your outcome, then apply all treatments within each block, so that treatments are compared among similar subjects.
Example: If you're testing whether fertilizer increases plant growth, you might block by initial plant size. You'd create blocks of small, medium, and large plants, then apply each fertilizer within every block. This way, if larger plants grow more regardless of fertilizer, that size effect is separated from the fertilizer effect, because the fertilizers are compared among plants of similar size.
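Treatments are usually assigned at random within each block (a randomized block design). This minimal sketch assumes hypothetical plant IDs, three size blocks of six plants each, and made-up treatment names; within each block it shuffles the plants and then deals treatments out in rotation, so every treatment appears equally often in every block.

```python
import random
from collections import Counter

def randomized_block_assignment(blocks, treatments, rng):
    """Within each block, shuffle the units and deal treatments out in rotation,
    so each treatment appears as evenly as possible in every block."""
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical plants grouped by initial size; treatment names are made up.
blocks = {
    "small":  ["s1", "s2", "s3", "s4", "s5", "s6"],
    "medium": ["m1", "m2", "m3", "m4", "m5", "m6"],
    "large":  ["l1", "l2", "l3", "l4", "l5", "l6"],
}
treatments = ["none", "fertilizer_A", "fertilizer_B"]

plan = randomized_block_assignment(blocks, treatments, random.Random(7))
for unit, (block, treatment) in sorted(plan.items()):
    print(unit, block, treatment)

counts = Counter(plan.values())  # (block, treatment) -> number of plants
print(counts)
```

Which plant gets which fertilizer is left to chance, but the blocking guarantees that no fertilizer ends up concentrated among the large plants.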
Randomized Assignment: Eliminating Bias
Even with blocking, you need to decide which subjects get which treatment. Randomized assignment means you use random selection (like drawing names from a hat) to decide which subjects receive which treatment, rather than letting researchers choose.
Why does randomization matter? Consider the alternative: if researchers assign subjects based on judgment, they might unconsciously give the promising treatment to the healthiest subjects or to the ones they expect to do well. This introduces bias: differences in outcomes may then reflect who was chosen for each group rather than what the treatment actually does.
Randomization achieves two critical goals:
Unbiased estimates: On average, treatment groups become comparable, so differences in outcomes reflect treatment effects, not pre-existing differences
Accurate error estimation: Randomization creates the variability pattern assumed by statistical tests, so we can trust our calculations of experimental error
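The following sketch contrasts judgment-based and randomized assignment on a pre-existing characteristic. The baseline health scores are simulated, and the "give the treatment to the healthiest half" rule is a hypothetical stand-in for researcher judgment; no treatment is applied, so any gap between groups is pure bias.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical baseline health scores for 200 subjects (higher = healthier).
health = [rng.gauss(50, 10) for _ in range(200)]

def judgment_assignment(scores):
    """Biased scheme: the researcher gives the new treatment to the healthiest half."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[: len(scores) // 2])

def random_assignment(scores, rng):
    """Randomized scheme: a random half of the subjects receives the treatment."""
    return set(rng.sample(range(len(scores)), len(scores) // 2))

def baseline_gap(scores, treated):
    """Mean baseline health of the treated group minus that of the control group."""
    t = [s for i, s in enumerate(scores) if i in treated]
    c = [s for i, s in enumerate(scores) if i not in treated]
    return statistics.mean(t) - statistics.mean(c)

judged_gap = baseline_gap(health, judgment_assignment(health))
random_gaps = [baseline_gap(health, random_assignment(health, rng)) for _ in range(1_000)]

print(f"judgment-based gap: {judged_gap:.1f}")
print(f"randomized gap, averaged over 1,000 assignments: {statistics.mean(random_gaps):.2f}")
```

The judgment-based split starts with a large health gap that would be mistaken for a treatment effect, while random splits differ only by chance and average out to nearly zero. Any single randomization can still be somewhat unbalanced, which is one reason blocking is often combined with it.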
<extrainfo>
The Hawthorne Study: A Cautionary Tale
One famous example demonstrates how subtle the effects of experimental design can be. The Hawthorne Study was conducted at a Western Electric factory to investigate how lighting affected worker productivity. Researchers increased lighting in one group's work area. Productivity went up.
Then they decreased the lighting. Productivity went up again.
The explanation later offered for this unexpected result was that workers' productivity increased simply because they knew they were being observed. This phenomenon became known as the Hawthorne effect: the tendency of people to change their behavior when they know they're being studied, regardless of what the actual treatment is.
This example reminds us that conducting good experiments requires attention not just to treatments, but to all aspects of how subjects experience the study.
</extrainfo>
Flashcards
What is the primary goal of representative sampling?
To ensure inferences from the sample can be safely extended to the whole population
What specific area of probability theory studies the sampling distributions of sample statistics?
Sampling theory
How does classic probability theory differ from statistical inference in its approach to population parameters?
Classic probability deduces sample probabilities from known population parameters, while statistical inference inductively infers population parameters from sample observations
What type of study involves manipulating a system and taking measurements to assess the effect of that manipulation?
Experimental study
What is the primary objective of an observational study?
To investigate correlations between predictors and responses without manipulation
What is the purpose of applying randomized assignment of treatments in an experiment?
To obtain unbiased estimates of treatment effects and experimental error
How does the Hawthorne effect describe the behavior of research participants?
Participants change their behavior simply because they are being observed
Quiz
Question 1: What does classic probability theory assume when calculating probabilities about samples?
- Population parameters are known (correct)
- Sample parameters are known
- Both population and sample parameters are unknown
- Only sample estimates are used
Question 2: What is the primary goal of using blocking in an experimental design?
- To reduce the impact of confounding variables (correct)
- To increase the total number of treatments tested
- To randomize the assignment of subjects to treatments
- To eliminate the need for replicates
Key Concepts
Sampling and Inference
Census
Representative sampling
Sampling theory
Statistical inference
Research Methods
Experimental study
Observational study
Experimental design
Blocking (statistics)
Randomized assignment
Hawthorne effect
Definitions
Census
A complete enumeration of every member of a target population, often impractical for large groups.
Representative sampling
A sampling method that yields a sample reflecting the characteristics of the entire population, allowing valid generalizations.
Sampling theory
The branch of probability theory that studies the distribution of sample statistics and the properties of estimators derived from samples.
Statistical inference
The process of drawing conclusions about population parameters based on data collected from a sample.
Experimental study
Research in which the investigator manipulates one or more variables and measures the effect of this manipulation on outcomes.
Observational study
Research that involves measuring variables without any manipulation, used to assess associations and correlations.
Blocking (statistics)
An experimental design technique that groups similar experimental units together to reduce the impact of confounding variables.
Randomized assignment
The practice of allocating experimental units to treatment groups by chance to ensure unbiased estimates of treatment effects.
Hawthorne effect
A phenomenon where individuals modify their behavior simply because they are aware they are being observed.
Experimental design
The structured planning of experiments, including decisions about treatments, controls, replication, randomization, and blocking, to obtain reliable and valid results.