RemNote Community

Variance - Core Concepts and Computation

Understand the definition, computation methods, and key properties of variance, including population vs. sample formulas and how it behaves under scaling and addition.


Summary

Understanding Variance

What is Variance?

Variance measures how spread out a set of values is from their average. Imagine measuring heights in a classroom: if all students are roughly the same height, the variance is small; if heights range from very short to very tall, the variance is large.

More formally, variance is the expected value of the squared deviation of a random variable from its mean. We square the deviations so that distances above and below the mean are treated equally (and so that deviations cannot cancel each other out). The standard deviation is simply the positive square root of variance. We use standard deviation in practice because it is in the same units as the original data, while variance is in squared units.

As an illustration, picture two distributions with the same average (100) but different standard deviations: one with SD = 10 has lower variance and clusters tightly around the mean, while one with SD = 50 has higher variance and spreads out much more.

The Mathematical Definition

For a random variable $X$ with mean $\mu = \operatorname{E}[X]$, variance is defined as:

$$\operatorname{Var}(X) = \operatorname{E}\big[(X - \mu)^2\big]$$

This formula captures our intuition: we measure how far each value is from the mean (the deviation $X - \mu$), square it, and then take the average of those squared deviations. In practice, however, we often use an alternative formula that is easier for computation:

$$\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2$$

This version says that variance equals the expected value of the square minus the square of the expected value. You can verify the two forms are equivalent algebraically, and the second is often faster to calculate.

Population Variance vs. Sample Variance: A Critical Distinction

This is where many students get confused, so pay close attention.
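Before diving in, the equivalence of the defining formula and the computational formula above is easy to check numerically. A minimal Python sketch, using a small made-up discrete distribution (the values and probabilities are purely illustrative):

```python
# Illustrative discrete distribution: each value with its probability.
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

# E[X] and E[X^2]
mean = sum(x * p for x, p in zip(values, probs))
mean_of_square = sum(x**2 * p for x, p in zip(values, probs))

# Definition: Var(X) = E[(X - mu)^2]
var_def = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Computational form: Var(X) = E[X^2] - (E[X])^2
var_alt = mean_of_square - mean**2

# The two agree (up to floating-point rounding); here both are 1.0
print(var_def, var_alt)
```

The second form only needs two running sums over the data, which is why it is the one usually used in hand calculation.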
Population Variance

When you have the entire population of values available, the population variance is:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

where $N$ is the number of observations and $\mu$ is the population mean. You simply divide by $N$, the total count.

Sample Variance: Why Divide by $n-1$?

In reality, we usually work with a sample drawn from a larger population. If we naively computed variance using the same formula (dividing by $n$), we would systematically underestimate the true population variance, because the sample mean $\bar{x}$ sits at least as close to the individual sample values as the true population mean $\mu$ does, so deviations from the sample mean are artificially small.

To fix this, statisticians use Bessel's correction: divide by $n-1$ instead of $n$. This gives the unbiased sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

where $\bar{x}$ is the sample mean and $n$ is the sample size.

Why $n-1$ specifically? Because the sample mean itself was calculated from the data, we have "used up" one degree of freedom: there are $n$ observations but only $n-1$ independent deviations (the last one is determined by the constraint that the deviations must sum to zero). Dividing by $n-1$ corrects the bias introduced by using the sample mean instead of the population mean. The biased version (dividing by $n$) underestimates the population variance by the factor $(n-1)/n$, a correction that becomes negligible as $n$ grows but is crucial for small samples.

In practice: always use $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ when estimating variance from sample data. Most software defaults to this.

Key Properties of Variance

Non-negativity and Zero Variance

Variance is always non-negative, since we are averaging squared terms. The only way variance can equal zero is if the random variable is constant: it always takes the same value.
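Both of the last two ideas, the $n-1$ divisor and zero variance for constant data, can be seen directly in Python's standard library, where `statistics.pvariance` divides by $n$ and `statistics.variance` applies Bessel's correction. A sketch with an illustrative data set:

```python
import statistics

# Bessel's correction: pvariance divides by n, variance by n - 1
data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample; mean is 5
n = len(data)

# Sum of squared deviations is 32, so pvariance = 32/8 and variance = 32/7
assert statistics.pvariance(data) == 4.0
assert abs(statistics.variance(data) - 32 / 7) < 1e-12

# The biased estimate understates the unbiased one by the factor (n - 1)/n
assert abs(statistics.pvariance(data)
           - statistics.variance(data) * (n - 1) / n) < 1e-12

# Zero variance: a constant data set has no spread at all
assert statistics.pvariance([7, 7, 7, 7]) == 0.0
```

Most numerical libraries follow the same convention and let you choose the divisor explicitly, so it is worth checking which default your tool uses.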
Conversely, if a random variable has variance zero, it must be almost surely constant.

Units of Measurement

If your data is measured in meters, variance is in square meters; if your data is in dollars, variance is in dollars squared. This is why standard deviation (the square root) is often more interpretable than variance in practice.

How Variance Transforms

Understanding how variance changes when you transform your data is essential.

Adding a constant does nothing:

$$\operatorname{Var}(X + c) = \operatorname{Var}(X)$$

Shifting all values by a constant amount does not change how spread out they are.

Multiplying by a constant scales variance by the square of that constant:

$$\operatorname{Var}(aX) = a^2\operatorname{Var}(X)$$

If you double all your values, variance increases by a factor of 4 (because $2^2 = 4$). This makes sense: spreading everything out twice as far quadruples the dispersion.

The Law of Total Variance

This is a powerful decomposition formula. For any two random variables $X$ and $Y$:

$$\operatorname{Var}(X) = \operatorname{E}\big[\operatorname{Var}(X \mid Y)\big] + \operatorname{Var}\big(\operatorname{E}[X \mid Y]\big)$$

What this means: the total variance in $X$ splits into two parts: the expected value of the conditional variance of $X$ given $Y$ (the variance that remains even after conditioning on $Y$), and the variance of the conditional expectation (how much $X$ varies on average as $Y$ changes). This is useful when you can explain part of the variance of $X$ by conditioning on another variable $Y$.

The Bienaymé Formula: Adding Variances

If $X$ and $Y$ are uncorrelated (in particular, if they are independent), then:

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$

This is intuitively appealing: if two variables are unrelated, their variances add when you combine them.
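The shift and scaling rules above are easy to confirm on concrete numbers. A minimal sketch (the data set is illustrative), using `statistics.pvariance` for the population variance:

```python
from statistics import pvariance

data = [1.0, 3.0, 5.0, 7.0]  # illustrative values
v = pvariance(data)          # mean 4, squared deviations 9+1+1+9, so v = 5.0

# Var(X + c) = Var(X): shifting by a constant changes nothing
assert pvariance([x + 10 for x in data]) == v

# Var(aX) = a^2 Var(X): doubling the values quadruples the variance
assert pvariance([2 * x for x in data]) == 4 * v
```

The same checks work for the sample variance, since the divisor cancels out of both transformation rules.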
However, if $X$ and $Y$ are correlated, you must include a covariance term: $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$.

Computing Variance for Different Types of Variables

Discrete Random Variables

For a discrete random variable with probability mass function $p(x)$:

$$\operatorname{Var}(X) = \sum_x (x - \mu)^2 p(x)$$

You calculate the squared deviation for each possible value, weight it by its probability, and sum across all values.

Continuous Random Variables

For a continuous random variable with probability density function $f(x)$:

$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx$$

This is the continuous analog: instead of summing over discrete values, you integrate over the entire range.

Common Distributions

Memorizing the variance formulas for common distributions is helpful:

Bernoulli($p$): $\operatorname{Var}(X) = p(1 - p)$
Binomial($n$, $p$): $\operatorname{Var}(X) = np(1 - p)$
Poisson($\lambda$): $\operatorname{Var}(X) = \lambda$
Exponential($\lambda$): $\operatorname{Var}(X) = \frac{1}{\lambda^2}$

Notice that for a Poisson distribution, the mean and variance are equal (both $\lambda$). This is a distinctive feature of the Poisson distribution.

<extrainfo> Additional Concepts

Variance as a Central Moment: Variance is formally the second central moment of a probability distribution. The first central moment is always zero (by definition), and the second central moment is the variance. Higher central moments describe other properties such as skewness and kurtosis.

Finiteness Issues: Not all distributions have finite variance. Some, like the Cauchy distribution, have no defined mean at all, so their variance is undefined as well. Others, like the Pareto distribution with shape parameter $1 < \alpha \le 2$, have a finite mean but infinite variance. This is an important edge case to be aware of when working with theoretical distributions.
Variance Notation: Variance is technically the covariance of a random variable with itself: $\operatorname{Var}(X) = \operatorname{Cov}(X, X)$. This connection to covariance helps explain why variance shares many properties with covariance. </extrainfo>
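The closed-form variances for the common distributions above can be checked against the discrete formula $\operatorname{Var}(X) = \sum_x (x - \mu)^2 p(x)$. A sketch in Python (the helper `pmf_variance` and the parameter values are illustrative):

```python
from math import comb, exp, factorial

def pmf_variance(pmf):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Binomial(n=10, p=0.3): variance should be n*p*(1-p) = 2.1
n, p = 10, 0.3
binom = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
assert abs(pmf_variance(binom) - n * p * (1 - p)) < 1e-12

# Poisson(lam=4), truncated far into the tail: variance should be lam
lam = 4.0
poisson = {k: exp(-lam) * lam**k / factorial(k) for k in range(80)}
assert abs(pmf_variance(poisson) - lam) < 1e-9
```

The Poisson check also illustrates its distinctive property: the same $\lambda$ is both the mean and the variance.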
Flashcards
What is the conceptual definition of variance in terms of expected value?
The expected value of the squared deviation of a random variable from its mean.
What does variance measure regarding a set of numbers?
The dispersion of the numbers around their average value.
What is the formal equation for the variance of a random variable $X$ with mean $\mu$?
$\operatorname{Var}(X)=\operatorname{E}\big[(X-\mu)^{2}\big]$ (where $\operatorname{E}$ is the expected value).
What is the alternative computational formula for variance involving $\operatorname{E}[X^{2}]$ and $(\operatorname{E}[X])^{2}$?
$\operatorname{Var}(X)=\operatorname{E}[X^{2}] - (\operatorname{E}[X])^{2}$.
How is variance defined in terms of covariance?
It is the covariance of a random variable with itself.
Which specific central moment of a probability distribution corresponds to the variance?
The second central moment.
Why is the variance of a random variable always non‑negative?
Because it is an expectation of squared deviations, which are never negative.
What is the variance of a constant random variable?
Zero.
How do the units of measurement for variance relate to the units of the original variable?
They are the square of the original units (e.g., meters squared).
What happens to the variance if a distribution lacks a finite expected value?
The variance likewise fails to be finite: it is undefined (or taken to be infinite), since the mean needed to define it does not exist.
What is the variance of a random variable $X$ after adding a constant $c$?
$\operatorname{Var}(X+c)=\operatorname{Var}(X)$ (it remains unchanged).
What is the effect of scaling a random variable $X$ by a constant $a$ on its variance?
The variance is scaled by $a^{2}$: $\operatorname{Var}(aX)=a^{2}\operatorname{Var}(X)$.
What is the variance formula for a discrete random variable with probability mass function $p(x)$?
$\operatorname{Var}(X)=\sum_{x} (x-\mu)^{2}p(x)$.
What is the variance formula for a continuous random variable with probability density function $f(x)$?
$\operatorname{Var}(X)=\int_{-\infty}^{\infty}(x-\mu)^{2}f(x)\,dx$.
How is standard deviation mathematically related to variance?
It is the positive square root of the variance.
Under what condition is population variance calculated?
When all possible observations of a random variable are available.
What is the formula for population variance $\sigma^{2}$ for a finite population of size $N$?
$\sigma^{2}= \frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu)^{2}$ (where $\mu$ is the population mean).
What is the purpose of computing sample variance?
To estimate the population variance from a subset of observations.
What is the formula for the unbiased sample variance $s^2$ using Bessel’s correction?
$s^{2}= \frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}$ (where $n$ is sample size and $\bar{x}$ is sample mean).
By what factor is the sample variance estimator biased if $n$ is used as the divisor instead of $n-1$?
It is biased downward by a factor of $(n-1)/n$.
What is the decomposition formula for $\operatorname{Var}(X)$ given random variables $X$ and $Y$?
$\operatorname{Var}(X)=\operatorname{E}\big[\operatorname{Var}(X\mid Y)\big] + \operatorname{Var}\big(\operatorname{E}[X\mid Y]\big)$.
What is the relationship between the variance of the sum $\operatorname{Var}(X+Y)$ and individual variances for uncorrelated variables?
$\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)$.
What are the variances for the Bernoulli, Binomial, and Poisson distributions?
Bernoulli($p$): $p(1-p)$; Binomial($n$, $p$): $np(1-p)$; Poisson($\lambda$): $\lambda$.
What is the mean and variance of an exponential distribution with rate parameter $\lambda$?
Mean is $1/\lambda$; Variance is $1/\lambda^{2}$.

Key Concepts
Variance Concepts
Variance
Standard deviation
Population variance
Sample variance
Unbiased estimator (Bessel’s correction)
Law of total variance
Bienaymé formula (Additivity of variance)
Scaling property of variance
Units of variance
Infinite variance
Central moment
Covariance