RemNote Community

Variance - Analysis Techniques and Inference

Understand how to compute variance for linear combinations, update variance with new observations, and apply various tests and estimators for comparing and analyzing variances.


Summary

Propagation and Linear Combinations

Understanding Variance of Linear Combinations

When you combine random variables using fixed coefficients, their variances don't simply add. This is one of the most important principles in statistics because we constantly work with combinations of measurements: averages, sums, differences, and weighted totals all follow these rules.

The Complete Formula

For random variables $X_i$ with coefficients $a_i$, the variance of their linear combination is:

$$\operatorname{Var}\Big(\sum_{i}a_{i}X_{i}\Big)=\sum_{i}a_{i}^{2}\operatorname{Var}(X_{i}) + 2\sum_{i<j}a_{i}a_{j}\operatorname{Cov}(X_{i},X_{j})$$

What does this mean? The total variance has two parts:

The variance part ($\sum_{i}a_{i}^{2}\operatorname{Var}(X_{i})$): each variable's variance is scaled by the square of its coefficient. This is why larger coefficients have an outsized impact on the final variance.

The covariance part ($2\sum_{i<j}a_{i}a_{j}\operatorname{Cov}(X_{i},X_{j})$): the covariances between pairs of variables matter. If variables are positively correlated, they inflate the total variance; if negatively correlated, they reduce it.

Key insight: squaring the coefficients means that a coefficient of $-2$ contributes the same as $+2$; what matters is the magnitude, not the sign.

Special Case: Uncorrelated Variables

When the variables are uncorrelated, all the covariance terms equal zero, so the formula simplifies beautifully:

$$\operatorname{Var}\Big(\sum a_{i}X_{i}\Big)=\sum a_{i}^{2}\operatorname{Var}(X_{i})$$

This is a game-changer in practice. Many statistical models assume independence or use orthogonal designs precisely because they make variance calculations tractable.
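The two-part formula above can be checked numerically. A minimal NumPy sketch, using made-up variances and a made-up covariance purely for illustration:

```python
import numpy as np

# Hypothetical example: two correlated variables.
a = np.array([2.0, -1.0])     # coefficients a_i
var = np.array([4.0, 9.0])    # Var(X_1), Var(X_2)
cov_12 = 1.5                  # Cov(X_1, X_2)

# Variance part: sum over i of a_i^2 * Var(X_i)
variance_part = np.sum(a**2 * var)

# Covariance part: 2 * sum over i<j of a_i * a_j * Cov(X_i, X_j)
covariance_part = 2 * a[0] * a[1] * cov_12

total = variance_part + covariance_part
print(total)  # 2^2*4 + (-1)^2*9 + 2*2*(-1)*1.5 = 16 + 9 - 6 = 19.0
```

Note that the negative coefficient still contributes $+9$ through the variance part (the square removes the sign), but it flips the sign of the covariance term, reducing the total.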
Example: if you're averaging three independent measurements with variances 4, 4, and 4, each getting weight $\frac{1}{3}$:

$$\operatorname{Var}\left(\tfrac{1}{3}X_1 + \tfrac{1}{3}X_2 + \tfrac{1}{3}X_3\right) = \left(\tfrac{1}{3}\right)^2 \cdot 4 + \left(\tfrac{1}{3}\right)^2 \cdot 4 + \left(\tfrac{1}{3}\right)^2 \cdot 4 = \frac{4}{9} + \frac{4}{9} + \frac{4}{9} = \frac{4}{3}$$

Notice how averaging reduces variance: $\frac{4}{3}$ is much smaller than 4.

Matrix Notation for Linear Combinations

When working with many variables and coefficients, writing out individual terms becomes cumbersome. Matrix notation provides a compact, powerful way to express the same ideas. Let:

$\mathbf{X}$ = column vector of random variables $[X_1, X_2, \ldots, X_p]^{\mathsf{T}}$
$\mathbf{a}$ = column vector of coefficients $[a_1, a_2, \ldots, a_p]^{\mathsf{T}}$
$\Sigma$ = the $p \times p$ covariance matrix of $\mathbf{X}$

Then the variance of the linear combination $\mathbf{a}^{\mathsf{T}}\mathbf{X}$ (which equals $a_1X_1 + a_2X_2 + \cdots + a_pX_p$) is:

$$\operatorname{Var}(\mathbf{a}^{\mathsf{T}}\mathbf{X}) = \mathbf{a}^{\mathsf{T}}\Sigma\mathbf{a}$$

Why is this useful? This single equation encodes the entire variance formula from before, including all the covariance terms. The matrix multiplication automatically squares the coefficients and pulls the covariances out of $\Sigma$.

What is $\Sigma$? The covariance matrix has variances on the diagonal and covariances off the diagonal:

$$\Sigma = \begin{bmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \cdots \\ \operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$$

Since covariance is symmetric, $\Sigma$ is always a symmetric matrix.

Decomposition, Updating, and Related Concepts

Recursive Update When Adding an Observation

In data analysis, you often start with a sample, compute its mean and variance, and then receive new observations.
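The averaging example above doubles as a check of the matrix identity $\operatorname{Var}(\mathbf{a}^{\mathsf{T}}\mathbf{X}) = \mathbf{a}^{\mathsf{T}}\Sigma\mathbf{a}$: three independent measurements give a diagonal covariance matrix, and a few lines of NumPy reproduce the $\frac{4}{3}$ result.

```python
import numpy as np

# Three independent measurements, each with variance 4:
# the covariance matrix is diagonal (all covariances are zero).
Sigma = np.diag([4.0, 4.0, 4.0])
a = np.full(3, 1.0 / 3.0)   # equal weights for the average

var_avg = a @ Sigma @ a     # a^T Sigma a
print(var_avg)              # 3 * (1/3)^2 * 4 = 4/3
```

The same one-liner `a @ Sigma @ a` works unchanged for correlated variables; only the off-diagonal entries of `Sigma` change.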
Rather than recomputing from scratch, you can update efficiently using the previous statistics.

The Update Formula

When a new observation $x_{\text{new}}$ is added to a dataset with $n-1$ existing observations, current mean $\bar{x}$, and current variance $s^{2}$, the updated variance becomes:

$$s_{\text{new}}^{2}= \frac{(n-1)s^{2}+ (x_{\text{new}}-\bar{x})^{2}}{n}$$

Understanding the formula: the numerator has two parts.

$(n-1)s^2$ preserves all the variance information from the existing data.
$(x_{\text{new}}-\bar{x})^{2}$ measures how far the new point is from the old mean; this deviation contributes to the new variance.
We divide by $n$ because there are now $n$ total observations.

Why this matters: this formula is computationally efficient (only one pass through the data is needed) and numerically stable, which is why incremental updates of this kind are preferred in practice.

Example: you have 10 measurements with mean 100 and variance 25, and you receive one more measurement with value 110.

$$s_{\text{new}}^{2} = \frac{10 \cdot 25 + (110-100)^{2}}{11} = \frac{250 + 100}{11} = \frac{350}{11} \approx 31.8$$

The new variance increased because the new point was far from the old mean.

Tests of Equality of Variances

Sometimes you need to test whether two or more groups have equal variances. This is important because many statistical tests (like standard t-tests and ANOVA) assume equal variances across groups.

Parametric Tests (for Normal Data)

F-test: compares the variances of two normally distributed samples. It computes the ratio of the larger sample variance to the smaller one. Under the null hypothesis of equal variances, this ratio follows an F-distribution; if the computed ratio is too extreme, you reject the null hypothesis.

Chi-square test: tests whether a single sample's variance equals a hypothesized value. It uses the sample variance and sample size to construct a test statistic that follows a chi-square distribution under normality.
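The update rule and the worked example translate directly into code. This sketch implements the formula exactly as stated above (deviations are taken from the previous mean); the function name is illustrative, not from any library:

```python
def update_variance(n, s2, mean, x_new):
    """Updated variance after adding x_new, per the formula above.

    n      -- total number of observations *after* adding x_new
    s2     -- variance of the previous n-1 observations
    mean   -- mean of the previous n-1 observations
    x_new  -- the incoming observation
    """
    return ((n - 1) * s2 + (x_new - mean) ** 2) / n

# Worked example from the text: 10 measurements with mean 100 and
# variance 25, then one more measurement of 110 arrives (n becomes 11).
s2_new = update_variance(11, 25.0, 100.0, 110.0)
print(round(s2_new, 1))  # 350/11, which rounds to 31.8
```

Only the running statistics (`n`, `s2`, `mean`) need to be kept between updates, which is what makes this a single-pass method.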
<extrainfo>
Robust and Alternative Tests

These tests come up when you doubt the normality assumption:

Bartlett's test: a parametric test that is sensitive to departures from normality, so it is less reliable when normality is questionable.
Levene's test: more robust to non-normality; in its classic form it uses absolute deviations from the group means.
Brown–Forsythe test: a modification of Levene's test that uses absolute deviations from the group medians, making it even more robust to outliers and skewed data.
</extrainfo>
<extrainfo>
Resampling Approaches

Bootstrap and jackknife methods: these can test variance equality without distributional assumptions by repeatedly resampling the data to build empirical distributions of the variance estimates.
</extrainfo>
<extrainfo>
Historical Context on Variance Computation

The computation and estimation of sample variance has been refined over decades. Early computational methods could produce inaccurate results due to numerical errors, but modern algorithms use stable approaches. Understanding how variance estimators work, including unbiased versions for the standard deviation, provides important background for why certain formulas are used in practice rather than others.
</extrainfo>
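The resampling idea can be made concrete with a permutation test on the variance ratio: shuffle the pooled data, split it back into two groups, and see how extreme the observed ratio is against that empirical null. A minimal NumPy-only sketch, with made-up toy data and the conventional 0.05 threshold:

```python
import numpy as np

def perm_test_var_ratio(x, y, n_perm=2000, seed=0):
    """Permutation test for equality of variances of two samples.

    Repeatedly shuffles the pooled data and recomputes the variance
    ratio to build an empirical null distribution; no normality
    assumption is needed.
    """
    rng = np.random.default_rng(seed)
    observed = np.var(x, ddof=1) / np.var(y, ddof=1)
    obs_extreme = max(observed, 1.0 / observed)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        rx, ry = pooled[:len(x)], pooled[len(x):]
        ratio = np.var(rx, ddof=1) / np.var(ry, ddof=1)
        # Two-sided: count ratios at least as extreme as observed.
        if max(ratio, 1.0 / ratio) >= obs_extreme:
            count += 1
    return count / n_perm

# Toy data: same center, clearly different spreads.
rng = np.random.default_rng(42)
x = rng.normal(0, 1, 60)
y = rng.normal(0, 3, 60)
p = perm_test_var_ratio(x, y)
print(p < 0.05)  # a spread difference this large should be detected
```

A bootstrap variant would resample each group with replacement instead of permuting the pooled labels; the permutation form is shown here because its null hypothesis (exchangeability under equal variances) is the simplest to state.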
Flashcards
What is the formula for the variance of a linear combination of random variables, $\operatorname{Var}(\sum_{i}a_{i}X_{i})$?
$\sum_{i}a_{i}^{2}\operatorname{Var}(X_{i}) + 2\sum_{i<j}a_{i}a_{j}\operatorname{Cov}(X_{i},X_{j})$
What is the matrix notation formula for the variance of a linear combination $\operatorname{Var}(\mathbf{a}^{\mathsf{T}}\mathbf{X})$?
$\mathbf{a}^{\mathsf{T}}\Sigma\mathbf{a}$ (where $\mathbf{a}$ is a vector of coefficients and $\Sigma$ is the covariance matrix)
How does the formula for the variance of a weighted sum simplify if the random variables are uncorrelated?
$\operatorname{Var}(\sum a_{i}X_{i})=\sum a_{i}^{2}\operatorname{Var}(X_{i})$
What is the formula for the updated variance $s_{\text{new}}^{2}$ when adding a new observation $x_{\text{new}}$ to a dataset of $n-1$ points ($n$ points in total afterwards)?
$s_{\text{new}}^{2}= \frac{(n-1)s^{2}+ (x_{\text{new}}-\bar{x})^{2}}{n}$
Which test is used to compare the variances of two normally distributed samples?
The F-test
Which test assesses a sample's variance against a hypothesized value for a normal population?
The chi-square test
What are three common alternatives to parametric variance tests when normality is questionable?
Bartlett’s test, Levene’s test, and the Brown–Forsythe test (of these, Levene’s and Brown–Forsythe are the robust choices; Bartlett’s remains sensitive to non-normality)

Quiz

If a new observation \(x_{\text{new}}\) is added to a data set of \(n-1\) points with current mean \(\bar{x}\) and variance \(s^{2}\), which formula gives the updated variance \(s_{\text{new}}^{2}\) for all \(n\) points?
Key Concepts
Variance and Covariance
Variance of a Linear Combination
Covariance Matrix
Weighted Sum of Uncorrelated Variables
Recursive Variance Update
Variance Testing Methods
F‑test
Bartlett’s test
Levene’s test
Brown–Forsythe test
Resampling Techniques
Bootstrap (statistics)
Jackknife (statistics)