Variance, Covariance & Variability
This page connects three layers that often get learned separately: spread in one variable, co-movement across two variables, and the error metrics used to judge model performance. In practice, they belong to the same language of uncertainty.
How these measures connect
The cleanest way to understand this page is to split the world into three questions: how much one variable varies, how two variables move together, and how wrong a model is when predictions meet reality.
Single variable: spread
Variance is the average squared deviation from the mean. It is the mathematical core.
Standard deviation is the square root of variance, which brings the measure back to the original units of the data.
Standard error measures the uncertainty of the sample mean, not the spread of the data.
Coefficient of variation scales spread relative to the mean.
Two variables and model error
Covariance asks whether two variables move together. Correlation standardises that relationship.
Later, when predictions are compared with actual outcomes, the same logic turns into bias, MAE, MSE, RMSE, R², and MAPE.
Variance and standard deviation
Variance is built from squared deviations, which is why a few large observations matter so much: an observation twice as far from the mean contributes four times as much to the total.
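A minimal sketch of that build-up, using a small hypothetical dataset (the numbers are illustrative, not taken from the page). It computes the population variance step by step and then swaps in one large observation to show how strongly it moves the result.

```python
from statistics import pvariance

# Hypothetical dataset chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = sum(data) / len(data)                    # mean
squared_devs = [(x - mu) ** 2 for x in data]  # one squared deviation per point
variance = sum(squared_devs) / len(data)      # population variance (divide by N)
std_dev = variance ** 0.5                     # back to the original unit

# Share of the total squared deviation owed to the single largest point.
largest_share = max(squared_devs) / sum(squared_devs)

# Replace the last value with an outlier and recompute.
with_outlier = data[:-1] + [30]
variance_with_outlier = pvariance(with_outlier)

print(variance, std_dev, largest_share)
print(variance_with_outlier)
```

In this example the value 9 alone accounts for half of the summed squared deviations, which is the sense in which variance is sensitive to large observations.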
Standard error: uncertainty of the mean
Standard deviation tells you how dispersed the data is. Standard error tells you how uncertain the sample mean is. That distinction matters constantly in validation and inference.
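The difference can be sketched in a few lines. The sample below is hypothetical, and the code assumes the usual estimate of the standard error, the sample standard deviation divided by √n.

```python
import math
from statistics import stdev

# Hypothetical sample; the values are illustrative only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

s = stdev(sample)        # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)    # standard error of the sample mean

print(f"standard deviation = {s:.3f}")
print(f"standard error     = {se:.3f}")
```

The standard deviation describes how far individual observations sit from the mean; the standard error is smaller by a factor of √n because it describes the mean itself.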
Coefficient of variation: relative spread
Absolute spread is not enough when series live on different scales. CV tells you how large variability is relative to the mean.
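As an illustration, the sketch below compares two hypothetical series (`dataset_a` and `dataset_b` are made-up values) that have the same standard deviation but very different means.

```python
from statistics import pstdev

# Two hypothetical series with identical absolute spread.
dataset_a = [98, 100, 102]
dataset_b = [8, 10, 12]

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    mean = sum(values) / len(values)
    return pstdev(values) / mean * 100

sd_a, sd_b = pstdev(dataset_a), pstdev(dataset_b)
cv_a = coefficient_of_variation(dataset_a)
cv_b = coefficient_of_variation(dataset_b)

print(f"A: sd = {sd_a:.2f}, CV = {cv_a:.1f}%")
print(f"B: sd = {sd_b:.2f}, CV = {cv_b:.1f}%")
```

Both series deviate from their mean by the same absolute amount, yet the second is ten times more variable relative to its level. Note that CV is only meaningful when the mean is positive and not close to zero.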
Covariance and correlation
Variance is about one variable. Covariance starts when you care about two. Correlation then standardises covariance so the relationship becomes unit-free and comparable.
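A short sketch of why the standardisation matters, using hypothetical paired observations. Rescaling one variable (for example, changing its unit) changes the covariance but leaves the correlation untouched.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean product of deviations (divide by N)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # variance is covariance with itself
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]  # same quantity in a unit 100 times smaller

print(covariance(x, y), covariance(x_rescaled, y))
print(correlation(x, y), correlation(x_rescaled, y))
```

The covariance grows by the rescaling factor, while the correlation stays the same, which is what makes correlation comparable across pairs of variables.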
Model error metrics
When predictions meet actuals, spread becomes error. These are the metrics used to judge whether the model is systematically wrong, noisy, or simply weak.
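The sketch below computes each of these metrics on a small hypothetical set of actuals and predictions. The error convention assumed here is prediction minus actual, so a positive bias means the model over-predicts on average.

```python
import math

# Hypothetical actuals and predictions, chosen for easy arithmetic.
actual = [100, 200, 300, 400]
pred = [110, 190, 330, 380]
n = len(actual)

errors = [p - a for p, a in zip(pred, actual)]  # prediction minus actual

bias = sum(errors) / n                          # signed average error
mae = sum(abs(e) for e in errors) / n           # average miss size
mse = sum(e ** 2 for e in errors) / n           # squared-error penalty
rmse = math.sqrt(mse)                           # MSE in the original unit
mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100

mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"bias={bias}, MAE={mae}, MSE={mse}, RMSE={rmse:.2f}")
print(f"MAPE={mape:.1f}%, R2={r2:.2f}")
```

Here the bias is small (2.5) because over- and under-predictions largely cancel, while MAE (17.5) and RMSE (about 19.4) show that individual misses are much larger. RMSE exceeds MAE because squaring gives the single 30-unit miss extra weight.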
Validator’s cheat sheet
| Metric | Formula | Unit | Sensitive to outliers? | Validation use |
|---|---|---|---|---|
| Variance | Σ(x−μ)² / N | unit² | Yes | Foundational spread, theoretical core |
| Std Dev | √Variance | unit | Yes | Most interpretable spread metric |
| SE | σ / √n | unit | Indirectly | Confidence intervals, backtests, inference |
| CV | σ / μ × 100 | % | Yes | Relative spread across different scales |
| Covariance | Σ(x−μx)(y−μy) / N | unitx·unity | Yes | Co-movement and dependence structure |
| Correlation | Cov / (σx·σy) | unit-free | Yes | Standardised dependence comparison |
| Bias | Σ(pred−actual) / N | unit | Moderate | Calibration check |
| MAE | Σ\|pred−actual\| / N | unit | Moderate | Robust average miss size |
| MSE / RMSE | Σ(pred−actual)² / N; RMSE = √MSE | unit² / unit | Very | Large error penalty |
| R² | 1 − SSres / SStot | unit-free | Yes | Explained variance |
| MAPE | Σ\|err/actual\| / N × 100 | % | Moderate | Scale-free comparison (undefined when actual = 0) |
Concepts every validator should keep
The core tradeoff
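MSE is not just error; it decomposes into squared bias (systematic error) and variance (instability). That distinction is central in validation thinking: a model can miss because it is consistently off, because it is erratic, or both.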
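One way to see the split is the empirical identity on a single set of prediction errors: the mean squared error equals the squared mean error plus the population variance of the errors. The sketch below checks it on hypothetical errors. It illustrates the arithmetic only, not the full bias-variance decomposition over repeated training samples.

```python
from statistics import pvariance

# Hypothetical prediction errors (prediction minus actual).
errors = [10, -10, 30, -20]
n = len(errors)

mse = sum(e ** 2 for e in errors) / n  # total squared error
bias = sum(errors) / n                 # systematic component
error_variance = pvariance(errors)     # spread of errors around the bias

print(f"MSE              = {mse}")
print(f"bias^2           = {bias ** 2}")
print(f"error variance   = {error_variance}")
print(f"bias^2 + variance = {bias ** 2 + error_variance}")
```

In this example almost all of the MSE comes from the variance term, so the model is noisy rather than systematically wrong.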
Why parameters cost information
Every estimated parameter uses up one degree of freedom. That is why the sample variance divides by N−1 rather than N (the mean has already been estimated from the same data), and why small samples inflate uncertainty.
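A small simulation makes the N−1 correction concrete. This is a sketch under stated assumptions: samples of size 5 drawn from a standard normal distribution (true variance 1), with a fixed seed so the run is repeatable.

```python
import random

random.seed(0)   # fixed seed so the simulation is repeatable
n = 5            # small sample size, where the effect is largest
trials = 20000

total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]  # true variance is 1
    m = sum(sample) / n                              # mean estimated from the sample
    ss = sum((x - m) ** 2 for x in sample)           # sum of squared deviations
    total_div_n += ss / n
    total_div_n_minus_1 += ss / (n - 1)

avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials

print(f"average variance estimate, divide by N:     {avg_div_n:.3f}")
print(f"average variance estimate, divide by N - 1: {avg_div_n_minus_1:.3f}")
```

Dividing by N underestimates the true variance by a factor of (N−1)/N on average, so with N = 5 the first average lands near 0.8 while the second lands near 1.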
Association is not mechanism
Covariance and correlation describe co-movement. They do not tell you why two variables move together.
Variance can depend on level
Some datasets become noisier as values grow, a pattern known as heteroscedasticity. This matters because methods that assume constant variance then fail quietly, without any obvious warning in the fit.
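A simulated illustration, assuming noise whose standard deviation is 10% of the level (an arbitrary choice for the sketch). Comparing the spread of the noise at low and high levels exposes the pattern.

```python
import random
from statistics import pstdev

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical series: noise standard deviation is 10% of the level.
levels = list(range(1, 201))
noise = [random.gauss(0, 0.1 * level) for level in levels]

sd_low = pstdev(noise[:100])   # noise spread at levels 1..100
sd_high = pstdev(noise[100:])  # noise spread at levels 101..200

print(f"noise sd at low levels:  {sd_low:.2f}")
print(f"noise sd at high levels: {sd_high:.2f}")
```

A single overall standard deviation would hide the fact that the upper half of the series is much noisier than the lower half. Splitting residuals by level like this is a quick first check before relying on a constant-variance assumption.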
Correlation across predictors
When predictors co-move too strongly (multicollinearity), coefficient estimates become unstable even if model fit looks acceptable.
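One common way to quantify this is the variance inflation factor (VIF). The sketch below assumes the simplest case of exactly two predictors, where the VIF reduces to 1 / (1 − r²) with r the correlation between them. The helper name `vif_two_predictors` is introduced here for illustration.

```python
def vif_two_predictors(r):
    """Variance inflation factor for a model with exactly two predictors.

    r is the correlation between the two predictors. The result is the
    factor by which the variance of each coefficient estimate is inflated
    relative to the uncorrelated case.
    """
    if abs(r) >= 1:
        raise ValueError("perfectly correlated predictors: VIF is unbounded")
    return 1 / (1 - r ** 2)

for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r = {r:4.2f}  ->  VIF = {vif_two_predictors(r):.2f}")
```

At r = 0.9 the coefficient variance is roughly five times larger than with uncorrelated predictors, and at r = 0.99 roughly fifty times larger, even though the overall fit of the model may look unchanged.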
Why the mean stabilises
As sample size grows, standard error shrinks and the sample mean becomes a more precise estimate of the population mean.
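The rate of that shrinkage follows directly from SE = σ / √n. A quick sketch, assuming a hypothetical population standard deviation of 10:

```python
import math

sigma = 10.0  # hypothetical population standard deviation

# Standard error of the mean for increasing sample sizes.
se_by_n = {n: sigma / math.sqrt(n) for n in (25, 100, 400)}

for n, se in se_by_n.items():
    print(f"n = {n:3d}  ->  SE = {se}")
```

Each fourfold increase in sample size only halves the standard error, so precision improves with the square root of n rather than with n itself.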
What to leave this page with
Variance is the language of spread, covariance is the language of joint movement, and model error metrics are what those ideas become once predictions meet outcomes.
The useful order is: first understand spread in one variable, then uncertainty of the mean, then relative spread, then co-movement, then model error.
Once these are connected, validation metrics stop looking like a bag of formulas and start behaving like one system of uncertainty measurement.