notes · 10

Variance, Covariance & Variability

This page connects three layers that often get learned separately: spread in one variable, co-movement across two variables, and the error metrics used to judge model performance. In practice, they belong to the same language of uncertainty.

Start with variance and standard deviation, then move to standard error, then relative spread, then covariance and correlation, and finally model error metrics. This is the path from descriptive spread to validation logic.

How these measures connect

The cleanest way to understand this page is to split the world into three questions: how much one variable varies, how two variables move together, and how wrong a model is when predictions meet reality.

Single variable: spread

Variance is the average squared deviation from the mean. It is the mathematical core.

Standard deviation is the square root of variance, so it is expressed in the original unit of the data.

Standard error does not measure the spread of the data; it measures the uncertainty of the sample mean.

Coefficient of variation scales spread relative to the mean.

Variance → Std Dev → SE → confidence intervals. These are connected, not isolated formulas.

Two variables and model error

Covariance asks whether two variables move together. Correlation standardises that relationship.

Later, when predictions are compared with actual outcomes, the same logic turns into bias, MAE, MSE, RMSE, R², and MAPE.

Validation lens: asset correlation drives portfolio loss variance, while model error metrics drive judgement on calibration, stability, and predictive usefulness.

Variance and standard deviation

Variance is built from squared deviations, so each observation's contribution grows with the square of its distance from the mean. That is why a few large observations matter so much.
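To make the role of squared deviations concrete, here is a minimal Python sketch. The data values and the helper name `variance_breakdown` are illustrative assumptions, not part of the original page, and the population form (dividing by N) is used.

```python
import math

def variance_breakdown(data):
    """Return mean, squared deviations, population variance and std dev."""
    n = len(data)
    mean = sum(data) / n
    squared_devs = [(x - mean) ** 2 for x in data]
    variance = sum(squared_devs) / n  # population form: divide by N
    return mean, squared_devs, variance, math.sqrt(variance)

# Illustrative data (assumed, not from the page)
base = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sq, var, sd = variance_breakdown(base)
print(mean, var, sd)  # 5.0 4.0 2.0

# Replace the largest value with an outlier and measure its share
# of the total squared deviation.
shifted = [2, 4, 4, 4, 5, 5, 7, 25]
mean_o, sq_o, var_o, sd_o = variance_breakdown(shifted)
outlier_share = sq_o[-1] / sum(sq_o)
print(var_o, round(outlier_share, 3))  # 48.0 0.844
```

In this example a single changed observation raises the variance from 4 to 48 and accounts for roughly 84% of the total squared deviation.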

Standard error: uncertainty of the mean

Standard deviation tells you how dispersed the data is. Standard error tells you how uncertain the sample mean is. That distinction matters constantly in validation and inference.
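The distinction can be sketched in a few lines of Python. The function names and sample values below are assumptions for illustration; the interval uses the normal approximation (1.96 × SE), which is only a rough guide for small samples.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sample std dev / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def se_from_sd(sd, n):
    """Standard error when the standard deviation is taken as known."""
    return sd / math.sqrt(n)

# Same spread, different sample sizes: the data are no less dispersed,
# but the mean is pinned down more tightly.
print(se_from_sd(15, 25))   # 3.0
print(se_from_sd(15, 100))  # 1.5

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values (assumed)
m = statistics.mean(sample)
se = standard_error(sample)
# Rough 95% interval via the normal approximation; for a sample this
# small a t-quantile would give a wider interval.
ci = (m - 1.96 * se, m + 1.96 * se)
print(round(se, 4), ci)
```

Quadrupling the sample size halves the standard error while leaving the standard deviation unchanged.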

Coefficient of variation: relative spread

Absolute spread is not enough when series live on different scales. CV tells you how large variability is relative to the mean.

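Dataset A: mean 100, standard deviation 15, CV = 15 / 100 = 15%.

Dataset B: mean 1000, standard deviation 100, CV = 100 / 1000 = 10%.

Verdict: B has the larger absolute spread, but A is more variable relative to its mean.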
Validation context: CV is useful when comparing volatility or model error across pools, segments, or portfolios with very different mean levels.
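The comparison of Dataset A (mean 100, standard deviation 15) and Dataset B (mean 1000, standard deviation 100) can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is a hypothetical helper chosen for illustration.

```python
def coefficient_of_variation(mean, sd):
    """CV in percent: standard deviation relative to the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * sd / mean

cv_a = coefficient_of_variation(mean=100, sd=15)
cv_b = coefficient_of_variation(mean=1000, sd=100)
print(cv_a, cv_b)  # 15.0 10.0
```

The guard on a zero mean is a design choice worth keeping: CV also becomes unstable when the mean is close to zero, so it is best suited to strictly positive quantities.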

Covariance and correlation

Variance is about one variable. Covariance starts when you care about two. Correlation then standardises covariance so the relationship becomes unit-free and comparable.
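A small Python sketch shows why correlation is unit-free while covariance is not. The data and function names are illustrative assumptions, and the population form (dividing by N) is used throughout.

```python
import math

def covariance(x, y):
    """Population covariance: mean product of paired deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    # covariance(v, v) is the variance of v
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
x_scaled = [100 * v for v in x]  # same variable in a different unit

print(covariance(x, y), correlation(x, y))                # 1.6 0.8
print(covariance(x_scaled, y), correlation(x_scaled, y))  # 160.0 0.8
```

Changing the unit of x multiplies the covariance by 100 but leaves the correlation at 0.8, which is what makes correlations comparable across pairs of variables.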

Model error metrics

When predictions meet actuals, spread becomes error. These are the metrics used to judge whether the model is systematically wrong, noisy, or simply weak.
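As a rough illustration, the metrics can be computed side by side in plain Python. The function name `error_metrics` and the actual/predicted values are assumptions made up for this sketch; errors are defined as prediction minus actual.

```python
import math

def error_metrics(actual, pred):
    """Bias, MAE, MSE, RMSE, R2 and MAPE for paired actuals and predictions."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, pred)]  # pred - actual
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot,
        # MAPE is undefined if any actual value is zero
        "mape": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

actual = [100, 200, 300, 400]  # illustrative values (assumed)
pred = [110, 190, 320, 380]
m = error_metrics(actual, pred)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this example the bias is zero while the MAE is 15: the model is not systematically wrong, but it is noisy. RMSE (about 15.8) exceeds MAE because squaring weights the two larger misses more heavily.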

Validator’s cheat sheet

Metric | Formula | Unit | Sensitive to outliers? | Validation use
Variance | Σ(x−μ)² / N | unit² | Yes | Foundational spread, theoretical core
Std Dev | √Variance | unit | Yes | Most interpretable spread metric
SE | σ / √n | unit | Indirectly | Confidence intervals, backtests, inference
CV | σ / μ × 100 | % | Yes | Relative spread across different scales
Covariance | Σ(x−μx)(y−μy) / N | unitx·unity | Yes | Co-movement and dependence structure
Correlation | Cov / (σx·σy) | unit-free | Yes | Standardised dependence comparison
Bias | Σ(pred−actual) / N | unit | Moderate | Calibration check
MAE | Σ|pred−actual| / N | unit | Moderate | Robust average miss size
MSE / RMSE | Σ(pred−actual)² / N | unit² / unit | Very | Large error penalty
R² | 1 − SSres / SStot | unit-free | Yes | Explained variance
MAPE | Σ|err/actual| / N × 100 | % | Moderate | Scale-free comparison

Concepts every validator should keep

bias–variance

The core tradeoff

MSE is not just error; it decomposes into squared bias (systematic error) and variance (instability). That distinction is central in validation thinking.
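For a fixed set of errors the split is an exact identity: mean squared error equals the squared mean error plus the population variance of the errors. The sketch below checks this on made-up numbers; `decompose_mse` is a hypothetical helper, and the fuller textbook decomposition for an estimator, which also carries an irreducible noise term, is not modelled here.

```python
def decompose_mse(errors):
    """Split MSE into squared bias and variance of the errors."""
    n = len(errors)
    bias = sum(errors) / n
    variance = sum((e - bias) ** 2 for e in errors) / n  # population form
    mse = sum(e ** 2 for e in errors) / n
    return bias ** 2, variance, mse

# Illustrative errors, pred - actual (assumed values)
bias_sq, err_var, mse = decompose_mse([5, 15, 10, 30, -10])
print(bias_sq, err_var, mse)  # 100.0 170.0 270.0
```

Here 100 of the 270 comes from systematic error and 170 from instability, so two models with the same MSE can fail for quite different reasons.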

degrees of freedom

Why parameters cost information

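Every parameter estimated from the data uses up one degree of freedom. That is why N−1 appears in the sample variance, where the mean has already been estimated from the same data, and why small samples inflate uncertainty.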
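The difference between dividing by N and by N−1 can be seen directly with Python's standard `statistics` module. The data values are illustrative assumptions.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # assumed values
n = len(data)

pop_var = statistics.pvariance(data)    # divides by N
sample_var = statistics.variance(data)  # divides by N - 1
ratio = sample_var / pop_var            # equals N / (N - 1)
print(pop_var, round(sample_var, 4), round(ratio, 4))

# The correction factor N / (N - 1) fades as the sample grows
factors = {size: size / (size - 1) for size in (5, 30, 1000)}
print(factors)
```

With 8 observations the correction is about 14%; with 1000 it is about 0.1%, which is why the choice matters mainly for small samples.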

correlation ≠ causation

Association is not mechanism

Covariance and correlation describe co-movement. They do not tell you why two variables move together.

heteroscedasticity

Variance can depend on level

Some datasets become noisier as values grow. This matters because methods that assume constant variance, such as ordinary least squares standard errors, then fail quietly.
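A simulated example makes the pattern visible. The data-generating process below (noise with a standard deviation equal to 10% of the level) is purely an assumption for illustration, and comparing the two halves is a crude check rather than a formal test.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

levels = list(range(1, 201))
# Assumed process: noise std dev is 10% of the level
noise = [random.gauss(0, 0.1 * x) for x in levels]

low_sd = statistics.pstdev(noise[:100])   # levels 1..100
high_sd = statistics.pstdev(noise[100:])  # levels 101..200
print(round(low_sd, 2), round(high_sd, 2))
```

The spread in the upper half comes out clearly larger than in the lower half, so a single pooled variance would understate uncertainty at high levels and overstate it at low ones.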

multicollinearity

Correlation across predictors

When predictors co-move too strongly, coefficient estimates become unstable even if model fit looks acceptable.
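One common way to quantify this is the variance inflation factor (VIF). For a linear model with exactly two predictors it reduces to 1 / (1 − r²), where r is the correlation between them. The sketch below is limited to that two-predictor case, and `vif_two_predictors` is a hypothetical helper name.

```python
def vif_two_predictors(r):
    """VIF for either predictor in a two-predictor linear model."""
    if abs(r) >= 1:
        raise ValueError("perfectly collinear predictors: VIF is unbounded")
    return 1.0 / (1.0 - r ** 2)

for r in (0.0, 0.5, 0.9, 0.99):
    print(r, round(vif_two_predictors(r), 2))
# 0.0 1.0
# 0.5 1.33
# 0.9 5.26
# 0.99 50.25
```

The inflation is mild at moderate correlation and grows sharply as r approaches 1, which is why overall fit can look acceptable while individual coefficients are poorly determined.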

law of large numbers

Why the mean stabilises

As sample size grows, standard error shrinks and the sample mean becomes a more precise estimate of the population mean.
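A quick simulation illustrates the effect. The population parameters (mean 100, standard deviation 15), the normal distribution, and the sample sizes are all assumptions chosen for this sketch.

```python
import math
import random

random.seed(42)  # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 100.0, 15.0  # assumed population parameters

results = []
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    sample_mean = sum(sample) / n
    theoretical_se = POP_SD / math.sqrt(n)
    results.append((n, sample_mean, theoretical_se))

for n, sample_mean, se in results:
    print(f"n={n:>5}  mean={sample_mean:8.3f}  SE={se:.3f}")
```

Each hundredfold increase in sample size cuts the standard error by a factor of ten, and the sample means settle around 100 accordingly.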

What to leave this page with

Variance is the language of spread, covariance is the language of joint movement, and model error metrics are what those ideas become once predictions meet outcomes.

The useful order is: first understand spread in one variable, then uncertainty of the mean, then relative spread, then co-movement, then model error.

Once these are connected, validation metrics stop looking like a bag of formulas and start behaving like one system of uncertainty measurement.