Regularization & Shrinkage
Classical regression tries to fit the data as well as possible. Regularization adds a second goal: keep the model controlled. That extra constraint reduces coefficient instability, softens overfitting, and makes the model less fragile under noisy or collinear predictors.
Why unconstrained regression becomes unstable
Pure fit logic
Ordinary least squares asks only one question: which coefficients minimise the sum of squared residuals on the observed sample?
When predictors are noisy, correlated, or numerous, that freedom can produce large and unstable coefficients.
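A small numerical sketch makes the effect of correlation concrete. It assumes two standardised predictors with correlation `rho`, a noise standard deviation `sigma`, and the textbook result that OLS coefficients have covariance sigma^2 (X'X)^-1. The function name and default values are illustrative only.

```python
import numpy as np

def ols_coef_variance(rho, n=100, sigma=1.0):
    """Theoretical variance of one OLS slope when two unit-variance
    predictors have correlation rho (illustrative helper)."""
    # Population version of X'X / n for two standardised predictors.
    gram = np.array([[1.0, rho], [rho, 1.0]])
    cov_beta = sigma ** 2 * np.linalg.inv(n * gram)
    return cov_beta[0, 0]

baseline = ols_coef_variance(0.0)
for rho in (0.0, 0.5, 0.9, 0.99):
    var = ols_coef_variance(rho)
    print(f"rho={rho:4.2f}  sd(beta_hat)={np.sqrt(var):.3f}  "
          f"variance inflation={var / baseline:.1f}x")
```

The variance grows like 1 / (1 - rho^2), so at a correlation of 0.99 each coefficient is roughly 50 times more variable than with uncorrelated predictors, even though nothing about the noise has changed.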
Regularized logic
Regularization adds a penalty for complexity, measured by the size of the coefficients. The model is no longer rewarded only for fitting the sample, but also for keeping its coefficients small, which makes them more stable and better behaved.
That is why some bias is introduced on purpose: to reduce variance and improve generalization.
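In symbols, the two logics differ by a single term. The notation below is an assumption of this sketch, not something fixed by the text: $y$ is the response, $X$ the predictor matrix with $n$ rows, $\beta$ the coefficient vector, $\lambda \ge 0$ the penalty strength, and $\alpha \in [0, 1]$ a mixing weight. Other scalings of the loss and penalty are equally common.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \, P(\beta),
\qquad \lambda \ge 0

P_{\text{ridge}}(\beta) = \tfrac{1}{2} \lVert \beta \rVert_2^2, \qquad
P_{\text{lasso}}(\beta) = \lVert \beta \rVert_1, \qquad
P_{\text{elastic net}}(\beta) = \alpha \lVert \beta \rVert_1 + \tfrac{1 - \alpha}{2} \lVert \beta \rVert_2^2
```

Setting $\lambda = 0$ recovers ordinary least squares. As $\lambda$ grows, the penalty term dominates and the coefficients are pulled towards zero.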
A useful order for learning shrinkage
Start with coefficient instability
Before penalties make sense, first understand why unconstrained coefficients become noisy when predictors are collinear, numerous, or weakly informative relative to the sample size.
Then compare the penalty shapes
Ridge shrinks continuously, lasso can force exact zeros, and elastic net blends both behaviours.
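The difference between the penalty shapes is easiest to see in the idealised case of an orthonormal design (X'X / n = I), where each method acts on the OLS coefficients one at a time. The sketch below assumes the objective (1/(2n))||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2); the function names and the example coefficients are made up for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Move z towards zero by t, and clip to exactly zero once |z| <= t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def shrink(b_ols, lam, alpha):
    """Elastic net solution under an orthonormal design.
    alpha = 0 is ridge (pure rescaling), alpha = 1 is lasso (pure thresholding)."""
    return soft_threshold(b_ols, lam * alpha) / (1.0 + lam * (1.0 - alpha))

b_ols = np.array([3.0, -1.0, 0.4, 0.2])   # hypothetical OLS coefficients
lam = 0.5

b_ridge = shrink(b_ols, lam, alpha=0.0)
b_lasso = shrink(b_ols, lam, alpha=1.0)
b_enet = shrink(b_ols, lam, alpha=0.5)

print("ridge      ", b_ridge)
print("lasso      ", b_lasso)
print("elastic net", b_enet)
```

Ridge divides every coefficient by the same factor, so none of them reaches zero. Lasso subtracts a fixed amount from each magnitude, so the two smallest coefficients here become exactly zero. Elastic net does some of each.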
Then watch the coefficient path
Watching how each coefficient moves as lambda increases builds more intuition than the formulas alone.
Then connect to prediction error
The real payoff is not prettier coefficients, but a better bias-variance balance and more stable out-of-sample behaviour.
Ridge, Lasso, and Elastic Net under one slider
As lambda increases, coefficients shrink. Ridge keeps all variables alive, lasso drops some to exactly zero, and elastic net behaves in between.
The path tells the story
Instead of one lambda, view the whole shrinkage path. This is often the clearest way to understand which variables are robust and which survive only when the model is allowed to be too flexible.
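A ridge path can be traced with a few lines of NumPy. This is a sketch on simulated data, assuming standardised predictors, a centred response, no intercept, and the objective (1/(2n))||y - Xb||^2 + (lam/2)||b||_2^2. The data-generating choices and the lambda grid are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.1 * rng.normal(size=n),   # x0 and x1 are near-duplicates
    z + 0.1 * rng.normal(size=n),
    rng.normal(size=n),             # x2 carries independent signal
    rng.normal(size=n),             # x3 is pure noise
])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)
y = y - y.mean()

def ridge(X, y, lam):
    # Closed-form ridge solution for the objective stated above.
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

lambdas = np.logspace(-3, 2, 12)
path = np.array([ridge(X, y, lam) for lam in lambdas])

for lam, coefs in zip(lambdas, path):
    print(f"lambda={lam:9.3f}  " + "  ".join(f"{c:7.3f}" for c in coefs))
```

Each row is one point on the path. Reading down the columns shows which coefficients hold their size over a wide range of lambda and which are large only when the penalty is close to zero. A lasso path needs an iterative solver instead of a closed form, but it is read the same way.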
Ridge / Lasso coefficient path
Why adding bias can lower total error
This is the central logic of shrinkage. Training error rises as the penalty gets stronger, but test error often falls at first and rises again only once the model becomes too constrained.
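The pattern can be reproduced in a small simulation. The sketch below assumes ridge regression without an intercept, a deliberately overparameterised setting (30 predictors, 40 training rows, only 5 true signals), and a large simulated test set standing in for out-of-sample data. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, p = 40, 2000, 30
beta_true = np.zeros(p)
beta_true[:5] = [1.5, -1.0, 0.8, 0.5, -0.5]

def simulate(n):
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(scale=2.0, size=n)
    return X, y

X_tr, y_tr = simulate(n_train)
X_te, y_te = simulate(n_test)

def ridge(X, y, lam):
    # Minimises (1/(2n))||y - Xb||^2 + (lam/2)||b||^2; lam = 0 is plain OLS.
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def mse(X, y, b):
    return float(np.mean((y - X @ b) ** 2))

lambdas = np.concatenate([[0.0], np.logspace(-3, 2, 15)])
fits = [ridge(X_tr, y_tr, lam) for lam in lambdas]
train_err = np.array([mse(X_tr, y_tr, b) for b in fits])
test_err = np.array([mse(X_te, y_te, b) for b in fits])

for lam, tr, te in zip(lambdas, train_err, test_err):
    print(f"lambda={lam:8.3f}  train MSE={tr:6.2f}  test MSE={te:6.2f}")
```

Training error can only go up as lambda grows, because the unpenalised fit is by definition the best fit to the training sample. Test error behaves differently: in this setup the lowest test error occurs at a positive lambda, not at the OLS fit.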
Shrinkage as a response to correlated predictors
When predictors overlap strongly, OLS struggles to allocate weight cleanly. Coefficients become unstable, signs can flip, and tiny sample changes can produce very different models.
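The instability, and what shrinkage does to it, can be checked by refitting on many independent samples. This sketch assumes two predictors with correlation 0.98, true coefficients of 1 for both, and a ridge penalty of 0.1 chosen by hand for illustration, not tuned.

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_sample(n=60, rho=0.98):
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
    return X, y

def fit(X, y, lam):
    # lam = 0 gives OLS; lam > 0 gives ridge for (1/(2n))||y - Xb||^2 + (lam/2)||b||^2.
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

ols_fits, ridge_fits = [], []
for _ in range(500):
    X, y = draw_sample()
    ols_fits.append(fit(X, y, lam=0.0))
    ridge_fits.append(fit(X, y, lam=0.1))
ols_fits = np.array(ols_fits)
ridge_fits = np.array(ridge_fits)

# A "sign flip" here means at least one estimate is negative although both truths are +1.
ols_flips = int((ols_fits < 0).any(axis=1).sum())
ridge_flips = int((ridge_fits < 0).any(axis=1).sum())

print("OLS   sd per coefficient:", ols_fits.std(axis=0).round(3))
print("Ridge sd per coefficient:", ridge_fits.std(axis=0).round(3))
print("Samples with a sign flip: OLS", ols_flips, "| ridge", ridge_flips)
```

Both methods see exactly the same samples. The ridge estimates are slightly biased towards zero, but they vary far less from sample to sample and flip sign far less often.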
Shrinkage methods compared
| Method | Penalty | Main effect | Best use case |
|---|---|---|---|
| OLS | None | Pure fit, no shrinkage | Clean low-noise, low-collinearity settings |
| Ridge | L2 | Shrinks all coefficients continuously | Many correlated predictors, stability focus |
| Lasso | L1 | Shrinks and can set coefficients to zero | Sparse solutions, variable selection |
| Elastic Net | L1 + L2 | Selection + grouped shrinkage | Correlated predictors with sparsity needs |
| Adaptive Lasso | Weighted L1 | More selective variable penalization | When selection quality matters strongly |
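The "grouped shrinkage" entry for elastic net is easiest to see with duplicated predictors. The sketch below uses a plain cyclic coordinate descent solver written for this example, not a library routine, and assumes the objective (1/(2n))||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2) with no intercept. The values of `lam` and `alpha` are arbitrary.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_sweeps=500):
    """Cyclic coordinate descent; alpha = 1 is the lasso, alpha = 0 is ridge."""
    n, p = X.shape
    b = np.zeros(p)
    resid = y.astype(float).copy()            # equals y - X @ b while b is zero
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes its own effect.
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid += X[:, j] * (b[j] - new)
            b[j] = new
    return b

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
X = np.column_stack([x, x, rng.normal(size=n)])   # columns 0 and 1 are exact duplicates
y = 2.0 * x + 1.0 * X[:, 2] + rng.normal(size=n)

b_lasso = elastic_net_cd(X, y, lam=0.2, alpha=1.0)
b_enet = elastic_net_cd(X, y, lam=0.2, alpha=0.5)

print("lasso      ", b_lasso.round(3))
print("elastic net", b_enet.round(3))
```

With exact duplicates the lasso solution is not unique, and this solver puts all the weight on whichever copy it updates first. The L2 part of the elastic net penalty makes the solution unique and splits the weight evenly between the two copies.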
Concepts every validator should keep
Bias is not always bad
In shrinkage methods, a little bias is introduced intentionally to reduce variance and improve generalization.
Variable selection can be unstable
Lasso selection is useful, but in weak-signal settings the selected set can change noticeably across samples.
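One way to see this is to refit the lasso on bootstrap resamples and count how often each variable is selected. The sketch below is a simplified illustration: it uses a hand-written coordinate descent solver, a fixed `lam` of 0.1 chosen by hand, and simulated data with one strong signal, two weak signals, and seven noise variables.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=50):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    resid = y.astype(float).copy()            # equals y - X @ b while b is zero
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (b[j] - new)
            b[j] = new
    return b

rng = np.random.default_rng(4)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, 0.15, 0.15]             # one strong signal, two weak ones
y = X @ beta_true + rng.normal(size=n)

n_boot, lam = 200, 0.1
selected = np.zeros(p)
for _ in range(n_boot):
    rows = rng.integers(0, n, size=n)         # bootstrap resample of the rows
    Xb = X[rows] - X[rows].mean(axis=0)
    yb = y[rows] - y[rows].mean()
    selected += lasso_cd(Xb, yb, lam) != 0
freq = selected / n_boot

for j, f in enumerate(freq):
    print(f"x{j}: true beta={beta_true[j]:5.2f}  selected in {f:5.1%} of resamples")
```

The strong signal is selected every time. The weak signals and the noise variables move in and out of the model from one resample to the next, which is the instability the paragraph above describes.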
Correlated predictors confuse OLS allocation
When variables overlap heavily, OLS can distribute weight erratically even if the overall fit still looks acceptable.
The whole path matters more than one lambda
A variable that survives across a broad penalty range is often more robust than a variable that appears only at near-zero penalty.
Interpretability can change with shrinkage
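Regularized models may be more stable, but their coefficients can be harder to interpret: heavy shrinkage biases them towards zero, and among correlated predictors the weight may be shared across a group instead of being attributed to one variable.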
Penalty choice must be validated, not assumed
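Lambda (the penalty strength) and alpha (the elastic net mixing weight between L1 and L2) should be justified through validation, such as cross-validation or a held-out sample, not selected because a single run happened to look good.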
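A minimal k-fold cross-validation loop for choosing lambda might look like the sketch below. It assumes ridge regression without an intercept, five folds, simulated data, and an arbitrary lambda grid. Tuning alpha for elastic net would add a second grid dimension in the same way.

```python
import numpy as np

def ridge(X, y, lam):
    # Minimises (1/(2n))||y - Xb||^2 + (lam/2)||b||^2.
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def cv_mse(X, y, lam, folds):
    """Average held-out MSE over the given folds for one value of lam."""
    errs = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        b = ridge(X[train], y[train], lam)
        errs.append(np.mean((y[test] - X[test] @ b) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(5)
n, p, k = 120, 20, 5
X = rng.normal(size=(n, p))
beta_true = np.concatenate([[2.0, -1.5, 1.0], np.zeros(p - 3)])
y = X @ beta_true + rng.normal(scale=2.0, size=n)

# Use the same fold split for every lambda so the comparison is fair.
folds = np.array_split(rng.permutation(n), k)
lambdas = np.logspace(-3, 1, 20)
cv_curve = np.array([cv_mse(X, y, lam, folds) for lam in lambdas])
best_lambda = lambdas[np.argmin(cv_curve)]

print(f"best lambda by {k}-fold CV: {best_lambda:.4f}")
```

The point is the procedure, not the number: every candidate lambda is scored on data it was not fitted to, and the whole `cv_curve` is available for inspection, not just its minimum.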
What to leave this page with
Regularization is not a cosmetic add-on to regression. It is a controlled way to trade a little bias for a meaningful gain in stability.
The useful order is: first understand why unconstrained coefficients become unstable, then compare ridge, lasso, and elastic net, then inspect the coefficient path, then evaluate how shrinkage changes test error and collinearity behaviour.
Once that structure is clear, shrinkage stops looking like an abstract penalty term and starts looking like a practical tool for building more robust models.