Model Selection & Multicollinearity
Adding predictors never worsens in-sample fit, and usually improves it at least slightly. That is exactly why model selection exists. This page is about choosing the right model complexity and diagnosing whether correlated predictors are making the model unstable.
Why model selection matters
The overfitting trap
R² always increases or stays flat when you add predictors. That makes it a dangerous selection metric.
A model can look excellent in-sample simply because it is absorbing noise. In that case, the extra complexity is not learning signal; it is learning accidents of this particular sample.
The multicollinearity trap
When predictors are strongly correlated with each other, coefficient estimates become unstable. Standard errors inflate, small changes in the data can flip signs, and interpretability suffers.
The model may still look decent in-sample, but its internal logic becomes fragile.
A useful order for model selection thinking
Start with fit, but do not stop there
R² is descriptive, but it is not sufficient for selection. It tells you how much variance is explained, not whether extra variables are justified.
Then apply a complexity penalty
Adjusted R², AIC, and BIC all ask whether the fit improvement is worth the extra parameters.
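As a minimal sketch of those three penalties, assuming a Gaussian OLS setting: `rss` is the residual sum of squares, `tss` the total sum of squares, `n` the number of observations, `p` the number of predictors excluding the intercept, and `k` the number of estimated coefficients. The function names are illustrative, not from any library. The AIC and BIC forms drop additive constants that are identical for every model fit to the same data, so only differences between models are meaningful.

```python
import math

def adjusted_r2(rss, tss, n, p):
    """Adjusted R^2 = 1 - (RSS / (n - p - 1)) / (TSS / (n - 1)); p excludes the intercept."""
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))

def aic_gaussian(rss, n, k):
    """AIC up to an additive constant for Gaussian errors: n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def bic_gaussian(rss, n, k):
    """BIC up to an additive constant for Gaussian errors: n * ln(RSS / n) + k * ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical numbers: model B adds 5 predictors and lowers RSS only from 80 to 78.
n, tss = 100, 200.0
model_a = {"rss": 80.0, "p": 3}
model_b = {"rss": 78.0, "p": 8}

for name, m in (("A", model_a), ("B", model_b)):
    k = m["p"] + 1  # slopes plus intercept
    print(
        name,
        round(1.0 - m["rss"] / tss, 4),                 # raw R^2 (higher for B)
        round(adjusted_r2(m["rss"], tss, n, m["p"]), 4),
        round(aic_gaussian(m["rss"], n, k), 2),
        round(bic_gaussian(m["rss"], n, k), 2),
    )
```

With these made-up inputs, raw R² prefers model B, while adjusted R² (higher is better) and AIC and BIC (lower is better) all prefer the smaller model A.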
Then inspect predictor structure
A good criterion value can still hide coefficient instability if the predictors inside the model are collinear.
Then connect back to governance
In regulated models, simpler, interpretable, stable structures often beat marginally better but fragile alternatives.
R² vs Adjusted R² — watch overfitting happen
The true data-generating process contains only a few real predictors. As you add noise variables, R² keeps climbing, but the penalised metrics start resisting.
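The experiment described above can be approximated with a small simulation. This is a sketch under assumed settings (60 observations, 3 real predictors, up to 20 pure-noise predictors, a fixed seed), not the page's own demo; the helper name `r2_and_adjusted` is illustrative.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept and return (R^2, adjusted R^2)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss
    adj = 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))
    return r2, adj

rng = np.random.default_rng(0)
n = 60
signal = rng.normal(size=(n, 3))                      # the real predictors
y = signal @ np.array([1.5, -1.0, 0.5]) + rng.normal(size=n)
noise = rng.normal(size=(n, 20))                      # unrelated to y by construction

results = []
for extra in range(0, 21, 5):
    X = np.column_stack([signal, noise[:, :extra]])   # nested: each model contains the last
    r2, adj = r2_and_adjusted(X, y)
    results.append((extra, r2, adj))
    print(f"noise vars={extra:2d}  R2={r2:.3f}  adjR2={adj:.3f}")
```

Because the models are nested, R² cannot decrease from one row to the next. Adjusted R² is never above R², and the gap between them typically widens as noise variables accumulate.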
Comparing candidate models
Here the question is not “what is the best possible fit?” It is “which candidate model gives the best trade-off between explanatory power and complexity?”
Model comparison table
Visual comparison
VIF — how much variance is being inflated?
VIF (variance inflation factor) asks how well a predictor can be explained by the other predictors. The more redundant it is, the more the variance of its own coefficient is inflated and the less stable that coefficient becomes.
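Concretely, the VIF of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below computes it directly with NumPy on simulated data; the function name `vif` and the data are assumptions for illustration.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, target, rcond=None)[0]
        resid = target - others @ beta
        r2_j = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)              # unrelated to x1 and x2

vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```

Here `x1` and `x2` should show very large VIFs while `x3` stays close to 1. Common rules of thumb flag values above roughly 5 or 10, but those cut-offs are conventions rather than tests.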
Model selection criteria compared
| Criterion | Penalty | Favours | Typical use |
|---|---|---|---|
| R² | none | More complex models | Descriptive only, not enough for selection |
| Adjusted R² | mild complexity penalty | Parsimonious fit | Quick nested OLS comparisons |
| AIC | +2 per parameter | Predictive adequacy | General model comparison, good default |
| BIC | +ln(n) per parameter | Simpler structures | Governance-heavy environments, large samples |
| AICc | small-sample correction | Safer in small n | When n is not large relative to p |
| Cross-validation | direct out-of-sample error | Best generalisation | Predictive model benchmarking |
| VIF | not a fit criterion | Stable coefficients | Collinearity diagnosis |
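The cross-validation row is the only one in the table that measures out-of-sample error directly rather than penalising in-sample fit. A minimal k-fold sketch for OLS follows, assuming simulated data with 3 real predictors and 40 noise predictors; `kfold_mse` is an illustrative helper, not a library function.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Mean squared prediction error of OLS (with intercept) over k held-out folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    sq_errors = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)       # everything not held out
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        beta = np.linalg.lstsq(A_train, y[train_idx], rcond=None)[0]
        sq_errors.append((y[test_idx] - A_test @ beta) ** 2)
    return float(np.concatenate(sq_errors).mean())

rng = np.random.default_rng(7)
n = 60
signal = rng.normal(size=(n, 3))
noise_vars = rng.normal(size=(n, 40))                 # unrelated to y by construction
y = signal @ np.array([1.5, -1.0, 0.5]) + rng.normal(size=n)

mse_lean = kfold_mse(signal, y)
mse_bloated = kfold_mse(np.column_stack([signal, noise_vars]), y)
print(f"lean model CV MSE:    {mse_lean:.2f}")
print(f"bloated model CV MSE: {mse_bloated:.2f}")
```

The bloated model has the higher in-sample R² by construction, yet its held-out error should be clearly worse, which is the behaviour the penalised criteria try to anticipate without refitting.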
Concepts every validator should keep
The real trade-off
Simple models miss some structure but are stable. Complex models fit more but can become brittle. Selection criteria are trying to balance exactly that.
Simpler often wins
Especially in regulated modelling, an extra variable must earn its place through clear incremental value.
Prediction can survive while interpretation dies
A collinear model may still predict acceptably, in-sample and often out-of-sample for as long as the correlation pattern among predictors persists, but coefficient-level reasoning becomes unreliable.
Bootstrap the coefficients
If coefficients swing across resamples, the model is fragile even if one static fit looked fine.
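One way to run that check is a pairs bootstrap: resample rows with replacement, refit, and look at the spread of each coefficient. The sketch below assumes simulated data and an illustrative helper `bootstrap_coef_std`; it compares a collinear design with an otherwise similar independent one.

```python
import numpy as np

def bootstrap_coef_std(X, y, n_boot=300, seed=0):
    """Std. dev. of OLS slope estimates across bootstrap resamples of the rows."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coefs = np.empty((n_boot, A.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)              # sample rows with replacement
        coefs[b] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
    return coefs[:, 1:].std(axis=0)                   # drop the intercept column

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2_collinear = x1 + 0.1 * rng.normal(size=n)          # almost a duplicate of x1
x2_independent = rng.normal(size=n)
noise = rng.normal(size=n)

std_col = bootstrap_coef_std(np.column_stack([x1, x2_collinear]), x1 + x2_collinear + noise)
std_ind = bootstrap_coef_std(np.column_stack([x1, x2_independent]), x1 + x2_independent + noise)
print("collinear design  :", np.round(std_col, 3))
print("independent design:", np.round(std_ind, 3))
```

Both designs use true slopes of 1 and the same noise, so a much wider spread in the collinear case reflects the predictor structure rather than a weaker signal.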
Different philosophies
AIC leans toward predictive usefulness. BIC leans toward structural parsimony and is harsher as n grows.
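The difference in harshness is easy to see numerically. This sketch compares the penalty terms only, using the standard forms 2k for AIC, k·ln(n) for BIC, and the AICc correction 2k(k+1)/(n-k-1); the function names are illustrative.

```python
import math

def aic_penalty(k):
    """AIC penalty: 2 per estimated parameter, independent of n."""
    return 2 * k

def bic_penalty(k, n):
    """BIC penalty: ln(n) per estimated parameter."""
    return k * math.log(n)

def aicc_correction(k, n):
    """Extra term AICc adds on top of AIC: 2k(k+1) / (n - k - 1); needs n > k + 1."""
    return 2 * k * (k + 1) / (n - k - 1)

k = 5
for n in (7, 8, 50, 1000):
    print(
        f"n={n:4d}  AIC={aic_penalty(k)}  "
        f"BIC={bic_penalty(k, n):.2f}  AICc extra={aicc_correction(k, n):.2f}"
    )
```

BIC's per-parameter penalty overtakes AIC's as soon as ln(n) exceeds 2, which happens from n = 8 onward, and keeps growing with the sample. The AICc correction moves the other way: it is large when n is close to k and fades as n grows.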
Selection is also a documentation problem
The chosen model should be defendable not only statistically, but also operationally and conceptually.
What to leave this page with
Raw fit is easy to improve. Stable, justifiable fit is much harder.
The useful order is: first compare fit, then penalise complexity, then inspect collinearity, then ask whether the selected structure is stable enough to defend.
Once that mindset is clear, model selection stops being a metric contest and becomes a judgement problem.