notes · 06

Model Selection & Multicollinearity

Adding predictors never worsens in-sample fit, and almost always improves it. That is exactly why model selection exists. This page is about choosing the right level of model complexity and diagnosing whether correlated predictors are making the model unstable.

Start with the overfitting problem, then compare R², Adjusted R², AIC, and BIC, then move into multicollinearity and VIF. The goal is to understand why “better fit” can still mean “worse model.”


the core problem

Why model selection matters

The overfitting trap

In an OLS fit, in-sample R² can only increase or stay flat when you add predictors, whether or not they carry signal. That makes it a dangerous selection metric.

A model can look excellent in-sample simply because it is absorbing noise. In that case, complexity is not learning signal — it is learning accidents.

Good model = strongest generalisation, not highest raw R².

Validation implication: if a challenger model gains only a tiny fit improvement but doubles the number of variables, the improvement may be statistical decoration rather than real informational gain.
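A minimal sketch of "learning accidents", assuming NumPy is available. The target here is pure noise and the predictors are unrelated random columns, so any fit is spurious by construction. The helper name in_sample_r2, the seed, and n = 20 are illustrative choices, not part of the original page.

```python
import numpy as np

def in_sample_r2(X, y):
    """R² of an OLS fit (with intercept), evaluated on the data it was fitted to."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)          # arbitrary seed (assumption)
n = 20
y = rng.normal(size=n)                  # target is pure noise
junk = rng.normal(size=(n, n - 1))      # predictors with no relation to y

# R² for models using the first k junk predictors, k = 1 .. n-1
r2_path = [in_sample_r2(junk[:, :k], y) for k in range(1, n)]

for k in (1, 5, 10, 15, 19):
    print(f"{k:2d} junk predictors -> in-sample R2 = {r2_path[k - 1]:.3f}")
```

With n − 1 junk predictors plus an intercept the model has as many free parameters as observations, so it reproduces the noise exactly and R² reaches 1 even though nothing real has been learned.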

The multicollinearity trap

When predictors are correlated with each other, coefficient estimates become unstable. Small data changes can flip signs, inflate standard errors, and destroy interpretability.

The model may still look decent in-sample, but its internal logic becomes fragile.

A stable predictor should not change personality when a neighbouring variable enters or leaves the model.

Validation red flag: if DTI, utilisation, and income all enter together and coefficient signs jump across samples, the model may be telling you more about redundancy than about risk.
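The "change personality" effect can be sketched on synthetic data, assuming NumPy. Two predictors with correlation of about 0.9 both drive the target; the coefficient on x1 is then compared with and without x2 in the model. The variable names, coefficients, and seed are hypothetical stand-ins, not real DTI or utilisation data.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients with an intercept; element 0 is the intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

rng = np.random.default_rng(1)          # arbitrary seed (assumption)
n = 500
x1 = rng.normal(size=n)
# x2 is built to have correlation of roughly 0.9 with x1
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

b_alone = ols(x1.reshape(-1, 1), y)            # x1 only
b_joint = ols(np.column_stack([x1, x2]), y)    # x1 and x2 together

shift = b_alone[1] - b_joint[1]
print(f"coef on x1 alone:   {b_alone[1]:.2f}")
print(f"coef on x1 with x2: {b_joint[1]:.2f}")
print(f"shift when x2 enters: {shift:.2f}")
```

When x2 is left out, x1 absorbs most of its effect, so the coefficient on x1 is close to the sum of both true effects. Once x2 enters, the same coefficient drops sharply: the variable has not changed, only its neighbours have.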

learning sequence

A useful order for model selection thinking

Start with fit, but do not stop there

R² is descriptive, but it is not sufficient for selection. It tells you how much variance is explained, not whether extra variables are justified.

Then apply a complexity penalty

Adjusted R², AIC, and BIC all ask whether the fit improvement is worth the extra parameters.

Then inspect predictor structure

A good criterion value can still hide coefficient instability if the model is collinear inside.

Then connect back to governance

In regulated models, simpler, interpretable, stable structures often beat marginally better but fragile alternatives.

interactive · overfitting

R² vs Adjusted R² — watch overfitting happen

The true data-generating process contains only a few real predictors. As you add noise variables, R² keeps climbing, but the penalised metrics start resisting.

[Interactive chart: metrics vs number of predictors. Series: R², Adjusted R², AIC (scaled), BIC (scaled).]

Controls (values shown): n (observations) = 100; true signal predictors = 3; max total predictors = 15; noise level = 1.0.

Readouts: number of predictors judged best by Adj. R², by AIC, and by BIC, shown against the true number of signal predictors.

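R² = 1 − SSres / SStot
Adj. R² = 1 − (1−R²)(n−1)/(n−p−1)
AIC = n·ln(SSres/n) + 2p
BIC = n·ln(SSres/n) + p·ln(n)

Here SSres is the residual sum of squares (often written RSS), SStot is the total sum of squares, n is the number of observations, and p is the number of predictors excluding the intercept. The AIC and BIC expressions are the Gaussian OLS forms with additive constants dropped, so only differences between models fitted to the same data are meaningful.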

Reading rule: raw R² is optimistic by design. Adjusted R² and AIC apply softer penalties; BIC is harsher for all but very small samples and usually prefers simpler models.
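The experiment described in this section can be approximated offline. The sketch below assumes NumPy, uses the control values listed above (n = 100, 3 signal predictors, up to 15 predictors, noise level 1.0), and applies the four formulas as written. The true coefficients of 1.0, the seed, and the function names are assumptions; the page's own simulation may differ.

```python
import numpy as np

def fit_rss(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def criteria_path(X, y):
    """R², adjusted R², AIC and BIC for models using the first p columns of X."""
    n = len(y)
    tss = float(np.sum((y - y.mean()) ** 2))
    rows = []
    for p in range(1, X.shape[1] + 1):
        rss = fit_rss(X[:, :p], y)
        r2 = 1 - rss / tss
        rows.append({
            "p": p,
            "r2": r2,
            "adj_r2": 1 - (1 - r2) * (n - 1) / (n - p - 1),
            "aic": n * np.log(rss / n) + 2 * p,
            "bic": n * np.log(rss / n) + p * np.log(n),
        })
    return rows

rng = np.random.default_rng(42)         # arbitrary seed (assumption)
n, n_signal, p_max, noise_sd = 100, 3, 15, 1.0
X = rng.normal(size=(n, p_max))
true_beta = np.zeros(p_max)
true_beta[:n_signal] = 1.0              # assumed effect size for the signal columns
y = X @ true_beta + noise_sd * rng.normal(size=n)

path = criteria_path(X, y)
best_adj = max(path, key=lambda r: r["adj_r2"])["p"]
best_aic = min(path, key=lambda r: r["aic"])["p"]
best_bic = min(path, key=lambda r: r["bic"])["p"]
print("best p by Adj. R2 / AIC / BIC:", best_adj, best_aic, best_bic)
```

Because the models are nested and ln(100) > 2, BIC can never pick more predictors than AIC in this setup. R² itself keeps rising all the way to p = 15.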

interactive · model comparison

Comparing candidate models

Here the question is not “what is the best possible fit?” It is “which candidate model gives the best trade-off between explanatory power and complexity?”

[Interactive table: model comparison]

How to read: ΔAIC and ΔBIC are measured against the best (lowest) value among the candidates. A delta close to zero means a model is competitive. Large positive deltas mean the model is increasingly hard to justify.
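Computing the deltas is a one-liner once each candidate has a criterion value. The sketch below uses made-up AIC values and hypothetical model names purely for illustration.

```python
def deltas(scores):
    """Difference between each model's criterion value and the lowest one."""
    best = min(scores.values())
    return {name: round(value - best, 2) for name, value in scores.items()}

# Hypothetical AIC values for three candidate models (illustrative only).
aic = {"baseline": 412.3, "challenger_a": 413.1, "challenger_b": 425.8}

delta_aic = deltas(aic)
ranked = sorted(delta_aic, key=delta_aic.get)   # most competitive first

for name in ranked:
    print(f"{name:13s} delta AIC = {delta_aic[name]:5.1f}")
```

In this made-up set, challenger_a sits within one AIC point of the baseline and is competitive, while challenger_b is more than thirteen points behind.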

[Interactive chart: visual comparison of candidate models]

Validator view: if a more complex model improves R² marginally but loses on BIC and adds governance burden, the simpler model is often the stronger candidate.
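A worked example of that situation, using the formulas from this page with hypothetical numbers: both models are fitted to the same 100 observations, the complex one uses 10 predictors instead of 3 and lowers the residual sum of squares only slightly. All inputs are invented for illustration.

```python
import math

def penalised_metrics(rss, tss, n, p):
    """R², adjusted R², AIC and BIC from sums of squares, n observations, p predictors."""
    r2 = 1 - rss / tss
    return {
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - p - 1),
        "aic": n * math.log(rss / n) + 2 * p,
        "bic": n * math.log(rss / n) + p * math.log(n),
    }

n, tss = 100, 200.0                     # hypothetical sample size and total sum of squares
simple = penalised_metrics(rss=80.0, tss=tss, n=n, p=3)
complex_ = penalised_metrics(rss=78.0, tss=tss, n=n, p=10)

for label, m in (("simple", simple), ("complex", complex_)):
    print(label, {k: round(v, 2) for k, v in m.items()})
```

The complex model wins on raw R² (about 0.61 against 0.60) but loses on every penalised measure: adjusted R² falls from roughly 0.59 to 0.57, AIC rises from about −16.3 to −4.8, and BIC rises from about −8.5 to 21.2.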

interactive · multicollinearity

VIF — how much variance is being inflated?

VIF asks how much of a predictor's variation can be explained by the other predictors. The more redundant it is, the less stable its own coefficient becomes.

[Interactive panels: correlation heatmap and VIF per variable, for a selectable scenario.]

Readouts: max VIF, mean VIF, max |ρ|, and an overall diagnosis.

VIF_j = 1 / (1 − R²_j), where R²_j is the R² obtained by regressing predictor j on all the other predictors.

Rule of thumb: VIF < 5 usually acceptable, 5–10 concern, above 10 severe.

What it means: VIF = 9 means the variance of that coefficient is about nine times what it would be if the predictor were uncorrelated with the others, so its standard error is about three times larger.
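The VIF definition translates directly into one auxiliary regression per predictor. The sketch below assumes NumPy; the function name vif and the synthetic columns (a near-duplicate pair plus one independent variable) are illustrative, not taken from the page's scenarios.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (predictors only, no intercept)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2_j = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

rng = np.random.default_rng(7)          # arbitrary seed (assumption)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)      # near-duplicate of x1
x3 = rng.normal(size=n)                 # unrelated to the other two
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print("VIF per variable:", np.round(vifs, 2))
print("max VIF:", round(float(vifs.max()), 2), "mean VIF:", round(float(vifs.mean()), 2))
```

The near-duplicate pair lands well above the level the rule of thumb treats as a concern, while the independent column stays close to 1, the minimum possible value.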

Typical remedies: drop one of the correlated variables, combine variables, regularise with ridge / elastic net, or redesign the feature set so that the model is not carrying duplicates.
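Of these remedies, ridge regularisation is the easiest to sketch in a few lines. The version below assumes NumPy and uses the closed-form ridge solution on standardised predictors with a centred target; the penalty value of 10 and the synthetic data are arbitrary choices for illustration, not tuned recommendations.

```python
import numpy as np

def standardise(X):
    """Centre each column and scale it to unit (population) standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_coefs(X, y, lam):
    """Closed-form ridge coefficients on standardised X and centred y; lam = 0 gives OLS."""
    Z = standardise(X)
    yc = y - y.mean()
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ yc)

rng = np.random.default_rng(3)          # arbitrary seed (assumption)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)      # almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

beta_ols = ridge_coefs(X, y, lam=0.0)
beta_ridge = ridge_coefs(X, y, lam=10.0)   # penalty strength chosen arbitrarily
print("OLS coefficients:  ", np.round(beta_ols, 2))
print("ridge coefficients:", np.round(beta_ridge, 2))
```

The penalty mainly shrinks the poorly determined contrast between the two near-duplicates, so the ridge coefficients sit closer together than the OLS ones. The price is some bias, which is why the penalty strength would normally be chosen by cross-validation rather than fixed by hand.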

reference

Model selection criteria compared

Criterion	Penalty	Favours	Typical use
R²	none	More complex models	Descriptive only, not enough for selection
Adjusted R²	mild complexity penalty	Parsimonious fit	Quick nested OLS comparisons
AIC	+2 per parameter	Predictive adequacy	General model comparison, good default
BIC	+ln(n) per parameter	Simpler structures	Governance-heavy environments, large samples
AICc	small-sample correction	Safer in small n	When n is not large relative to p
Cross-validation	direct out-of-sample error	Best generalisation	Predictive model benchmarking
VIF	not a fit criterion	Stable coefficients	Collinearity diagnosis
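Cross-validation is the only row in the table that measures out-of-sample error directly. A minimal k-fold sketch for OLS follows, assuming NumPy; the function name kfold_mse, the fold count of 5, and the synthetic candidates are illustrative assumptions.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Mean squared prediction error of OLS, averaged over k held-out folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    errors = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        # Fit on the training folds only, then score on the held-out fold.
        beta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        errors.append(np.mean((y[test_idx] - A_test @ beta) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(5)          # arbitrary seed (assumption)
n = 200
signal = rng.normal(size=(n, 3))        # predictors that drive y
noise = rng.normal(size=(n, 12))        # predictors unrelated to y
y = signal.sum(axis=1) + rng.normal(size=n)

candidates = {
    "signal only": signal,
    "signal + noise": np.column_stack([signal, noise]),
    "noise only": noise,
}
cv_mse = {name: kfold_mse(Xc, y) for name, Xc in candidates.items()}
for name, mse in cv_mse.items():
    print(f"{name:15s} CV MSE = {mse:.3f}")
```

Using the same seed for every candidate keeps the folds identical, so the comparison is paired. The model padded with noise columns has the highest in-sample R² of the three, yet it typically does not beat the signal-only model on held-out error.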

deeper concepts

Concepts every validator should keep

bias–variance

The real trade-off

Simple models miss some structure but are stable. Complex models fit more but can become brittle. Selection criteria are trying to balance exactly that.

parsimony

Simpler often wins

Especially in regulated modelling, an extra variable must earn its place through clear incremental value.

multicollinearity

Prediction can survive while interpretation dies

A collinear model may still predict acceptably, especially while the correlation pattern among the predictors stays the same, but coefficient-level reasoning becomes unreliable.

stability

Bootstrap the coefficients

If coefficients swing across resamples, the model is fragile even if one static fit looked fine.
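A minimal bootstrap sketch, assuming NumPy: refit the same OLS model on row-resamples and look at how far each coefficient moves. The function name bootstrap_coefs, the 500 resamples, and the synthetic data (one near-duplicate pair, one independent predictor) are illustrative assumptions.

```python
import numpy as np

def bootstrap_coefs(X, y, n_boot=500, seed=0):
    """OLS coefficients (intercept first) refitted on n_boot row-resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    draws = np.empty((n_boot, design.shape[1]))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)       # sample rows with replacement
        beta, *_ = np.linalg.lstsq(design[rows], y[rows], rcond=None)
        draws[b] = beta
    return draws

rng = np.random.default_rng(11)         # arbitrary seed (assumption)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)      # collinear with x1
x3 = rng.normal(size=n)                 # independent of both
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + x3 + rng.normal(size=n)

draws = bootstrap_coefs(X, y)
spread = draws.std(axis=0)                      # bootstrap standard deviation per coefficient
negative_share = (draws[:, 1:] < 0).mean(axis=0)  # how often each slope turns negative

print("bootstrap SD (intercept, x1, x2, x3):", np.round(spread, 3))
print("share of resamples with negative slope (x1, x2, x3):", np.round(negative_share, 3))
```

All three true slopes are equal here, yet the coefficients on the collinear pair move several times more across resamples than the coefficient on the independent predictor. That spread is what a single static fit hides.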

AIC vs BIC

Different philosophies

AIC leans toward predictive usefulness. BIC leans toward structural parsimony: its per-parameter penalty of ln(n) grows with the sample size and exceeds AIC's fixed penalty of 2 once n ≥ 8.
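The difference in philosophy shows up as simple arithmetic on the per-parameter penalty: 2 for AIC, ln(n) for BIC. The short sketch below uses only the standard library; the function names are illustrative.

```python
import math

def per_parameter_penalty(n):
    """Cost of one extra parameter under each criterion for sample size n."""
    return {"aic": 2.0, "bic": math.log(n)}

def smallest_n_where_bic_is_harsher():
    """First sample size at which ln(n) exceeds AIC's fixed penalty of 2."""
    n = 1
    while math.log(n) <= 2.0:
        n += 1
    return n

crossover = smallest_n_where_bic_is_harsher()
print("BIC becomes harsher than AIC from n =", crossover)

for n in (8, 100, 10_000, 1_000_000):
    penalty = per_parameter_penalty(n)
    print(f"n = {n:>9,}: AIC penalty = {penalty['aic']:.2f}, BIC penalty = {penalty['bic']:.2f}")
```

At n = 100 each extra parameter costs about 4.6 under BIC against 2 under AIC, and at a million observations the BIC cost is close to 14, which is why BIC tends to select smaller models on large portfolios.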

governance

Selection is also a documentation problem

The chosen model should be defendable not only statistically, but also operationally and conceptually.

summary

What to leave this page with

Raw fit is easy to improve. Stable, justifiable fit is much harder.

The useful order is: first compare fit, then penalise complexity, then inspect collinearity, then ask whether the selected structure is stable enough to defend.

Once that mindset is clear, model selection stops being a metric contest and becomes a judgement problem.