notes · 06

Model Selection & Multicollinearity

Adding predictors never worsens in-sample fit, and almost always improves it. That is exactly why model selection exists. This page is about choosing the right model complexity and diagnosing whether correlated predictors are making the model unstable.

Start with the overfitting problem, then compare R², Adjusted R², AIC, and BIC, then move into multicollinearity and VIF. The goal is to understand why “better fit” can still mean “worse model.”

Why model selection matters

The overfitting trap

R² always increases or stays flat when you add predictors. That makes it a dangerous selection metric.

A model can look excellent in-sample simply because it is absorbing noise. In that case, complexity is not learning signal — it is learning accidents.

Good model = strongest generalisation, not highest raw R².
Validation implication: if a challenger model gains only a tiny fit improvement but doubles the number of variables, the improvement may be statistical decoration rather than real informational gain.
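A minimal sketch of the trap, using simulated data rather than anything from a real portfolio: only the first three columns carry signal, the remaining twelve are pure noise, and the coefficients, seed, and sample sizes are all assumptions chosen for illustration. The training R² cannot fall as columns are added, while the holdout R² is free to stall or drop.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def r_squared(beta, X, y):
    """R² of fixed coefficients on (X, y); can be negative out of sample."""
    design = np.column_stack([np.ones(len(y)), X])
    resid = y - design @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

rng = np.random.default_rng(0)
n_train, n_test, n_signal, p_max = 100, 1000, 3, 15
true_beta = np.array([1.0, -0.5, 0.8])  # assumed signal coefficients

def simulate(n):
    # Only the first n_signal columns drive y; the rest are pure noise.
    X = rng.normal(size=(n, p_max))
    y = X[:, :n_signal] @ true_beta + rng.normal(size=n)
    return X, y

X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)

train_r2, test_r2 = [], []
for p in range(1, p_max + 1):
    beta = fit_ols(X_train[:, :p], y_train)
    train_r2.append(r_squared(beta, X_train[:, :p], y_train))
    test_r2.append(r_squared(beta, X_test[:, :p], y_test))
    print(f"p={p:2d}  train R2={train_r2[-1]:.3f}  holdout R2={test_r2[-1]:.3f}")
```

The gap between the two columns of output is the part of the fit that is "learning accidents".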

The multicollinearity trap

When predictors are correlated with each other, coefficient estimates become unstable. Small data changes can flip signs, inflate standard errors, and destroy interpretability.

The model may still look decent in-sample, but its internal logic becomes fragile.

A stable predictor should not change personality when a neighbouring variable enters or leaves the model.
Validation red flag: if DTI, utilisation, and income all enter together and coefficient signs jump across samples, the model may be telling you more about redundancy than about risk.
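To make the instability concrete, here is a small simulated comparison, offered as a sketch under assumed settings. The variables x1 and x2 are generic placeholders (not DTI, utilisation, or income), and the 0.1 noise scale that makes x2 nearly a copy of x1 is an arbitrary choice. The same OLS fit is run once with an independent x2 and once with a near-duplicate x2, and the classical standard errors are compared.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and classical standard errors (intercept first)."""
    design = np.column_stack([np.ones(len(y)), X])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    sigma2 = float(resid @ resid) / (len(y) - design.shape[1])
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, se

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
eps = rng.normal(size=n)
x2_indep = rng.normal(size=n)                 # unrelated to x1
x2_collinear = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1

# Same true coefficients (1.0 and 1.0) in both designs.
y_indep = x1 + x2_indep + eps
y_coll = x1 + x2_collinear + eps

beta_indep, se_indep = ols_with_se(np.column_stack([x1, x2_indep]), y_indep)
beta_coll, se_coll = ols_with_se(np.column_stack([x1, x2_collinear]), y_coll)

print("independent design: beta_x1 =", round(beta_indep[1], 3), "se =", round(se_indep[1], 3))
print("collinear design:   beta_x1 =", round(beta_coll[1], 3), "se =", round(se_coll[1], 3))
```

The coefficient on x1 has the same true value in both designs; only the company it keeps changes its precision.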

A useful order for model selection thinking

01

Start with fit, but do not stop there

R² is descriptive; on its own it is not enough for selection. It tells you how much variance is explained, not whether extra variables are justified.

02

Then apply a complexity penalty

Adjusted R², AIC, and BIC all ask whether the fit improvement is worth the extra parameters.

03

Then inspect predictor structure

A good criterion value can still hide coefficient instability if the model is collinear inside.

04

Then connect back to governance

In regulated models, simpler, interpretable, stable structures often beat marginally better but fragile alternatives.

R² vs Adjusted R² — watch overfitting happen

The true data-generating process contains only a few real predictors. As you add noise variables, R² keeps climbing, but the penalised metrics start resisting.

Chart: metrics vs number of predictors (legend: Adjusted R², AIC (scaled), BIC (scaled))
Default controls: n (observations) = 100, true signal predictors = 3, max total predictors = 15, noise level = 1.0
Readouts: number of predictors preferred by Adjusted R², by AIC, and by BIC, compared with the number of true signal predictors
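R² = 1 − SSres / SStot
Adj. R² = 1 − (1 − R²)(n − 1)/(n − p − 1)
AIC = n·ln(SSres/n) + 2p
BIC = n·ln(SSres/n) + p·ln(n)
Here n is the number of observations, p is the number of predictors, SSres is the residual sum of squares (often written RSS), and SStot is the total sum of squares around the mean of y. The AIC and BIC forms shown are the Gaussian OLS versions with additive constants dropped, which is fine for ranking models fitted to the same data.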
Reading rule: raw R² is optimistic by design. AIC and Adjusted R² are softer penalties; BIC is harsher and usually prefers simpler models.
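The four formulas above can be sketched directly in NumPy. The simulation settings mirror the default controls listed on this page (n = 100, 3 signal predictors, up to 15 predictors, noise level 1.0), while the signal coefficients and the random seed are assumptions made for this illustration, and `selection_metrics` is a hypothetical helper name.

```python
import numpy as np

def selection_metrics(X, y):
    """R², Adjusted R², AIC and BIC for OLS with intercept, per the formulas above."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    aic = n * np.log(rss / n) + 2 * p
    bic = n * np.log(rss / n) + p * np.log(n)
    return {"r2": r2, "adj_r2": adj_r2, "aic": aic, "bic": bic}

rng = np.random.default_rng(42)
n, n_signal, p_max, noise_level = 100, 3, 15, 1.0
X = rng.normal(size=(n, p_max))
signal_beta = np.array([1.0, -0.8, 0.6])  # assumed true coefficients
y = X[:, :n_signal] @ signal_beta + noise_level * rng.normal(size=n)

# Nested models: first p columns, for p = 1..p_max.
rows = [selection_metrics(X[:, :p], y) for p in range(1, p_max + 1)]
for p, r in enumerate(rows, start=1):
    print(f"p={p:2d}  R2={r['r2']:.3f}  adjR2={r['adj_r2']:.3f}  "
          f"AIC={r['aic']:.1f}  BIC={r['bic']:.1f}")

best_adj = 1 + int(np.argmax([r["adj_r2"] for r in rows]))
best_aic = 1 + int(np.argmin([r["aic"] for r in rows]))
best_bic = 1 + int(np.argmin([r["bic"] for r in rows]))
print("best by Adj. R2:", best_adj, "| by AIC:", best_aic, "| by BIC:", best_bic)
```

For a nested sequence like this one, BIC can never pick a larger model than AIC once ln(n) exceeds 2, which is the "harsher penalty" in the reading rule.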

Comparing candidate models

Here the question is not “what is the best possible fit?” It is “which candidate model gives the best trade-off between explanatory power and complexity?”

Model comparison table

How to read: Δ is measured against the best (lowest) AIC or BIC among the candidates. A ΔAIC or ΔBIC close to zero means a model is competitive. Large positive deltas mean the model is increasingly hard to justify.
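As a small illustration of the delta reading, the sketch below subtracts the best (lowest) score from each candidate's score. The model names and AIC values are invented purely for the example and do not come from any fitted model.

```python
def deltas(scores):
    """Difference between each candidate's criterion value and the best (lowest) one."""
    best = min(scores.values())
    return {name: value - best for name, value in scores.items()}

# Hypothetical AIC values for three candidate models (illustrative numbers only).
candidate_aic = {
    "baseline": 212.4,
    "plus_utilisation": 210.1,
    "kitchen_sink": 219.8,
}

delta_aic = deltas(candidate_aic)
ranked = sorted(delta_aic.items(), key=lambda item: item[1])
for name, delta in ranked:
    print(f"{name:18s} delta AIC = {delta:.1f}")
```

The same helper works for BIC; only the input scores change.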

Visual comparison

Validator view: if a more complex model improves R² marginally but loses on BIC and adds governance burden, the simpler model is often the stronger candidate.

VIF — how much variance is being inflated?

VIF asks how much a predictor is explainable by the other predictors. The more redundant it is, the less stable its own coefficient becomes.

Panels: correlation heatmap, VIF per variable
Per-scenario readouts: max VIF, mean VIF, max |ρ|, diagnosis
VIF_j = 1 / (1 − R²_j), where R²_j is the R² from regressing predictor j on all the other predictors.
Rule of thumb: VIF < 5 usually acceptable, 5–10 concern, above 10 severe.
What it means: VIF = 9 means the variance of that coefficient is about nine times what it would be if the predictor were uncorrelated with the others, so its standard error is about three times larger.
Typical remedies: drop one of the correlated variables, combine variables, regularise with ridge / elastic net, or redesign the feature set so that the model is not carrying duplicates.
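The VIF definition can be sketched with one auxiliary regression per column. The three simulated variables (a, b, c) and the 0.95 target correlation are assumptions for illustration, and `vif` is a hypothetical helper written here rather than a library call. The last lines also try the simplest remedy from the list above, dropping one of the correlated pair.

```python
import numpy as np

def vif(X):
    """VIF for each column: regress it on the other columns (plus intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2_j = 1.0 - float(resid @ resid) / ss_tot
        out[j] = 1.0 / (1.0 - r2_j)
    return out

rng = np.random.default_rng(7)
n = 500
a = rng.normal(size=n)
b = 0.95 * a + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # built to correlate ~0.95 with a
c = rng.normal(size=n)                                     # unrelated to a and b

values = vif(np.column_stack([a, b, c]))
print("VIF with a, b, c:", np.round(values, 2))

# Remedy sketch: drop b and recompute.
values_after_drop = vif(np.column_stack([a, c]))
print("VIF with a, c:   ", np.round(values_after_drop, 2))
```

With a population correlation of 0.95 the implied VIF for a and b is 1 / (1 − 0.95²), roughly 10, which sits at the boundary of the "severe" band in the rule of thumb.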

Model selection criteria compared

Criterion        | Penalty                    | Favours             | Typical use
R²               | none                       | More complex models | Descriptive only, not enough for selection
Adjusted R²      | mild complexity penalty    | Parsimonious fit    | Quick nested OLS comparisons
AIC              | +2 per parameter           | Predictive adequacy | General model comparison, good default
BIC              | +ln(n) per parameter       | Simpler structures  | Governance-heavy environments, large samples
AICc             | small-sample correction    | Safer in small n    | When n is not large relative to p
Cross-validation | direct out-of-sample error | Best generalisation | Predictive model benchmarking
VIF              | not a fit criterion        | Stable coefficients | Collinearity diagnosis
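Two rows of this comparison have no formula elsewhere on the page, so here is a hedged sketch of both. The AICc correction uses the standard form AIC + 2k(k + 1)/(n − k − 1), where k counts estimated parameters. The k-fold routine is a plain NumPy illustration with an assumed fold count of 5, simulated data, and hypothetical helper names, not a reference implementation.

```python
import numpy as np

def aicc(aic, n, k):
    """Small-sample corrected AIC; k is the number of estimated parameters."""
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

def kfold_mse(X, y, n_folds=5, seed=0):
    """Mean out-of-fold squared error of OLS with intercept."""
    order = np.random.default_rng(seed).permutation(len(y))
    sq_errors = []
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        d_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        d_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        beta, *_ = np.linalg.lstsq(d_train, y[train_idx], rcond=None)
        sq_errors.append((y[test_idx] - d_test @ beta) ** 2)
    return float(np.concatenate(sq_errors).mean())

# The correction is large when n is small relative to k and fades as n grows.
print("AICc correction, n=30,   k=5:", aicc(0.0, 30, 5))
print("AICc correction, n=3000, k=5:", round(aicc(0.0, 3000, 5), 4))

# Simulated data: 3 signal columns followed by 12 noise columns (assumed setup).
rng = np.random.default_rng(11)
n = 100
X = rng.normal(size=(n, 15))
y = X[:, :3] @ np.array([1.0, -0.8, 0.6]) + rng.normal(size=n)

cv_small = kfold_mse(X[:, :3], y)
cv_large = kfold_mse(X, y)
print(f"5-fold MSE, 3 predictors:  {cv_small:.3f}")
print(f"5-fold MSE, 15 predictors: {cv_large:.3f}")
```

Both models are scored on the same folds because the seed is fixed, which keeps the comparison like for like.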

Concepts every validator should keep

bias–variance

The real trade-off

Simple models miss some structure but are stable. Complex models fit more but can become brittle. Selection criteria are trying to balance exactly that.

parsimony

Simpler often wins

Especially in regulated modelling, an extra variable must earn its place through clear incremental value.

multicollinearity

Prediction can survive while interpretation dies

A collinear model may still predict acceptably in-sample, but coefficient-level reasoning becomes unreliable.

stability

Bootstrap the coefficients

If coefficients swing across resamples, the model is fragile even if one static fit looked fine.
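A minimal bootstrap sketch of that check, on simulated data with assumed coefficients (0.5 and 0.5), an assumed 500 resamples, and generic placeholder predictors: rows are resampled with replacement, the model is refitted each time, and the spread and sign of one coefficient are summarised.

```python
import numpy as np

def bootstrap_coefs(X, y, n_boot=500, seed=0):
    """Refit OLS on row-resampled data; returns an (n_boot, p + 1) coefficient array."""
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coefs = np.empty((n_boot, design.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        beta, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
        coefs[b] = beta
    return coefs

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)            # unrelated to x1
x2_coll = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1
eps = rng.normal(size=n)

y_indep = 0.5 * x1 + 0.5 * x2_indep + eps
y_coll = 0.5 * x1 + 0.5 * x2_coll + eps

coefs_indep = bootstrap_coefs(np.column_stack([x1, x2_indep]), y_indep)
coefs_coll = bootstrap_coefs(np.column_stack([x1, x2_coll]), y_coll)

for label, coefs in (("independent", coefs_indep), ("collinear", coefs_coll)):
    b2 = coefs[:, 2]  # coefficient on the second predictor
    print(f"{label:12s} std of beta_x2 = {b2.std():.3f}  "
          f"share with negative sign = {(b2 < 0).mean():.2%}")
```

A coefficient whose sign flips in a meaningful share of resamples is hard to defend in documentation, whatever the single full-sample fit reported.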

AIC vs BIC

Different philosophies

AIC leans toward predictive usefulness. BIC leans toward structural parsimony and is harsher as n grows.
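The "harsher as n grows" point follows from the per-parameter penalties: AIC charges a flat 2, BIC charges ln(n). A short sketch using only the standard library, with sample sizes picked arbitrarily for illustration:

```python
import math

def penalty_per_parameter(n):
    """Cost charged for one extra parameter under AIC and BIC."""
    return {"aic": 2.0, "bic": math.log(n)}

# BIC charges more than AIC once ln(n) > 2, i.e. n > e^2.
crossover_n = math.exp(2)
print(f"BIC becomes the harsher penalty above n = {crossover_n:.2f}")

for n in (5, 8, 100, 10_000):
    pen = penalty_per_parameter(n)
    harsher = "BIC" if pen["bic"] > pen["aic"] else "AIC"
    print(f"n={n:>6}  AIC=+{pen['aic']:.2f}  BIC=+{pen['bic']:.2f}  harsher: {harsher}")
```

For any realistic sample size the BIC penalty is the larger one, and the gap keeps widening with n.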

governance

Selection is also a documentation problem

The chosen model should be defendable not only statistically, but also operationally and conceptually.

What to leave this page with

Raw fit is easy to improve. Stable, justifiable fit is much harder.

The useful order is: first compare fit, then penalise complexity, then inspect collinearity, then ask whether the selected structure is stable enough to defend.

Once that mindset is clear, model selection stops being a metric contest and becomes a judgement problem.