notes · 05

WoE, IV & Data Preparation

Before a scorecard can learn, the data has to speak the right language. WoE transforms raw predictors into risk-ordered values, IV tells you which variables are worth keeping, and data preparation determines whether the model is learning signal or noise.

Start with the scorecard pipeline, then move into WoE and IV, then outlier handling, then missing data treatment. The goal is to understand why preparation is not a side task — it is the model’s foundation.

How scorecard data gets model-ready

1. Clean

Handle missing values, detect outliers, fix data types, remove duplicates, and make sure the observation window is consistent.

This is where a large part of modelling time actually goes. Raw data almost never arrives in a scorecard-ready state.

2. Transform

Bin continuous variables into interpretable groups, create a dedicated missing bin where relevant, and convert bins into WoE.

This puts every predictor on a common log-odds-based scale, which suits logistic regression and, when the bins are risk-ordered, is easier to govern.

3. Select

Rank predictors by IV, remove weak variables, check monotonicity, and test collinearity among survivors.

The final set should be not only predictive, but also stable and defensible.

Scorecard reality: in PD development, binning and WoE are not cosmetic. They are part of how the model becomes interpretable, monotonic, and operationally governable.
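The Select step can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names, the greedy "strongest IV first" rule, and the 0.7 correlation cap are assumptions made for the example; only the 0.02 IV floor comes from the thresholds in these notes. The variable names and values are invented.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def select_variables(candidates, min_iv=0.02, max_abs_corr=0.7):
    """candidates: dict of name -> (iv, woe_encoded_column).

    Drops variables below min_iv, then walks the rest from highest IV down,
    keeping a variable only if it is not too correlated with one already kept.
    The 0.7 cap is an assumed rule of thumb, not a value from the notes.
    """
    ranked = sorted(
        (name for name, (iv, _) in candidates.items() if iv >= min_iv),
        key=lambda name: candidates[name][0],
        reverse=True,
    )
    kept = []
    for name in ranked:
        column = candidates[name][1]
        if all(abs(pearson(column, candidates[k][1])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept

# Hypothetical WoE-encoded columns for six accounts (illustrative only).
candidates = {
    "utilisation": (0.35, [0.5, 0.5, -0.2, -0.2, -0.8, 0.5]),
    "balance_ratio": (0.30, [0.4, 0.4, -0.1, -0.1, -0.9, 0.4]),  # near-duplicate
    "tenure": (0.12, [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]),
    "postcode_digit": (0.01, [0.1, -0.1, 0.0, 0.1, -0.1, 0.0]),
}
selected = select_variables(candidates)
print(selected)
```

In this toy data, balance_ratio is dropped because it is almost a copy of utilisation, and postcode_digit is dropped for falling below the IV floor. A real selection would also weigh stability and business sense, which this sketch does not attempt.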

A useful order for learning data preparation

01

First ask whether the raw variable is usable

Missingness, spikes, outliers, and unit problems matter before predictive strength does.

02

Then ask whether the variable is monotonic in risk

WoE binning should ideally create a risk-ordered pattern, not random jumps between bins.

03

Then ask whether the variable is informative

IV provides a compact summary of how much separation the variable contributes across bins.

04

Only then move to model entry

A variable with high apparent signal but poor stability, leakage risk, or non-interpretable binning can still be a bad candidate.

WoE & IV — the scorecard builder’s core tool

Choose a variable to see its bins, event rates, WoE values, and total IV. This is the fastest way to build intuition for strong, weak, and suspicious predictors.

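[Interactive panel: for the selected variable, a chart of WoE by bin (positive WoE = lower risk, negative WoE = higher risk), a chart of event rate by bin, and a WoE breakdown showing total IV, predictive power, number of bins, and whether WoE is monotonic.]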
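$$\mathrm{WoE}_i = \ln\left(\frac{\%\mathrm{NonEvents}_i}{\%\mathrm{Events}_i}\right)$$
$$\mathrm{IV} = \sum_i \left(\%\mathrm{NonEvents}_i - \%\mathrm{Events}_i\right) \times \mathrm{WoE}_i$$
Here %NonEvents_i is the share of all non-events that fall in bin i, and %Events_i is the share of all events that fall in bin i.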
IV thresholds: < 0.02 useless · 0.02–0.10 weak · 0.10–0.30 medium · 0.30–0.50 strong · > 0.50 suspicious.
Monotonicity: WoE should ideally move consistently with risk. If it jumps up and down, either binning needs repair or the variable has a more complicated relationship than a basic scorecard likes.
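To make the two formulas concrete, here is a minimal sketch that computes WoE per bin and the total IV from raw counts. The function name woe_iv and the bin counts are hypothetical, chosen only for illustration; the zero-count guard is one possible convention, not the only one.

```python
import math

def woe_iv(bins):
    """bins: list of (label, n_non_events, n_events), one tuple per bin.

    Returns (rows, total_iv), where each row is (label, woe, iv_contribution).
    """
    total_non_events = sum(non_ev for _, non_ev, _ in bins)
    total_events = sum(ev for _, _, ev in bins)
    rows = []
    total_iv = 0.0
    for label, non_ev, ev in bins:
        if non_ev == 0 or ev == 0:
            # WoE is undefined for an empty cell; merge the bin with a neighbour first.
            raise ValueError(f"bin {label!r} has a zero count")
        pct_non_events = non_ev / total_non_events  # share of all non-events in this bin
        pct_events = ev / total_events              # share of all events in this bin
        woe = math.log(pct_non_events / pct_events)
        contribution = (pct_non_events - pct_events) * woe
        rows.append((label, woe, contribution))
        total_iv += contribution
    return rows, total_iv

# Hypothetical counts for a three-bin variable (illustrative only).
example_bins = [
    ("low", 400, 25),
    ("mid", 350, 35),
    ("high", 250, 40),
]
rows, iv = woe_iv(example_bins)
for label, woe, contribution in rows:
    print(f"{label:>5}  WoE={woe:+.3f}  IV part={contribution:.3f}")
print(f"Total IV = {iv:.3f}")
```

With these counts the low bin has positive WoE (lower risk), the high bin has negative WoE (higher risk), and the total IV is about 0.14, which sits in the medium band of the thresholds above.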

Outlier detection — three methods compared

Outliers can distort means, variances, and coefficients. But in risk modelling, extreme values are not automatically bad data. The right question is not only “is this extreme?” but also “is it economically real and model-relevant?”

[Interactive panel: a plot of the data with outlier flags (regular values, IQR method, z-score, modified z-score) and a method comparison for a chosen scenario and extreme multiplier, reporting IQR flags, z-score flags, modified z flags, and mean shift vs median.]
Risk caution: a very high LTV or very low income can be an outlier statistically and still be real economically. Flag first, then decide whether to cap, transform, bin separately, or keep as genuine signal.
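The three flagging methods can be compared side by side with a short sketch. The cut-offs used here (1.5 × IQR, |z| > 3, modified z > 3.5 with the 0.6745 scaling constant) are the conventional defaults and are assumptions for this example, as are the function names and the LTV values. The code only flags; it does not cap or delete anything.

```python
import statistics

def iqr_flags(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

def zscore_flags(values, threshold=3.0):
    """Flag points more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd > threshold for v in values]

def modified_zscore_flags(values, threshold=3.5):
    """Flag points using the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: more than half the values are identical.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

# Hypothetical LTV percentages with one extreme account (illustrative only).
ltv = [62, 65, 68, 70, 71, 73, 75, 78, 80, 150]

n_iqr = sum(iqr_flags(ltv))
n_z = sum(zscore_flags(ltv))
n_mod_z = sum(modified_zscore_flags(ltv))
print("IQR flags:", n_iqr)
print("z-score flags:", n_z)
print("modified z flags:", n_mod_z)
print("mean:", statistics.fmean(ltv), "median:", statistics.median(ltv))
```

On this sample the IQR and modified z-score methods each flag the 150 value, while the plain z-score flags nothing: the extreme point inflates the mean and standard deviation enough to hide itself. That masking effect is why the median-based methods are usually the safer first pass on small or skewed samples.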

Missing data — not all gaps are equal

Missingness can be random, partially explainable, or directly informative. In scorecards, “missing” is often not something to hide — it can be a risk signal of its own.

Missingness mechanisms

MCAR

Missing Completely At Random. The gap is unrelated to observed or unobserved values.

MAR

Missing At Random. Missingness depends only on other observed variables, not on the missing value itself once those variables are accounted for.

MNAR

Missing Not At Random. Missingness itself carries information about the hidden value or risk state.

Treatment methods

Method | MCAR | MAR | MNAR | Use case
Listwise deletion | Yes | No | No | Tiny missingness, MCAR confirmed
Mean / Median | Yes / Approx | No | No | Fast baseline; median for skewed variables
Mode | N/A | No | No | Categorical variables
Regression imputation | Yes | Partially | No | Other variables predict the missing one
Multiple imputation | Yes | Yes | Only with an explicit missingness model | Best general statistical treatment
Missing indicator | N/A | N/A | Partial | Missingness itself is informative
Separate WoE bin | N/A | N/A | Yes | Scorecard best practice
Scorecard best practice: create a dedicated WoE bin for missing values when missingness itself may carry risk. Missing bureau score often has a different default profile than any observed score bucket.
Common mistake: filling everything with mean or median can destroy real signal if “not available” is itself behaviourally informative.
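A dedicated missing bin can be sketched in a few lines. The bin edges (600 and 700), the function names, and the records below are all hypothetical; the point is only that missing scores keep their own event rate instead of being folded into an observed bucket by imputation.

```python
from collections import defaultdict

def assign_bin(score, edges=(600, 700)):
    """Map a score to a bin label; None goes to its own MISSING bin."""
    if score is None:
        return "MISSING"
    if score < edges[0]:
        return f"<{edges[0]}"
    if score < edges[1]:
        return f"{edges[0]}-{edges[1] - 1}"
    return f">={edges[1]}"

def event_rate_by_bin(records):
    """records: iterable of (score_or_None, defaulted) pairs."""
    counts = defaultdict(lambda: [0, 0])  # bin label -> [n_obs, n_events]
    for score, defaulted in records:
        cell = counts[assign_bin(score)]
        cell[0] += 1
        cell[1] += int(defaulted)
    return {label: events / n for label, (n, events) in counts.items()}

# Hypothetical (bureau_score, defaulted) records; far too few for real binning.
records = [
    (550, 1), (580, 0),
    (640, 0), (660, 1), (690, 0),
    (720, 0), (750, 0), (780, 0),
    (None, 1), (None, 1), (None, 0), (None, 0),
]
rates = event_rate_by_bin(records)
for label, rate in rates.items():
    print(f"{label:>8}: event rate {rate:.2f}")
```

Had the missing scores been filled with the median instead, those four accounts would have landed in the middle bucket and their distinct default rate would have been averaged away. The MISSING bin can then be given its own WoE like any other bin.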

IV thresholds & binning rules

IV range | Predictive power | Action | Typical variables
< 0.02 | Useless | Drop | Noise, irrelevant fields
0.02 – 0.10 | Weak | Keep only if business-critical | Dependents, minor demographics
0.10 – 0.30 | Medium | Core candidate | Age, tenure, income, LTV
0.30 – 0.50 | Strong | High-value predictor | Bureau score, utilisation, delinquency
> 0.50 | Suspicious | Investigate leakage / overfit | Post-event variables, contaminated fields
Binning rules of thumb: enough observations per bin, economically sensible split points, monotonic WoE if possible, separate missing bin if informative, and no bins whose event rate is driven by tiny counts.
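The threshold table and the "no tiny bins" rule translate into two small helpers. This is a sketch under assumptions: the table does not say which band owns a boundary value, so the code treats each lower bound as inclusive and 0.50 itself as strong; the 5% minimum bin share and the minimum of 5 events are assumed illustrative cut-offs, not values from these notes.

```python
def iv_band(iv):
    """Map an IV value to the band names used in the table above.

    Boundary handling is an assumption: lower bounds are inclusive,
    and exactly 0.50 is treated as strong rather than suspicious.
    """
    if iv < 0.02:
        return "useless"
    if iv < 0.10:
        return "weak"
    if iv < 0.30:
        return "medium"
    if iv <= 0.50:
        return "strong"
    return "suspicious"

def thin_bins(bin_counts, min_share=0.05, min_events=5):
    """Return labels of bins that look too small to trust.

    bin_counts: dict of label -> (n_observations, n_events).
    min_share and min_events are assumed rule-of-thumb cut-offs.
    """
    total = sum(n for n, _ in bin_counts.values())
    return [
        label
        for label, (n, events) in bin_counts.items()
        if n / total < min_share or events < min_events
    ]

# Hypothetical bin counts (illustrative only).
counts = {"a": (500, 20), "b": (480, 30), "c": (20, 1)}
flagged = thin_bins(counts)
print(iv_band(0.14), flagged)
```

A flagged bin is a candidate for merging with a neighbour before WoE is computed, since its event rate would otherwise rest on a handful of accounts.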

Concepts every validator should keep

monotonicity

Good binning is ordered thinking

WoE bins should ideally form a risk story that moves in one direction. Random zig-zag binning is usually a sign the variable has not been prepared well enough.
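A monotonicity check is easy to automate once the bins are listed in their natural order. The helpers below are a minimal sketch with hypothetical names; they assume the missing bin has been left out, since it has no natural position in the ordering.

```python
def is_monotonic(woe_values, tolerance=0.0):
    """True if WoE never reverses direction across ordered bins.

    tolerance allows small reversals to be ignored; 0.0 means strict checking.
    """
    diffs = [b - a for a, b in zip(woe_values, woe_values[1:])]
    non_decreasing = all(d >= -tolerance for d in diffs)
    non_increasing = all(d <= tolerance for d in diffs)
    return non_decreasing or non_increasing

def direction_changes(woe_values):
    """Count how many times the WoE trend flips between rising and falling."""
    diffs = [b - a for a, b in zip(woe_values, woe_values[1:])]
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Hypothetical WoE sequences, bins ordered from lowest to highest raw value.
smooth = [0.47, 0.0, -0.47]
zigzag = [0.3, -0.1, 0.2, -0.4]
print(is_monotonic(smooth), direction_changes(smooth))
print(is_monotonic(zigzag), direction_changes(zigzag))
```

One direction change can describe a genuine U-shaped relationship; several usually point to bins that are too fine or too thinly populated.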

leakage

Suspiciously strong can be bad

An IV above 0.5 can reflect genuine strength, but it can also mean leakage, post-event contamination, or an operational proxy for the target.

missingness

Missing can be predictive

Especially in credit data, “not available” may describe a borrower segment with its own behavioural risk profile.

outliers

Flag first, decide second

Do not automatically winsorise or delete extremes. First decide whether the extreme value is data error, rare truth, or a business-relevant tail segment.

stability

Preparation must survive time

A variable that looks great in development but drifts badly later will damage model governance. PSI and CSI thinking starts here, not after deployment.
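For reference, the population stability index compares the bin shares of a development sample with those of a later sample: PSI = Σ (actual share − expected share) × ln(actual share / expected share). The sketch below uses a hypothetical function name and invented counts. Applied to a single variable's bins, the same calculation is what is usually called CSI.

```python
import math

def psi(expected_counts, actual_counts):
    """Population stability index over matching bins.

    expected_counts: counts per bin in the development sample.
    actual_counts: counts per bin in the later sample, same bin order.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        if expected == 0 or actual == 0:
            # The log term is undefined for an empty bin; merge bins first.
            raise ValueError("empty bin in one of the samples")
        expected_share = expected / expected_total
        actual_share = actual / actual_total
        total += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return total

# Hypothetical bin counts (illustrative only).
development = [400, 350, 250]
recent = [300, 350, 350]
drift = psi(development, recent)
print(f"PSI = {drift:.3f}")
```

Here the shift from the first bin into the third gives a PSI of roughly 0.06. Commonly quoted rules of thumb treat values below about 0.10 as stable and above about 0.25 as a material shift, but those cut-offs are conventions rather than part of these notes.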

sample bias

Reject inference still matters

What you prepare is usually only accepted-book data. That means the data pipeline itself can already contain structural selection bias before modelling begins.

What to leave this page with

Data preparation is not a pre-model chore. It is where the eventual model gets most of its structure, stability, and interpretability.

The useful order is: first clean the variable, then bin it sensibly, then convert to WoE, then judge it with IV, then ask whether the resulting feature is stable enough to enter the scorecard.

Once that mindset is clear, WoE and IV stop looking like legacy scorecard rituals and start looking like a disciplined preparation system.