notes · 05

WoE, IV & Data Preparation

Before a scorecard can learn, the data has to speak the right language. WoE transforms raw predictors into risk-ordered values, IV tells you which variables are worth keeping, and data preparation determines whether the model is learning signal or noise.

Start with the scorecard pipeline, then move into WoE and IV, then outlier handling, then missing data treatment. The goal is to understand why preparation is not a side task — it is the model’s foundation.


the scorecard pipeline

How scorecard data gets model-ready

1. Clean

Handle missing values, detect outliers, fix data types, remove duplicates, and make sure the observation window is consistent.

This is where a large part of modelling time actually goes. Raw data almost never arrives in a scorecard-ready state.

2. Transform

Bin continuous variables into interpretable groups, create a dedicated missing bin where relevant, and convert bins into WoE.

This puts every predictor on a common log-odds scale that suits logistic regression and is easier to govern. Monotonicity is not automatic: it depends on how the bins are chosen.

3. Select

Rank predictors by IV, remove weak variables, check monotonicity, and test collinearity among survivors.

The final set should not only be predictive, but also stable and defendable.

Scorecard reality: in PD development, binning and WoE are not cosmetic. They are part of how the model becomes interpretable, monotonic, and operationally governable.
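
As a rough illustration of the Transform step, the sketch below maps a raw value to the WoE of its bin, with a dedicated value for missing inputs. The function name `to_woe`, the cut points, and the WoE values are hypothetical and chosen only to show the mechanics; real values would come from binning the development sample.

```python
import bisect

def to_woe(value, cut_points, woe_by_bin, missing_woe):
    """Map a raw value to the WoE of its bin.

    cut_points: sorted lower edges of every bin except the first.
    woe_by_bin: one WoE per bin, so len(woe_by_bin) == len(cut_points) + 1.
    missing_woe: WoE of the dedicated missing bin.
    """
    if value is None:
        return missing_woe
    # bisect_right puts a value equal to a cut point into the upper bin.
    return woe_by_bin[bisect.bisect_right(cut_points, value)]

# Hypothetical bureau-score bins: below 600, 600 to 699, 700 and above.
example_cuts = [600, 700]
example_woes = [-0.45, 0.0, 0.47]
example_missing_woe = -0.80

transformed = [to_woe(v, example_cuts, example_woes, example_missing_woe)
               for v in (599, 600, 715, None)]
```

Keeping the cut points and WoE values as plain data, separate from the mapping logic, is one way to make the transformation easy to document and review.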

learning sequence

A useful order for learning data preparation

First ask whether the raw variable is usable

Missingness, spikes, outliers, and unit problems matter before predictive strength does.

Then ask whether the variable is monotonic in risk

WoE binning should ideally create a risk-ordered pattern, not random jumps between bins.

Then ask whether the variable is informative

IV provides a compact summary of how much separation the variable contributes across bins.

Only then move to model entry

A variable with high apparent signal but poor stability, leakage risk, or non-interpretable binning can still be a bad candidate.

interactive · woe & iv

WoE & IV — the scorecard builder’s core tool

Choose a variable to see its bins, event rates, WoE values, and total IV. This is the fastest way to build intuition for strong, weak, and suspicious predictors.

[Interactive chart: WoE by bin (positive WoE = lower risk, negative WoE = higher risk) and event rate by bin for the selected variable, with readouts for total IV, predictive power, number of bins, and whether WoE is monotonic.]

WoE_i = ln(%NonEvents_i / %Events_i)
IV = Σ (%NonEvents_i − %Events_i) × WoE_i

IV thresholds: < 0.02 useless · 0.02–0.10 weak · 0.10–0.30 medium · 0.30–0.50 strong · > 0.50 suspicious.
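
A minimal sketch of the two formulas above, using only the standard library. The bin labels and counts are invented for illustration, and the function assumes every bin contains at least one event and one non-event (a zero count would make the logarithm undefined).

```python
import math

def woe_iv(bins):
    """bins: list of (label, non_events, events) counts per bin.

    Returns (rows, total_iv), where each row is (label, woe, iv_contribution).
    """
    total_non = sum(non for _, non, _ in bins)
    total_evt = sum(evt for _, _, evt in bins)
    rows = []
    total_iv = 0.0
    for label, non_events, events in bins:
        pct_non = non_events / total_non   # share of all non-events in this bin
        pct_evt = events / total_evt       # share of all events in this bin
        woe = math.log(pct_non / pct_evt)  # WoE_i = ln(%NonEvents_i / %Events_i)
        contribution = (pct_non - pct_evt) * woe
        rows.append((label, woe, contribution))
        total_iv += contribution
    return rows, total_iv

# Hypothetical counts for a three-bin variable.
example_bins = [
    ("low", 800, 100),
    ("mid", 1000, 80),
    ("high", 1200, 60),
]
rows, total_iv = woe_iv(example_bins)
for label, woe, contribution in rows:
    print(f"{label:>5}  WoE={woe:+.3f}  IV part={contribution:.4f}")
print(f"Total IV = {total_iv:.4f}")
```

With these made-up counts the "low" bin has negative WoE (higher risk), the "high" bin has positive WoE (lower risk), and the total IV falls in the medium band.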

Monotonicity: WoE should ideally move consistently with risk. If it jumps up and down, either binning needs repair or the variable has a more complicated relationship than a basic scorecard likes.
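
A small sketch of how the monotonicity check might be automated. The helper name `is_monotonic` and the sample WoE sequences are illustrative assumptions; the bins are assumed to be listed in their natural order (for example, lowest to highest score).

```python
def is_monotonic(woe_values):
    """Return True if WoE never changes direction across ordered bins."""
    pairs = list(zip(woe_values, woe_values[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing

# Hypothetical WoE sequences, ordered by bin.
clean_binning = [-0.45, 0.0, 0.47]
zigzag_binning = [-0.40, 0.30, -0.10, 0.25]

print(is_monotonic(clean_binning))   # WoE rises steadily with the bin order
print(is_monotonic(zigzag_binning))  # WoE changes direction between bins
```

A failed check is a prompt to revisit the bins, not an automatic rejection: a dedicated missing bin, for instance, is usually left out of the ordered sequence before testing.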

interactive · outlier detection

Outlier detection — three methods compared

Outliers can distort means, variances, and coefficients. But in risk modelling, extreme values are not automatically bad data. The right question is not only “is this extreme?” but also “is it economically real and model-relevant?”

[Interactive chart: data with outlier flags from the IQR, z-score, and modified z-score methods, with a scenario selector and an extreme-value multiplier. Readouts: IQR flags, z-score flags, modified z-score flags, and mean shift vs median.]

Risk caution: a very high LTV or very low income can be an outlier statistically and still be real economically. Flag first, then decide whether to cap, transform, bin separately, or keep as genuine signal.
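
The three flagging methods compared in this section (IQR, z-score, modified z-score) can be sketched with the standard library. The thresholds used here (1.5 × IQR, |z| > 3, modified z > 3.5 with the 0.6745 scaling constant) are commonly quoted defaults rather than values taken from this page, and the income figures are invented. The functions only flag; they do not cap or delete anything.

```python
import statistics

def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

def z_flags(values, threshold=3.0):
    """Flag values more than `threshold` population standard deviations from the mean.
    Assumes the values are not all identical (standard deviation > 0)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [abs(v - mean) / sd > threshold for v in values]

def modified_z_flags(values, threshold=3.5):
    """Flag values using the median and the median absolute deviation (MAD).
    Assumes MAD > 0."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

# Hypothetical annual incomes in thousands, with one extreme value.
incomes = [32, 35, 38, 40, 41, 43, 45, 47, 50, 400]

print("IQR flags:       ", sum(iqr_flags(incomes)))
print("z-score flags:   ", sum(z_flags(incomes)))
print("Modified z flags:", sum(modified_z_flags(incomes)))
print("Mean vs median:  ", statistics.fmean(incomes), statistics.median(incomes))
```

On this small invented sample the plain z-score flags nothing, because the single extreme value inflates both the mean and the standard deviation, while the IQR and modified z-score methods both flag it. That is one reason median-based methods are often preferred on skewed credit data.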

interactive · missing data

Missing data — not all gaps are equal

Missingness can be random, partially explainable, or directly informative. In scorecards, “missing” is often not something to hide — it can be a risk signal of its own.

Missingness mechanisms

MCAR

Missing Completely At Random. The gap is unrelated to observed or unobserved values.

MAR

Missing At Random. Missingness depends only on other observed variables, not on the missing value itself.

MNAR

Missing Not At Random. Missingness depends on the unobserved value itself or on an unobserved risk state, so the gap carries information.

Treatment methods

Method	MCAR	MAR	MNAR	Use case
Listwise deletion	Yes	No	No	Tiny missingness, MCAR confirmed
Mean / Median	Approx.	No	No	Fast baseline; median for skewed variables
Mode	N/A	No	No	Categorical variables
Regression imputation	Yes	Partially	No	Other variables predict the missing one
Multiple imputation	Yes	Yes	No	Best general statistical treatment under MCAR / MAR
Missing indicator	N/A	N/A	Partial	Missingness itself is informative
Separate WoE bin	N/A	N/A	Yes	Scorecard best practice

Scorecard best practice: create a dedicated WoE bin for missing values when missingness itself may carry risk. Missing bureau score often has a different default profile than any observed score bucket.

Common mistake: filling everything with mean or median can destroy real signal if “not available” is itself behaviourally informative.
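
To make the separate-bin idea concrete, the sketch below sends missing values to their own bin and compares event rates across bins. The helper names, cut points, and records are all hypothetical; the point is only that the missing bin gets its own event rate instead of being folded into an observed bucket by imputation.

```python
from collections import defaultdict

def assign_bin(value, cut_points):
    """Return a bin label; None goes to a dedicated 'missing' bin."""
    if value is None:
        return "missing"
    for cut in cut_points:
        if value < cut:
            return f"< {cut}"   # first cut point the value falls below
    return f">= {cut_points[-1]}"

def event_rate_by_bin(records, cut_points):
    """records: iterable of (value_or_None, defaulted) with defaulted in {0, 1}."""
    counts = defaultdict(lambda: [0, 0])  # label -> [observations, events]
    for value, defaulted in records:
        label = assign_bin(value, cut_points)
        counts[label][0] += 1
        counts[label][1] += int(defaulted)
    return {label: events / n for label, (n, events) in counts.items()}

# Hypothetical (bureau_score, defaulted) pairs; None means no score available.
records = [
    (580, 1), (590, 0),
    (640, 1), (650, 0), (660, 0),
    (710, 0), (720, 0),
    (None, 1), (None, 1), (None, 0),
]
rates = event_rate_by_bin(records, cut_points=[600, 700])
for label, rate in rates.items():
    print(f"{label:>8}: event rate {rate:.2f}")
```

In this invented sample the missing bin has the highest event rate of all. Replacing those gaps with the median score would have moved them into the middle bucket and hidden that difference.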

reference

IV thresholds & binning rules

IV Range	Predictive power	Action	Typical variables
< 0.02	Useless	Drop	Noise, irrelevant fields
0.02 – 0.10	Weak	Keep only if business-critical	Dependents, minor demographics
0.10 – 0.30	Medium	Core candidate	Age, tenure, income, LTV
0.30 – 0.50	Strong	High-value predictor	Bureau score, utilisation, delinquency
> 0.50	Suspicious	Investigate leakage / overfit	Post-event variables, contaminated fields

Binning rules of thumb: enough observations per bin, economically sensible split points, monotonic WoE if possible, separate missing bin if informative, and no bins whose event rate is driven by tiny counts.
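
One of these rules, avoiding bins whose event rate rests on tiny counts, is easy to check mechanically. The sketch below is an assumption-laden example: the function name `sparse_bins`, the 5% minimum share, and the minimum of 10 events and 10 non-events per bin are placeholder thresholds, not values prescribed by this page.

```python
def sparse_bins(bins, min_share=0.05, min_count=10):
    """bins: list of (label, non_events, events).

    Returns labels of bins that hold too small a share of the sample,
    or too few events or non-events, for a stable event rate.
    """
    total = sum(non + evt for _, non, evt in bins)
    flagged = []
    for label, non_events, events in bins:
        share = (non_events + events) / total
        if share < min_share or events < min_count or non_events < min_count:
            flagged.append(label)
    return flagged

# Hypothetical counts: the top bin is very thin.
candidate_bins = [
    ("< 600", 800, 100),
    ("600-699", 1000, 80),
    ("700-799", 1150, 55),
    ("800+", 50, 5),
]
thin = sparse_bins(candidate_bins)
print(thin)
```

A flagged bin is usually merged with a neighbouring bin and the WoE recomputed, provided the merged split still makes economic sense.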

deeper concepts

Concepts every validator should keep

monotonicity

Good binning is ordered thinking

WoE bins should ideally form a risk story that moves in one direction. Random zig-zag binning is usually a sign the variable has not been prepared well enough.

leakage

Suspiciously strong can be bad

An IV above 0.5 can reflect genuine strength, but it can also mean leakage, post-event contamination, or an operational proxy for the target.
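
The IV bands from the reference table can be encoded as a small helper so that very high values are routed to a leakage review rather than accepted silently. The function name and the treatment of exact boundary values are assumptions made for this sketch.

```python
def iv_band(iv):
    """Map a total IV to the band used in the reference table.

    Boundary handling (which band an exact 0.02, 0.10, 0.30 or 0.50 falls
    into) is a choice made for this sketch.
    """
    if iv < 0.02:
        return "useless"
    if iv < 0.10:
        return "weak"
    if iv < 0.30:
        return "medium"
    if iv <= 0.50:
        return "strong"
    return "suspicious"

# Hypothetical IV values for a few candidate variables.
candidate_ivs = {"dependents": 0.04, "ltv": 0.18, "bureau_score": 0.42, "days_past_due_now": 0.91}
review_queue = [name for name, iv in candidate_ivs.items() if iv_band(iv) == "suspicious"]
print(review_queue)
```

Here the invented variable `days_past_due_now` lands in the review queue, which is the kind of field that often turns out to be measured after the outcome window has started.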

missingness

Missing can be predictive

Especially in credit data, “not available” may describe a borrower segment with its own behavioural risk profile.

outliers

Flag first, decide second

Do not automatically winsorise or delete extremes. First decide whether the extreme value is data error, rare truth, or a business-relevant tail segment.

stability

Preparation must survive time

A variable that looks great in development but drifts badly later will damage model governance. PSI and CSI thinking starts here, not after deployment.
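
For orientation, a minimal sketch of a population stability index calculation, assuming the usual definition PSI = Σ (actual share − expected share) × ln(actual share / expected share) over a fixed set of bins. The counts are invented, and every bin is assumed to be non-empty in both samples.

```python
import math

def psi(expected_counts, actual_counts):
    """Population stability index between a development (expected) sample
    and a later (actual) sample, using the same bins for both."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        p_expected = expected / expected_total
        p_actual = actual / actual_total
        total += (p_actual - p_expected) * math.log(p_actual / p_expected)
    return total

# Hypothetical bin counts for one variable at development and one year later.
development = [300, 400, 300]
one_year_later = [200, 400, 400]

print(f"PSI = {psi(development, one_year_later):.4f}")
print(f"PSI (no shift) = {psi(development, development):.4f}")
```

The formula has the same shape as IV, with the two samples taking the place of events and non-events, which is why the same bins can serve both preparation and monitoring.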

sample bias

Reject inference still exists

What you prepare is usually only accepted-book data. That means the data pipeline itself can already contain structural selection bias before modelling begins.

summary

What to leave this page with

Data preparation is not a pre-model chore. It is where the eventual model gets most of its structure, stability, and interpretability.

The useful order is: first clean the variable, then bin it sensibly, then convert to WoE, then judge it with IV, then ask whether the resulting feature is stable enough to enter the scorecard.

Once that mindset is clear, WoE and IV stop looking like legacy scorecard rituals and start looking like a disciplined preparation system.