WoE, IV & Data Preparation
Before a scorecard can learn, the data has to speak the right language. WoE transforms raw predictors into risk-ordered values, IV tells you which variables are worth keeping, and data preparation determines whether the model is learning signal or noise.
How scorecard data gets model-ready
1. Clean
Handle missing values, detect outliers, fix data types, remove duplicates, and make sure the observation window is consistent.
This is where a large part of modelling time actually goes. Raw data almost never arrives in a scorecard-ready state.
2. Transform
Bin continuous variables into interpretable groups, create a dedicated missing bin where relevant, and convert bins into WoE.
This puts each predictor on a log-odds scale that suits logistic regression and, when the bins are chosen well, moves monotonically with risk. That makes the variable easier to govern.
3. Select
Rank predictors by IV, remove weak variables, check monotonicity, and test collinearity among survivors.
The final set should be not only predictive, but also stable and defensible.
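The transform step above can be sketched in a few lines of pandas. This is a minimal illustration, not a production binning routine. The function name `woe_iv_table`, the quantile binning, the `-1` code for the missing bin, and the additive `smoothing` of 0.5 are all assumptions made for the example. It uses the common convention WoE = ln(%good / %bad), with `target` equal to 1 for a bad outcome.

```python
import numpy as np
import pandas as pd


def woe_iv_table(values, target, n_bins=5, smoothing=0.5):
    """Bin a numeric predictor and return (per-bin table, total IV).

    target: 1 = bad (event), 0 = good (non-event).
    WoE per bin is ln(share of goods / share of bads).
    """
    df = pd.DataFrame({
        "x": pd.Series(values, dtype="float64").to_numpy(),
        "bad": pd.Series(target).astype(int).to_numpy(),
    })
    # Quantile bins for observed values; duplicates="drop" copes with ties.
    codes = pd.qcut(df["x"], q=n_bins, labels=False, duplicates="drop")
    # Missing values get their own bin (coded -1) instead of being imputed.
    df["bin"] = codes.fillna(-1).astype(int)

    table = df.groupby("bin")["bad"].agg(n="count", bad="sum")
    table["good"] = table["n"] - table["bad"]
    table["event_rate"] = table["bad"] / table["n"]

    # Additive smoothing avoids log(0) when a bin has no goods or no bads.
    k = len(table)
    dist_good = (table["good"] + smoothing) / (table["good"].sum() + smoothing * k)
    dist_bad = (table["bad"] + smoothing) / (table["bad"].sum() + smoothing * k)

    table["woe"] = np.log(dist_good / dist_bad)
    table["iv_part"] = (dist_good - dist_bad) * table["woe"]
    return table, float(table["iv_part"].sum())
```

Calling `table, iv = woe_iv_table(df["utilisation"], df["default_flag"])` (hypothetical column names) returns one row per bin with counts, event rate, WoE, and that bin's IV contribution. In practice the quantile bins would be merged or adjusted by hand before the WoE values are used in a model.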
A useful order for learning data preparation
First ask whether the raw variable is usable
Missingness, spikes, outliers, and unit problems matter before predictive strength does.
Then ask whether the variable is monotonic in risk
WoE binning should ideally create a risk-ordered pattern, not random jumps between bins.
Then ask whether the variable is informative
IV provides a compact summary of how much separation the variable contributes across bins.
Only then move to model entry
A variable with high apparent signal but poor stability, leakage risk, or non-interpretable binning can still be a bad candidate.
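The monotonicity question in this sequence is easy to automate once the bins are ordered. The sketch below assumes the WoE values are passed as a list in bin order, with any dedicated missing bin left out, since that bin has no natural position. Both function names are hypothetical.

```python
def is_monotonic(woe_values, strict=False):
    """Return True if WoE only rises or only falls across ordered bins."""
    pairs = list(zip(woe_values, woe_values[1:]))
    if strict:
        rising = all(a < b for a, b in pairs)
        falling = all(a > b for a, b in pairs)
    else:
        rising = all(a <= b for a, b in pairs)
        falling = all(a >= b for a, b in pairs)
    return rising or falling


def direction_changes(woe_values):
    """Count reversals in the WoE trend; 0 means the pattern is monotonic."""
    # Ignore flat steps so ties do not count as reversals.
    diffs = [b - a for a, b in zip(woe_values, woe_values[1:]) if b != a]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 > 0) != (d2 > 0))
```

A count of one reversal may still be acceptable for a variable with a known U-shaped risk pattern. Several reversals usually mean the bins are too fine or the variable is noisy.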
WoE & IV — the scorecard builder’s core tool
Looking at a variable's bins, event rates, WoE values, and total IV side by side is the fastest way to build intuition for strong, weak, and suspicious predictors.
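For reference, the usual definitions behind those numbers are given below. The symbols are introduced here for illustration and do not appear elsewhere on this page: $G_i$ and $B_i$ are the counts of goods and bads in bin $i$, $G$ and $B$ are the totals across all $k$ bins. The sign convention, goods over bads, is one common choice. Some texts invert the ratio, which flips the sign of WoE but leaves IV unchanged.

```latex
\mathrm{WoE}_i = \ln\!\left(\frac{G_i / G}{B_i / B}\right),
\qquad
\mathrm{IV} = \sum_{i=1}^{k} \left(\frac{G_i}{G} - \frac{B_i}{B}\right)\mathrm{WoE}_i
```

Each term of the IV sum is non-negative, because the difference in shares and the log of their ratio always have the same sign.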
Outlier detection — three methods compared
Outliers can distort means, variances, and coefficients. But in risk modelling, extreme values are not automatically bad data. The right question is not only “is this extreme?” but also “is it economically real and model-relevant?”
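The page does not name the three methods, so the sketch below assumes three common ones: the z-score, the IQR fence, and the MAD-based modified z-score. The function name and the cutoffs (3.0, 1.5, and 3.5) are conventional defaults chosen for the example, not values taken from this page. The function only flags values. It does not remove or cap anything.

```python
import numpy as np


def outlier_flags(x, z_cut=3.0, iqr_k=1.5, mad_cut=3.5):
    """Return boolean outlier masks from three rules, keyed by method name."""
    x = np.asarray(x, dtype=float)

    # 1) Z-score: distance from the mean in standard deviations.
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)

    # 2) IQR fences: beyond k * IQR outside the quartiles.
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low, high = q1 - iqr_k * iqr, q3 + iqr_k * iqr

    # 3) Modified z-score: uses median and MAD, so it resists the outliers
    #    it is trying to detect. 0.6745 rescales MAD to match a normal std.
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    mz = 0.6745 * (x - med) / mad if mad > 0 else np.zeros_like(x)

    return {
        "zscore": np.abs(z) > z_cut,
        "iqr": (x < low) | (x > high),
        "mad": np.abs(mz) > mad_cut,
    }
```

The z-score rule is the weakest of the three on small or contaminated samples, because the extreme values inflate the very mean and standard deviation used to judge them. Disagreement between the three masks is a prompt to look at the records, not a verdict.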
Missing data — not all gaps are equal
Missingness can be random, partially explainable, or directly informative. In scorecards, “missing” is often not something to hide — it can be a risk signal of its own.
Missingness mechanisms
MCAR
Missing Completely At Random. The gap is unrelated to observed or unobserved values.
MAR
Missing At Random. Missingness depends on other observed variables.
MNAR
Missing Not At Random. Missingness itself carries information about the hidden value or risk state.
Treatment methods
| Method | MCAR | MAR | MNAR | Use case |
|---|---|---|---|---|
| Listwise deletion | Yes | No | No | Tiny missingness, MCAR confirmed |
| Mean / Median | Yes / Approx | No | No | Fast baseline, median for skewed |
| Mode | Approx | No | No | Categorical variables |
| Regression imputation | Yes | Partially | No | Other variables predict the missing one |
| Multiple imputation | Yes | Yes | No (needs an explicit missingness model) | Best general statistical treatment |
| Missing indicator | N/A | N/A | Partial | Missingness itself is informative |
| Separate WoE bin | N/A | N/A | Yes | Scorecard best practice |
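Before choosing a row from the table, it helps to check whether missingness looks informative at all. The sketch below compares the bad rate of records where the value is missing with the bad rate where it is observed. The function names are hypothetical, and `target` is assumed to be 1 for a bad outcome. A clear gap does not prove the MNAR mechanism, but it is a practical argument for a separate WoE bin or a missing indicator rather than silent imputation.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_vs_observed(values, target):
    """Return record count and bad rate for missing and observed groups."""
    stats = {"missing": [0, 0], "observed": [0, 0]}  # [count, bads]
    for value, bad in zip(values, target):
        key = "missing" if _is_missing(value) else "observed"
        stats[key][0] += 1
        stats[key][1] += int(bad)
    return {
        key: {"n": n, "bad_rate": (bads / n if n else None)}
        for key, (n, bads) in stats.items()
    }
```

With very few missing records the comparison is unreliable, so the counts are returned alongside the rates.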
IV thresholds & binning rules
| IV Range | Predictive power | Action | Typical variables |
|---|---|---|---|
| < 0.02 | Useless | Drop | Noise, irrelevant fields |
| 0.02 – 0.10 | Weak | Keep only if business-critical | Dependents, minor demographics |
| 0.10 – 0.30 | Medium | Core candidate | Age, tenure, income, LTV |
| 0.30 – 0.50 | Strong | High-value predictor | Bureau score, utilisation, delinquency |
| > 0.50 | Suspicious | Investigate leakage / overfit | Post-event variables, contaminated fields |
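The bands in the table translate directly into a small lookup. The function name is hypothetical. The table leaves the shared boundaries 0.10 and 0.30 ambiguous, so the sketch assumes each band includes its lower bound, and that exactly 0.50 still counts as Strong because the last row reads "> 0.50".

```python
def iv_band(iv):
    """Map an Information Value to the predictive-power band in the table."""
    if iv < 0.02:
        return "Useless"
    if iv < 0.10:
        return "Weak"
    if iv < 0.30:
        return "Medium"
    if iv <= 0.50:
        return "Strong"
    # Above 0.50: investigate leakage or overfitting before trusting it.
    return "Suspicious"
```

The label is a screening aid. A variable in the Weak band can still be kept for business reasons, and one in the Suspicious band can turn out to be genuine after investigation.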
Concepts every validator should keep
Good binning is ordered thinking
WoE bins should ideally form a risk story that moves in one direction. Random zig-zag binning is usually a sign the variable has not been prepared well enough.
Suspiciously strong can be bad
An IV above 0.5 can reflect genuine strength, but it can also mean leakage, post-event contamination, or an operational proxy for the target.
Missing can be predictive
Especially in credit data, “not available” may describe a borrower segment with its own behavioural risk profile.
Flag first, decide second
Do not automatically winsorise or delete extremes. First decide whether the extreme value is data error, rare truth, or a business-relevant tail segment.
Preparation must survive time
A variable that looks great in development but drifts badly later will damage model governance. PSI and CSI thinking starts here, not after deployment.
Reject inference still matters
The data you prepare usually covers only the accepted book, because rejected applicants have no observed outcome. The pipeline can therefore carry structural selection bias before modelling begins.
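The stability point made under "Preparation must survive time" can be checked with a Population Stability Index. The sketch below is one common formulation, not necessarily the one this page has in mind. The function name `psi`, the ten quantile bins taken from the development sample, and the `eps` floor for empty bins are assumptions. It does not handle missing values. Applied to a single characteristic instead of the score, the same calculation is often called CSI.

```python
import numpy as np


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a development and a later sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Bin edges come from the development sample's quantiles.
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, n_bins + 1)))
    # Keep inner edges only, so values outside the development range
    # fall into the first or last bin instead of being dropped.
    inner = edges[1:-1]
    n = len(inner) + 1

    e_idx = np.searchsorted(inner, expected, side="right")
    a_idx = np.searchsorted(inner, actual, side="right")
    e_pct = np.bincount(e_idx, minlength=n) / len(expected)
    a_pct = np.bincount(a_idx, minlength=n) / len(actual)

    # Floor the shares so empty bins do not produce log(0).
    e_pct = np.clip(e_pct, eps, None)
    a_pct = np.clip(a_pct, eps, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

Identical distributions give a PSI of zero, and the value grows as the later sample moves away from the development one. Any alert threshold is a policy choice and should be set by the model's governance standards.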
What to leave this page with
Data preparation is not a pre-model chore. It is where the eventual model gets most of its structure, stability, and interpretability.
The useful order is: first clean the variable, then bin it sensibly, then convert to WoE, then judge it with IV, then ask whether the resulting feature is stable enough to enter the scorecard.
Once that mindset is clear, WoE and IV stop looking like legacy scorecard rituals and start looking like a disciplined preparation system.