Segmentation, Reject Inference & Selection Bias
A model is only as representative as the population it learns from. In credit modelling, observed performance can look clean while hiding a deeper issue: the development sample is often filtered by prior decisions, acceptance rules, and portfolio mix.
A model can be valid on the observed sample and wrong on the real population
Observed portfolio logic
The model is usually trained and validated on accepted applicants or booked exposures, because those are the cases with realised outcomes.
That sounds practical, but it also means the sample has already been filtered by an earlier decision process.
Selection bias logic
If acceptance depends on the same risk factors that also drive default, the observed sample becomes systematically different from the true underlying population.
That difference can distort PD levels, variable relationships, segment stability, and even measured model performance.
A useful order for learning this topic
Start with segmentation
Before worrying about reject inference, first check whether model behaviour is stable across age bands, products, channels, industries, or grades.
Then ask who is missing from the sample
Selection bias starts when the observed population excludes cases systematically, not randomly.
Then move to reject inference
Once you understand the missing-data mechanism, reject inference becomes a practical attempt to recover something about unseen risk.
Then test representativeness continuously
The question is not only whether the development sample was biased, but whether the production population remains aligned with the intended use population.
Overall performance can hide weak segments
A single portfolio-level AUC can look acceptable while important subsegments perform much worse, because a large, well-ranked segment can dominate the total and mask degradation in a smaller one.
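A minimal sketch of this masking effect, using simulated data rather than a real portfolio. The segment names (`core`, `new_channel`), the sample sizes, and the PD curve are all assumptions made for illustration. The AUC is computed with a plain rank-based formula so the example needs only the standard library.

```python
import math
import random

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i])
    return (rank_sum - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)

rng = random.Random(11)
records = []  # (segment, score, defaulted)

# Hypothetical segment "core": large, and the score ranks risk well.
for _ in range(9000):
    score = rng.gauss(0.0, 1.0)
    pd_true = 1.0 / (1.0 + math.exp(-(-2.0 + 2.0 * score)))  # assumed PD curve
    records.append(("core", score, int(rng.random() < pd_true)))

# Hypothetical segment "new_channel": small, and the score carries no signal.
for _ in range(1000):
    score = rng.gauss(0.0, 1.0)
    records.append(("new_channel", score, int(rng.random() < 0.15)))

def segment_auc(rows):
    return auc([s for _, s, _ in rows], [d for _, _, d in rows])

overall_auc = segment_auc(records)
auc_by_segment = {
    name: segment_auc([r for r in records if r[0] == name])
    for name in ("core", "new_channel")
}

print(f"overall: {overall_auc:.3f}")
for name, value in auc_by_segment.items():
    print(f"{name}: {value:.3f}")
```

In this setup the overall figure stays close to the `core` result, while `new_channel` sits near 0.5, which is what a score with no ranking power produces. Reporting AUC per segment alongside the total is what exposes the gap.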
Acceptance filters distort the observed sample
As the acceptance cutoff tightens, the booked sample becomes safer, less representative, and more detached from the underlying applicant population.
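The effect of a tighter cutoff can be sketched with a small simulation. Everything here is an assumption for illustration: a single latent `risk` score, a logistic PD curve, and a deterministic acceptance rule that books only applicants with `risk` below the cutoff. Real acceptance policies are rarely this clean.

```python
import math
import random

def simulate_applicants(n=20000, seed=7):
    """Return (risk, defaulted) pairs; higher risk means higher true PD."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        risk = rng.gauss(0.0, 1.0)
        true_pd = 1.0 / (1.0 + math.exp(-(-2.0 + 1.5 * risk)))  # assumed PD curve
        rows.append((risk, rng.random() < true_pd))
    return rows

def default_rate(rows):
    return sum(defaulted for _, defaulted in rows) / len(rows)

def auc(rows):
    """Rank-based AUC of the risk score (Mann-Whitney U); ties are ignored."""
    ordered = sorted(rows)  # ascending risk
    n_bad = sum(defaulted for _, defaulted in ordered)
    n_good = len(ordered) - n_bad
    rank_sum = sum(
        rank for rank, (_, defaulted) in enumerate(ordered, start=1) if defaulted
    )
    return (rank_sum - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)

applicants = simulate_applicants()
results = []
# math.inf means no filter; lower cutoffs mean a stricter acceptance policy.
for cutoff in (math.inf, 1.0, 0.0, -1.0):
    booked = [row for row in applicants if row[0] < cutoff]
    results.append(
        (cutoff, len(booked) / len(applicants), default_rate(booked), auc(booked))
    )

for cutoff, share, rate, booked_auc in results:
    print(f"cutoff={cutoff:>5}  booked={share:6.1%}  "
          f"default_rate={rate:6.2%}  AUC={booked_auc:.3f}")
```

Under these assumptions the booked default rate falls with each tighter cutoff, and the AUC measured on the booked sample drops below the applicant-level AUC because the filter removes exactly the high-risk range where the score separates best.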
What could rejected applicants have looked like?
Reject inference methods try to say something about missing outcomes. None of them fully solves the problem, but some are less naive than simply ignoring rejects.
Chart: estimated default rate under different reject treatments, by scenario.
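To see how strongly the answer depends on the assumed treatment of rejects, the sketch below compares three treatments on simulated data. The treatments, the acceptance cutoff, and the `REJECT_MULTIPLIER` are illustrative assumptions, not methods prescribed by this page. The "parceling-style" line is a deliberately crude stand-in: it assigns rejects a multiple of the bad rate seen in the riskiest booked band.

```python
import math
import random

rng = random.Random(3)
applicants = []  # (risk, defaulted); reject outcomes are known only because this is simulated
for _ in range(20000):
    risk = rng.gauss(0.0, 1.0)
    true_pd = 1.0 / (1.0 + math.exp(-(-2.0 + 1.5 * risk)))  # assumed PD curve
    applicants.append((risk, rng.random() < true_pd))

CUTOFF = 0.5  # hypothetical acceptance rule: book only risk < CUTOFF
booked = [row for row in applicants if row[0] < CUTOFF]
rejects = [row for row in applicants if row[0] >= CUTOFF]

def bad_rate(rows):
    return sum(defaulted for _, defaulted in rows) / len(rows)

def blended_rate(reject_bad_rate):
    """Applicant-level default rate if rejects default at the assumed rate."""
    bads = bad_rate(booked) * len(booked) + reject_bad_rate * len(rejects)
    return bads / len(applicants)

# Riskiest booked band, used as the anchor for the parceling-style assumption.
marginal_band = [row for row in booked if row[0] >= CUTOFF - 0.5]
REJECT_MULTIPLIER = 2.0  # assumption: rejects are twice as risky as the marginal band

estimates = {
    "ignore_rejects": bad_rate(booked),
    "parceling_style": blended_rate(
        min(1.0, REJECT_MULTIPLIER * bad_rate(marginal_band))
    ),
    "all_rejects_bad": blended_rate(1.0),
}
true_rate = bad_rate(applicants)  # unobservable in practice

for name, value in estimates.items():
    print(f"{name:>16}: {value:6.2%}")
print(f"{'simulated truth':>16}: {true_rate:6.2%}")
```

Ignoring rejects understates the applicant-level default rate and treating every reject as bad overstates it. Where the middle estimate lands depends entirely on the multiplier, which is why any such figure should be reported together with its assumption.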
Is the production population still similar to the development population?
Development bias is one issue; population drift is another. Both raise the same question: does the current production mix still resemble the population the model was originally trained on?
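One common way to put a number on this comparison is the Population Stability Index (PSI), sketched below for a segment mix. The channel names and shares are hypothetical, and PSI is offered here as one possible drift measure, not as the method this page prescribes.

```python
import math

def psi(expected_shares, actual_shares, floor=1e-6):
    """Population Stability Index between two share distributions.

    Shares are floored at a small value so an empty bin does not
    produce log(0) or a division by zero.
    """
    total = 0.0
    for key in set(expected_shares) | set(actual_shares):
        e = max(expected_shares.get(key, 0.0), floor)
        a = max(actual_shares.get(key, 0.0), floor)
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical channel mix at development time and in current production.
development_mix = {"branch": 0.60, "broker": 0.30, "online": 0.10}
production_mix = {"branch": 0.40, "broker": 0.30, "online": 0.30}

mix_psi = psi(development_mix, production_mix)
print(f"PSI = {mix_psi:.3f}")  # prints PSI = 0.301
```

A widely quoted rule of thumb treats values above roughly 0.25 as a material shift, but that threshold is a convention, not a statistical test, and should be set in line with the institution's own monitoring policy.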
Concepts compared
| Concept | Main question | Typical symptom | Main risk |
|---|---|---|---|
| Segmentation analysis | Does the model behave similarly across subgroups? | Weak segment hidden in strong total | Local model failure |
| Selection bias | Is the observed sample systematically filtered? | Booked sample safer than applicant pool | Biased parameter estimates |
| Reject inference | Can missing rejected outcomes be approximated? | Different results under different assumptions | False precision |
| Representativeness drift | Does production still resemble development? | Segment mix changes materially | Model applied outside its intended population |
| Strategy drift | Did the policy or underwriting target change? | Acceptance rules shift the score distribution | Historical calibration no longer portable |
Concepts every validator should keep in mind
Total performance is not enough
A strong overall metric can hide a weak but strategically important segment.
Rejected cases are not just “missing data”
They are usually missing in a non-random way, which makes the problem more structural than ordinary incompleteness.
Acceptance policy shapes the data
The model is often learning on outcomes that were already filtered by previous policy choices.
Challenger comparisons can be biased too
If champion and challenger are evaluated on a filtered booked sample, both may look better than they truly are on the full application universe.
A business shift can create instant model mismatch
New channels, products, or risk appetite changes can invalidate historical segment relationships very quickly.
Assumptions must be explicit
Any use of reject inference or augmented representativeness logic should be presented as assumption-based, not as directly observed truth.
What to leave this page with
Segmentation and selection are not side issues around model validation. They are part of the data-generating process itself.
The useful order is: first test subgroup performance, then understand how acceptance rules filter the observable sample, then treat reject inference as assumption-based reconstruction, then monitor whether the live production population still resembles the population the model was built for.
Once that structure is clear, model validation stops being only about coefficients and metrics and becomes a broader question of whether the model is being learned and used on the right population at all.