notes · 19

Segmentation, Reject Inference & Selection Bias

A model is only as representative as the population it learns from. In credit modelling, observed performance can look clean while hiding a deeper issue: the development sample is often filtered by prior decisions, acceptance rules, and portfolio mix.

Start with segmentation and subgroup performance, then move into acceptance bias and reject inference, then connect that logic to representativeness, sample selection bias, and model validity under filtered populations.

A model can be valid on the observed sample and wrong on the real population

Observed portfolio logic

The model is usually trained and validated on accepted applicants or booked exposures, because those are the cases with realised outcomes.

That sounds practical, but it also means the sample has already been filtered by an earlier decision process.

Observed sample ≠ full applicant universe

Selection bias logic

If acceptance depends on the same risk factors that also drive default, the observed sample becomes systematically different from the true underlying population.

That difference can distort PD levels, variable relationships, segment stability, and even measured model performance.

Filtering changes the data-generating process

Main warning: good validation on booked customers does not automatically prove the model is valid for all applicants, rejected cases, or future strategy shifts.
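
One way to write the distortion down, using notation that is not in the original notes: let D = 1 mean default and A = 1 mean the applicant was accepted. Bayes' rule then links the default rate seen on booked accounts to the default rate of the full applicant universe:

$$
P(D=1 \mid A=1) \;=\; \frac{P(A=1 \mid D=1)}{P(A=1)} \, P(D=1)
$$

The two rates coincide only when P(A=1 | D=1) = P(A=1), that is, when acceptance is unrelated to default. If the policy screens out risk at all, future defaulters are accepted less often than average, the ratio falls below 1, and the booked default rate understates the applicant default rate.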

A useful order for learning this topic

01

Start with segmentation

Before worrying about reject inference, first check whether model behaviour is stable across age bands, products, channels, industries, or grades.

02

Then ask who is missing from the sample

Selection bias starts when the observed population excludes cases systematically, not randomly.

03

Then move to reject inference

Once you understand the missing-data mechanism, reject inference becomes a practical attempt to recover something about unseen risk.

04

Then test representativeness continuously

The question is not only whether the development sample was biased, but whether the production population remains aligned with the intended use population.

Overall performance can hide weak segments

A single portfolio-level AUC can look acceptable while important subsegments perform much worse, because a large, well-behaved segment dominates the pooled metric and masks degradation in smaller ones.
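
A minimal sketch of the masking effect on simulated data. Everything here is an assumption made for illustration: the two segment names, their sizes, the 15% default rate, and the "signal" strengths that make the score informative in one segment and close to noise in the other. The AUC is computed by direct pair counting rather than with a library.

```python
import random

def auc(scores, labels):
    """Share of (defaulter, non-defaulter) pairs where the defaulter has the
    higher risk score; ties count as half a win."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))

def simulate_segment(rng, n, default_rate, signal):
    """Hypothetical segment: score = signal * default flag + unit Gaussian noise."""
    labels = [1 if rng.random() < default_rate else 0 for _ in range(n)]
    scores = [signal * y + rng.gauss(0.0, 1.0) for y in labels]
    return scores, labels

rng = random.Random(19)
# Assumed sizes and signal strengths, chosen only to make the effect visible.
segments = {
    "core": simulate_segment(rng, n=2000, default_rate=0.15, signal=1.5),
    "new_channel": simulate_segment(rng, n=500, default_rate=0.15, signal=0.2),
}

segment_auc = {name: auc(s, y) for name, (s, y) in segments.items()}
all_scores = [s for scores, _ in segments.values() for s in scores]
all_labels = [y for _, labels in segments.values() for y in labels]
overall_auc = auc(all_scores, all_labels)

print(f"overall      AUC = {overall_auc:.3f}")
for name, value in segment_auc.items():
    print(f"{name:<12} AUC = {value:.3f}")
```

Because the large segment contributes most of the pairs, the pooled AUC sits much closer to the strong segment than to the weak one, which is why the segment-level view has to be reported alongside the total.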

Acceptance filters distort the observed sample

As the acceptance cutoff tightens, the booked sample becomes safer, less representative, and more detached from the underlying applicant population.
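
The effect of the cutoff can be sketched with a small simulation. The latent risk variable `z`, the logistic link in `true_pd`, and the three cutoff values are all assumptions of this example, not figures from the notes. In reality the outcomes of rejected applicants are never observed; the simulation only "knows" them because it generates them.

```python
import math
import random

rng = random.Random(7)

def true_pd(z):
    """Assumed link between latent risk z and default probability."""
    return 1.0 / (1.0 + math.exp(-(-2.0 + 1.0 * z)))

# Full applicant universe: higher z means riskier.
applicants = []
for _ in range(20000):
    z = rng.gauss(0.0, 1.0)
    defaulted = 1 if rng.random() < true_pd(z) else 0
    applicants.append((z, defaulted))

def booked_view(cutoff):
    """Accept only applicants with z <= cutoff.

    Returns (accept rate, default rate among the accepted)."""
    booked = [d for z, d in applicants if z <= cutoff]
    return len(booked) / len(applicants), sum(booked) / len(booked)

applicant_rate = sum(d for _, d in applicants) / len(applicants)
print(f"all applicants: default rate = {applicant_rate:.3f}")

cutoffs = [1.0, 0.0, -1.0]  # progressively tighter policy
results = {c: booked_view(c) for c in cutoffs}
for c, (accept_rate, default_rate) in results.items():
    print(f"cutoff z <= {c:+.1f}: accepted {accept_rate:.0%}, "
          f"booked default rate = {default_rate:.3f}")
```

Each tighter cutoff lowers both the acceptance rate and the booked default rate, so a model developed on the booked sample sees an increasingly narrow and optimistic slice of the applicant population.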

What could rejected applicants have looked like?

Reject inference methods try to say something about missing outcomes. None of them fully solves the problem, but some are less naive than simply ignoring rejects.

Chart: estimated default rate under different reject treatments

Scenario: reject share 35%, relative reject risk multiplier 1.60.

Methods shown: booked-only view, hard cutoff inference, simple augmentation, and fuzzy weighting. None is “truth”; each is a structured assumption.

Main governance point: reject inference is not magic recovery of hidden outcomes. It is assumption-driven reconstruction and must be documented that way.
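
A rough sketch of how the scenario values feed an estimate. The reject share (35%) and multiplier (1.60) come from the scenario above; the 4% booked default rate, the two reject PDs, and both function names are hypothetical. `through_the_door_rate` is only portfolio-level arithmetic under the multiplier assumption, and `fuzzy_records` shows one common description of fuzzy weighting; the exact definitions of these methods vary between texts.

```python
def through_the_door_rate(booked_rate, reject_share, risk_multiplier):
    """Blend the observed booked default rate with an ASSUMED reject default rate.

    The reject rate is not observed; it is set to risk_multiplier * booked_rate
    (capped at 1.0), which is exactly the assumption being illustrated.
    """
    reject_rate = min(1.0, risk_multiplier * booked_rate)
    return (1.0 - reject_share) * booked_rate + reject_share * reject_rate

def fuzzy_records(reject_pds, risk_multiplier):
    """Fuzzy weighting: split each reject into a partial 'bad' and a partial
    'good' record, weighted by its adjusted PD and the complement."""
    records = []
    for i, model_pd in enumerate(reject_pds):
        p_bad = min(1.0, risk_multiplier * model_pd)
        records.append((i, 1, p_bad))        # (reject id, label, weight)
        records.append((i, 0, 1.0 - p_bad))
    return records

BOOKED_RATE = 0.04    # hypothetical observed default rate on booked accounts
REJECT_SHARE = 0.35   # scenario value from the text
MULTIPLIER = 1.60     # scenario value from the text

print(f"booked-only view:         {BOOKED_RATE:.4f}")
for m in (1.0, MULTIPLIER, 3.0):
    estimate = through_the_door_rate(BOOKED_RATE, REJECT_SHARE, m)
    print(f"multiplier {m:.2f} estimate: {estimate:.4f}")

# Two hypothetical rejects scored by a booked-only model.
for record in fuzzy_records([0.05, 0.10], MULTIPLIER):
    print(record)
```

With these inputs the estimated applicant default rate moves from 4.00% (multiplier 1.0, rejects assumed no riskier than booked accounts) to 4.84% (multiplier 1.60) to 6.80% (multiplier 3.0). The data are identical in all three cases; only the assumption changes, which is why the assumption has to be documented.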

Is the production population still similar to the development population?

Development bias is one issue. Population drift is another. This section combines both ideas by asking whether the current production mix still resembles the population the model was originally trained on.
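
One common way to put a number on this comparison, not named in these notes, is the population stability index (PSI) computed over segment shares. The sketch below assumes a hypothetical channel mix for development and production; the small `floor` on shares is a choice of this example to avoid taking the log of zero.

```python
import math

def psi(expected_mix, actual_mix, floor=1e-6):
    """Population stability index between two dictionaries of segment shares.

    Shares are floored at a small value (an assumption of this sketch) so that
    an empty segment does not produce log(0) or a division by zero.
    """
    total = 0.0
    for segment in set(expected_mix) | set(actual_mix):
        e = max(expected_mix.get(segment, 0.0), floor)
        a = max(actual_mix.get(segment, 0.0), floor)
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical channel mix at development time and in current production.
development = {"branch": 0.60, "broker": 0.30, "online": 0.10}
production = {"branch": 0.40, "broker": 0.30, "online": 0.30}

value = psi(development, production)
print(f"PSI = {value:.3f}")
```

Here the PSI comes out at about 0.30. A frequently quoted rule of thumb treats values above roughly 0.25 as a material shift, but that threshold is a convention rather than a standard, and the same calculation can be run on score bands instead of channels.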

Concepts compared

| Concept | Main question | Typical symptom | Main risk |
| --- | --- | --- | --- |
| Segmentation analysis | Does the model behave similarly across subgroups? | Weak segment hidden in strong total | Local model failure |
| Selection bias | Is the observed sample systematically filtered? | Booked sample safer than applicant pool | Biased parameter estimates |
| Reject inference | Can missing rejected outcomes be approximated? | Different results under different assumptions | False precision |
| Representativeness drift | Does production still resemble development? | Segment mix changes materially | Model not used on intended population |
| Strategy drift | Did policy or underwriting target change? | Acceptance rules shift score distribution | Historical calibration no longer portable |

Concepts every validator should keep

portfolio average

Total performance is not enough

A strong overall metric can hide a weak but strategically important segment.

missing outcomes

Rejected cases are not just “missing data”

They are usually missing in a non-random way, which makes the problem more structural than ordinary incompleteness.

policy dependence

Acceptance policy shapes the data

The model is often learning on outcomes that were already filtered by previous policy choices.

fair comparison

Challenger comparisons can be biased too

If champion and challenger are evaluated on a filtered booked sample, both may look better than they truly are on the full application universe.

strategy change

A business shift can create instant model mismatch

New channels, products, or risk appetite changes can invalidate historical segment relationships very quickly.

governance

Assumptions must be explicit

Any use of reject inference or augmented representativeness logic should be presented as assumption-based, not as directly observed truth.

What to leave this page with

Segmentation and selection are not side issues around model validation. They are part of the data-generating process itself.

The useful order is: first test subgroup performance, then understand how acceptance rules filter the observable sample, then treat reject inference as assumption-based reconstruction, then monitor whether the live production population still resembles the population the model was built for.

Once that structure is clear, model validation stops being only about coefficients and metrics and becomes a broader question of whether the model is being learned and used on the right population at all.