notes · 19

Segmentation, Reject Inference & Selection Bias

A model is only as representative as the population it learns from. In credit modelling, observed performance can look clean while hiding a deeper issue: the development sample is often filtered by prior decisions, acceptance rules, and portfolio mix.

Start with segmentation and subgroup performance, then move into acceptance bias and reject inference, then connect that logic to representativeness, sample selection bias, and model validity under filtered populations.

the hidden problem

A model can be valid on the observed sample and wrong on the real population

Observed portfolio logic

The model is usually trained and validated on accepted applicants or booked exposures, because those are the cases with realised outcomes.

That sounds practical, but it also means the sample has already been filtered by an earlier decision process.

Observed sample ≠ full applicant universe

Selection bias logic

If acceptance depends on the same risk factors that also drive default, the observed sample becomes systematically different from the true underlying population.

That difference can distort PD levels, variable relationships, segment stability, and even measured model performance.

Filtering changes the data-generating process

Main warning: good validation on booked customers does not automatically prove the model is valid for all applicants, rejected cases, or future strategy shifts.

learning sequence

A useful order for learning this topic

Start with segmentation

Before worrying about reject inference, first check whether model behaviour is stable across age bands, products, channels, industries, or grades.

Then ask who is missing from the sample

Selection bias starts when the observed population excludes cases systematically, not randomly.

Then move to reject inference

Once you understand the missing-data mechanism, reject inference becomes a practical attempt to recover something about unseen risk.

Then test representativeness continuously

The question is not only whether the development sample was biased, but whether the production population remains aligned with the intended use population.

interactive · segment performance

Overall performance can hide weak segments

A single portfolio-level AUC can look acceptable while important subsegments perform much worse. Use the controls below to see how portfolio mix can mask subgroup degradation.

[Interactive chart: segment-level performance, showing AUC by segment against the portfolio average. Control: weight of weakest segment (%), default 10. Readouts: portfolio AUC, worst segment AUC, gap, governance hint.]

Main lesson: portfolio-level metrics behave like volume-weighted averages, so high-volume segments dominate. A weak segment can disappear inside a strong total if its volume is small enough.

Validation consequence: any segment that is strategically important, high-risk, or growing fast should be reviewed separately even if its current weight is small.
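The masking effect described above can be reproduced with a small simulation. This is a minimal sketch under stated assumptions, not the page's own widget logic: the segment names (`core`, `new_channel`), the 10% bad rate, the score separations, and the function names are all hypothetical, and scores are taken as "higher means riskier". A pooled AUC is not in general an exact weighted average of segment AUCs, because it also ranks cases across segments, but it is still dominated by the large segment.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random bad outranks a random good (ties count half)."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_bad = sum(labels)
    n_good = n - n_bad
    rank_sum_bad = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        rank_sum_bad += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_bad - n_bad * (n_bad + 1) / 2.0) / (n_bad * n_good)

def simulate_segment(rng, n, bad_rate, separation):
    """Hypothetical segment: bads score `separation` higher on average than goods."""
    scores, labels = [], []
    for _ in range(n):
        bad = 1 if rng.random() < bad_rate else 0
        scores.append(rng.gauss(separation * bad, 1.0))
        labels.append(bad)
    return scores, labels

def segment_report(weak_weight=0.10, n_total=10_000, seed=19):
    """AUC per segment and for the pooled portfolio (all parameters are illustrative)."""
    rng = random.Random(seed)
    n_weak = int(n_total * weak_weight)
    segments = {
        "core": simulate_segment(rng, n_total - n_weak, 0.10, 1.5),
        "new_channel": simulate_segment(rng, n_weak, 0.10, 0.3),
    }
    report = {name: auc(s, y) for name, (s, y) in segments.items()}
    all_scores = [x for s, _ in segments.values() for x in s]
    all_labels = [y for _, ys in segments.values() for y in ys]
    report["portfolio"] = auc(all_scores, all_labels)
    return report

if __name__ == "__main__":
    for name, value in segment_report().items():
        print(f"{name:12s} AUC = {value:.3f}")
```

Raising `weak_weight` should pull the portfolio figure toward the weak segment, which is why a small but growing segment deserves its own review.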

interactive · selection bias

Acceptance filters distort the observed sample

Tighten the acceptance cutoff and watch how the booked sample becomes safer, less representative, and more detached from the underlying applicant population.

[Interactive chart: applicant vs accepted population (full applicant pool against accepted sample), with the observed default rate shift. Controls: acceptance cutoff, default 0.60; underlying population risk, default 1.00. Readouts: acceptance rate, applicant DR, accepted DR, mean score shift, observed bias, risk of distortion.]

Observed booked sample = applicant pool ∩ cases approved under the acceptance policy

Selection mechanism: if acceptance depends on score, income, DTI, or bureau quality, then the observed booked sample is not a neutral slice of the population.
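The filtering effect can be sketched with a toy applicant pool. Everything here is an assumption made for illustration: scores are uniform on [0, 1] with higher meaning better quality, the PD curve `0.30 * (1 - score) ** 2` is invented, and the parameter names `cutoff` and `population_risk` simply mirror the two controls named above. It is not the logic behind the original chart.

```python
import random

def simulate_acceptance(cutoff=0.60, population_risk=1.00, n=50_000, seed=19):
    """Compare default rates in the full applicant pool and in the accepted sample."""
    rng = random.Random(seed)
    n_acc = 0
    bad_all = bad_acc = 0
    score_all = score_acc = 0.0
    for _ in range(n):
        score = rng.random()  # hypothetical quality score, higher = safer
        pd = min(1.0, population_risk * 0.30 * (1.0 - score) ** 2)  # invented PD curve
        bad = rng.random() < pd
        bad_all += bad
        score_all += score
        if score >= cutoff:  # acceptance rule: approve only scores at or above the cutoff
            n_acc += 1
            bad_acc += bad
            score_acc += score
    if n_acc == 0:
        raise ValueError("cutoff rejects every applicant in this sample")
    applicant_dr = bad_all / n
    accepted_dr = bad_acc / n_acc
    return {
        "acceptance_rate": n_acc / n,
        "applicant_dr": applicant_dr,
        "accepted_dr": accepted_dr,
        "mean_score_shift": score_acc / n_acc - score_all / n,
        "observed_bias": accepted_dr - applicant_dr,  # negative: booked sample looks safer
    }

if __name__ == "__main__":
    for cutoff in (0.40, 0.60, 0.80):
        out = simulate_acceptance(cutoff=cutoff)
        print(cutoff, {k: round(v, 4) for k, v in out.items()})
```

In this toy setup the outcome of a rejected applicant is still simulated, which is exactly what a real portfolio never provides. The gap between `applicant_dr` and `accepted_dr` is therefore visible here but unobservable in practice.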

interactive · reject inference

What could rejected applicants have looked like?

Reject inference methods try to say something about missing outcomes. None of them fully solves the problem, but some are less naive than simply ignoring rejects.

[Interactive chart: estimated default rate under different reject treatments. Controls: scenario; reject share (%); relative reject risk multiplier, default 1.60.]

Methods shown: booked-only view, hard cutoff inference, simple augmentation, and fuzzy weighting. None is “truth”; each is a structured assumption.

Main governance point: reject inference is not magic recovery of hidden outcomes. It is assumption-driven reconstruction and must be documented that way.
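To make "structured assumption" concrete, the sketch below estimates a through-the-door default rate under the four treatments named above. The page does not define these methods, so the definitions in the comments are one plausible reading and should be treated as assumptions; usage of these labels also varies between practitioners. The function name, the `hard_threshold` parameter, and the example figures are hypothetical.

```python
def reject_treatments(booked_outcomes, reject_pds, multiplier=1.60, hard_threshold=0.25):
    """Estimated default rate on booked + rejected cases under four assumed treatments.

    booked_outcomes: observed 0/1 default flags for booked accounts.
    reject_pds:      PDs that a booked-only model assigns to rejected applicants.
    multiplier:      assumed relative reject risk (rejects are riskier than they score).
    """
    n_booked = len(booked_outcomes)
    n_reject = len(reject_pds)
    bad_booked = sum(booked_outcomes)
    booked_dr = bad_booked / n_booked
    total = n_booked + n_reject

    # Inflate each reject's PD by the assumed multiplier, capped at 1.
    inflated = [min(1.0, multiplier * p) for p in reject_pds]

    # Simple augmentation (assumed): every reject defaults at the inflated booked rate.
    simple_bad = n_reject * min(1.0, multiplier * booked_dr)
    # Hard cutoff (assumed): a reject counts as a full bad if its inflated PD reaches the threshold.
    hard_bad = sum(1 for p in inflated if p >= hard_threshold)
    # Fuzzy weighting (assumed): each reject contributes its inflated PD as a fractional bad.
    fuzzy_bad = sum(inflated)

    return {
        "booked_only": booked_dr,
        "simple_augmentation": (bad_booked + simple_bad) / total,
        "hard_cutoff": (bad_booked + hard_bad) / total,
        "fuzzy_weighting": (bad_booked + fuzzy_bad) / total,
    }

if __name__ == "__main__":
    booked = [1] * 4 + [0] * 96  # illustrative: 4 defaults in 100 booked accounts
    rejects = [0.05] * 5 + [0.10] * 5 + [0.20] * 5 + [0.30] * 5  # illustrative model PDs
    for method, dr in reject_treatments(booked, rejects).items():
        print(f"{method:20s} {dr:.4f}")
```

The four numbers differ only because the assumptions differ. None of them is validated by observed outcomes, which is why the choice of `multiplier` and `hard_threshold` belongs in the model documentation.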

interactive · representativeness

Is the production population still similar to the development population?

Development bias is one issue. Population drift is another. This section combines both ideas by showing whether the current production mix still resembles what the model was originally trained on.

[Interactive chart: segment mix, development vs current production, with risk-weighted mix distortion. Control: scenario. Readouts: mix PSI, weighted risk drift, max segment shift, representativeness.]

Main lesson: a model can remain mathematically unchanged while the business it serves changes around it.
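A mix comparison of this kind can be sketched with the population stability index, computed over segment shares as the sum of (current − development) × ln(current / development). The PSI formula is standard. The definitions used here for "weighted risk drift" and "max segment shift" are assumptions, since the page does not define them, and the segment names, shares, and PDs are invented for illustration.

```python
import math

def mix_psi(dev_mix, cur_mix, floor=1e-6):
    """Population stability index between two segment-share dictionaries."""
    psi = 0.0
    for segment in set(dev_mix) | set(cur_mix):
        expected = max(dev_mix.get(segment, 0.0), floor)  # floor avoids log(0) for empty segments
        actual = max(cur_mix.get(segment, 0.0), floor)
        psi += (actual - expected) * math.log(actual / expected)
    return psi

def max_segment_shift(dev_mix, cur_mix):
    """Largest absolute change in share for any single segment (assumed definition)."""
    segments = set(dev_mix) | set(cur_mix)
    return max(abs(cur_mix.get(s, 0.0) - dev_mix.get(s, 0.0)) for s in segments)

def weighted_risk_drift(dev_mix, cur_mix, segment_pd):
    """Change in mix-implied average PD, holding segment PDs fixed (assumed definition)."""
    segments = set(dev_mix) | set(cur_mix)
    return sum((cur_mix.get(s, 0.0) - dev_mix.get(s, 0.0)) * segment_pd[s] for s in segments)

if __name__ == "__main__":
    dev = {"branch": 0.50, "online": 0.30, "broker": 0.20}  # illustrative development mix
    cur = {"branch": 0.30, "online": 0.50, "broker": 0.20}  # illustrative production mix
    pds = {"branch": 0.02, "online": 0.05, "broker": 0.04}  # illustrative segment PDs
    print("mix PSI            ", round(mix_psi(dev, cur), 4))
    print("max segment shift  ", round(max_segment_shift(dev, cur), 4))
    print("weighted risk drift", round(weighted_risk_drift(dev, cur, pds), 4))
```

PSI alone is symmetric in direction: it flags that the mix moved, not whether it moved toward riskier segments. That is the reason for pairing it with a risk-weighted measure in this sketch.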

reference

Concepts compared

Concept	Main question	Typical symptom	Main risk
Segmentation analysis	Does the model behave similarly across subgroups?	Weak segment hidden in strong total	Local model failure
Selection bias	Is the observed sample systematically filtered?	Booked sample safer than applicant pool	Biased parameter estimates
Reject inference	Can missing rejected outcomes be approximated?	Different results under different assumptions	False precision
Representativeness drift	Does production still resemble development?	Segment mix changes materially	Model applied outside its intended population
Strategy drift	Did policy or underwriting target change?	Acceptance rules shift score distribution	Historical calibration no longer portable

deeper concepts

Concepts every validator should keep

portfolio average

Total performance is not enough

A strong overall metric can hide a weak but strategically important segment.

missing outcomes

Rejected cases are not just “missing data”

They are usually missing in a non-random way, which makes the problem more structural than ordinary incompleteness.

policy dependence

Acceptance policy shapes the data

The model is often learning on outcomes that were already filtered by previous policy choices.

fair comparison

Challenger comparisons can be biased too

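If champion and challenger are evaluated only on a filtered booked sample, neither result reflects performance on the full applicant universe, and the comparison itself can be skewed because that sample was typically selected using the champion's own score or policy.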

strategy change

A business shift can create instant model mismatch

New channels, products, or risk appetite changes can invalidate historical segment relationships very quickly.

governance

Assumptions must be explicit

Any use of reject inference, or of any other adjustment made to improve representativeness, should be presented as assumption-based, not as directly observed truth.

summary

What to leave this page with

Segmentation and selection are not side issues around model validation. They are part of the data-generating process itself.

The useful order is: first test subgroup performance, then understand how acceptance rules filter the observable sample, then treat reject inference as assumption-based reconstruction, then monitor whether the live production population still resembles the population the model was built for.

Once that structure is clear, model validation stops being only about coefficients and metrics and becomes a broader question of whether the model is being trained and used on the right population at all.