notes · 20

Imbalanced Learning, Cutoff Strategy & Cost-Sensitive Decisions

In many credit and risk problems, the interesting class is the rare one. Defaults, frauds, severe delinquencies, and bad events are usually the minority. That means standard accuracy can look impressive while the model still fails the real business objective.

Start with why imbalance distorts naive evaluation, then move into threshold choice, precision-recall tradeoffs, and finally cost-sensitive decision rules where business losses, not generic metrics, define the right cutoff.


the core trap

Accuracy can look good while the model is practically useless

Balanced intuition

In balanced datasets, a simple metric like accuracy can often be directionally informative because both classes have comparable weight.

But that intuition breaks once the bad class becomes rare.

Balanced classes make naive metrics less misleading.

Imbalanced reality

If default rate is 2%, a model that predicts “no default” for everyone already achieves 98% accuracy.

That is statistically neat but operationally worthless, because it never detects the cases that actually matter.

High accuracy can coexist with zero detection value.
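The 2% example above can be checked in a few lines. This is a minimal sketch on a hypothetical portfolio of 1,000 accounts (20 defaults, 980 non-defaults); the helper name `confusion_counts` and the portfolio size are illustrative assumptions, not part of any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = default."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical portfolio: 1,000 accounts with a 2% default rate.
y_true = [1] * 20 + [0] * 980
# The "predict no default for everyone" rule.
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of actual defaults that were caught

print(f"accuracy = {accuracy:.2%}")
print(f"recall   = {recall:.2%}")
```

The rule is right on every good account and wrong on every bad one, so accuracy simply mirrors the class mix while recall is zero.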

Credit-risk relevance: in underwriting, collections prioritization, AML, or fraud, the rare event is often the business-critical event. Evaluation should therefore be aligned to event capture, loss avoidance, and decision cost, not to raw accuracy.

learning sequence

A useful order for learning cutoff strategy

Start with the class mix

Before selecting a threshold, understand how rare the bad class is. Imbalance changes what all downstream metrics mean.

Then separate ranking from decision

A model score ranks cases. A threshold converts that ranking into an action rule. Those are different layers.

Then inspect precision-recall tradeoffs

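As the threshold moves, captured bads and false alarms move together: a lower threshold catches more bads but raises more false alarms, and a higher threshold does the reverse. That tradeoff is usually more informative than accuracy alone.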

Then anchor the threshold to cost

The best cutoff is rarely the one with the best generic statistic. It is usually the one that minimizes expected business loss.

interactive · imbalance effect

What class imbalance does to basic metrics

Move the bad rate and compare a naive always-negative classifier with a more realistic scoring model. Watch how accuracy can stay high while recall collapses.

[Interactive panel: metric comparison under class imbalance (naive classifier vs. scoring model), with a confusion matrix snapshot. Controls: bad rate (default 5%) and model quality (default 0.75). Readouts: naive accuracy, naive recall, naive precision, model accuracy, model recall, model F1.]

Main lesson: as the bad class becomes rarer, accuracy becomes more dominated by the majority class and therefore less useful as a business metric.
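One way to see this lesson without the interactive panel is to hold a model's sensitivity and specificity fixed and vary only the bad rate. The sketch below does that analytically; the 70% sensitivity and 90% specificity are illustrative assumptions, and `metrics_at_prevalence` is a hypothetical helper name.

```python
def metrics_at_prevalence(bad_rate, sensitivity, specificity):
    """Population-level accuracy and precision for a classifier whose
    sensitivity (recall on bads) and specificity (recall on goods) are fixed."""
    tp = sensitivity * bad_rate
    tn = specificity * (1 - bad_rate)
    fp = (1 - specificity) * (1 - bad_rate)
    accuracy = tp + tn
    precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return accuracy, precision

results = {}
for bad_rate in (0.50, 0.20, 0.05, 0.01):
    naive_accuracy = 1 - bad_rate  # accuracy of the always-negative rule
    accuracy, precision = metrics_at_prevalence(
        bad_rate, sensitivity=0.70, specificity=0.90  # assumed model quality
    )
    results[bad_rate] = (naive_accuracy, accuracy, precision)
    print(
        f"bad rate {bad_rate:>4.0%} | naive acc {naive_accuracy:.3f} | "
        f"model acc {accuracy:.3f} | model precision {precision:.3f}"
    )
```

Under these assumptions the always-negative rule overtakes the scoring model on accuracy once the bad rate falls to 5%, even though it still captures no bads, and the model's precision drops sharply although its ability to separate goods from bads has not changed.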

interactive · threshold strategy

The threshold is a business decision rule

The score itself is continuous. The threshold turns it into approve / reject, intervene / ignore, escalate / pass. Move the threshold and watch how the decision system changes.

[Interactive panel: precision, recall, and F1 plotted across thresholds, with a decision volume view. Controls: threshold (default 0.50), base bad rate (default 8%), model strength (default 0.78). Readouts: precision, recall, specificity, rejected / flagged share.]

Interpretation: lower thresholds catch more bads but also produce more false positives. Higher thresholds conserve approvals but miss more bads.

Governance reminder: there is no universally correct threshold. The threshold must reflect portfolio strategy, risk appetite, and intervention capacity.
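The interpretation above can be reproduced on a small, hand-made example. This is a sketch only: the 20 scores and labels are invented for illustration (higher score = riskier, 1 = bad), a case is flagged when its score is at or above the threshold, and `threshold_metrics` is a hypothetical helper.

```python
def threshold_metrics(scores, labels, threshold):
    """Precision, recall, specificity and flagged share when every case
    with score >= threshold is flagged (rejected / escalated)."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    tn = sum(1 for f, y in zip(flagged, labels) if not f and y == 0)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "flagged_share": (tp + fp) / len(scores),
    }

# Invented toy data: 20 accounts, 5 of them bad.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.60, 0.55, 0.45, 0.40,
          0.35, 0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, 0.02]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
          0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

by_threshold = {t: threshold_metrics(scores, labels, t) for t in (0.3, 0.5, 0.8)}
for t, m in by_threshold.items():
    print(t, {name: round(value, 3) for name, value in m.items()})
```

On this toy data, moving the threshold from 0.8 down to 0.3 raises recall from 0.6 to 1.0 while precision falls from 0.75 to about 0.42 and the flagged share triples from 20% to 60%. The same score ranking supports very different decision systems.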

interactive · precision-recall

Why precision-recall becomes more useful under imbalance

ROC is still useful, but when the positive class is rare, precision-recall often becomes more decision-relevant because it directly shows event capture versus alert quality.

[Interactive panel: precision-recall curve for a selected scenario. Readouts: PR AUC, baseline precision, best F1 threshold, comment.]

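Baseline precision: a classifier with no skill has precision equal to the event rate, so under severe imbalance the baseline is very low (for example, 0.02 at a 2% bad rate). That is why PR curves are harsher and often more realistic than ROC curves in rare-event settings, where the ROC baseline stays at 0.5 regardless of the class mix.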
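A PR curve can be traced by walking down the ranked list and flagging the top k cases at each step. The sketch below assumes labels already sorted by descending score (an invented ranking with 3 bads in 20 cases) and uses one common step-wise definition of average precision, the mean of the precision values at the ranks where a bad appears; `pr_points` and `average_precision` are hypothetical helper names.

```python
def pr_points(labels_by_rank):
    """(recall, precision) after flagging the top-k cases, for k = 1..n.
    `labels_by_rank` holds 0/1 labels sorted by descending score."""
    n_pos = sum(labels_by_rank)
    points, tp = [], 0
    for k, y in enumerate(labels_by_rank, start=1):
        tp += y
        points.append((tp / n_pos, tp / k))
    return points

def average_precision(labels_by_rank):
    """Mean of precision-at-k over the ranks k where a bad case appears."""
    n_pos = sum(labels_by_rank)
    tp, total = 0, 0.0
    for k, y in enumerate(labels_by_rank, start=1):
        if y == 1:
            tp += 1
            total += tp / k
    return total / n_pos

# Invented ranking: bads sit at ranks 1, 3 and 7 out of 20 cases.
labels_by_rank = [1, 0, 1, 0, 0, 0, 1] + [0] * 13

points = pr_points(labels_by_rank)
event_rate = sum(labels_by_rank) / len(labels_by_rank)
ap = average_precision(labels_by_rank)

print(f"event rate (baseline precision) = {event_rate:.3f}")
print(f"precision at full recall, all flagged = {points[-1][1]:.3f}")
print(f"average precision = {ap:.3f}")
```

Flagging every case always yields precision equal to the event rate, which is the floor any useful model has to beat. Average precision is best read relative to that floor rather than relative to 0.5.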

interactive · cost-sensitive cutoff

The best cutoff is often the one with the lowest expected loss

Assign a cost to false approvals and false rejections, then watch the optimal threshold move. This is the bridge from statistics to actual business policy.

[Interactive panel: total expected cost by threshold with the best threshold marked, plus the cost composition at the selected threshold. Controls: cost of false negative (default 10), cost of false positive (default 2), bad rate (default 6%), model quality (default 0.80). Readouts: best threshold, minimum expected cost, approval share, FN contribution, FP contribution, policy hint.]

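Expected Cost = c(FN)·FN + c(FP)·FP

Here FN and FP are the numbers of false negatives (bads approved) and false positives (goods rejected) at a given threshold, and c(FN) and c(FP) are the unit costs assigned to each error type.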

Core message: if missing a bad account is much more costly than wrongly rejecting a good one, the optimal threshold should move downward.
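The cost formula can be turned into a threshold search by sweeping candidate cutoffs and keeping the cheapest one. This is a sketch under stated assumptions: the scores and labels are invented toy data (higher score = riskier, flag when score >= threshold), the unit costs 10 and 2 are illustrative, and `expected_cost` and `best_threshold` are hypothetical helpers.

```python
def expected_cost(scores, labels, threshold, c_fn, c_fp):
    """c(FN)*FN + c(FP)*FP when cases with score >= threshold are rejected."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return c_fn * fn + c_fp * fp

def best_threshold(scores, labels, candidates, c_fn, c_fp):
    """Candidate with the lowest total cost (first one wins on ties)."""
    return min(candidates, key=lambda t: expected_cost(scores, labels, t, c_fn, c_fp))

# Invented toy data: 20 accounts, 5 of them bad.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.60, 0.55, 0.45, 0.40,
          0.35, 0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, 0.02]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
          0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Equal error costs versus a false negative that is five times as costly.
t_symmetric = best_threshold(scores, labels, candidates, c_fn=1, c_fp=1)
t_asymmetric = best_threshold(scores, labels, candidates, c_fn=10, c_fp=2)

print("best threshold, equal costs:", t_symmetric)
print("best threshold, c(FN)=10, c(FP)=2:", t_asymmetric)
print("min cost, c(FN)=10, c(FP)=2:",
      expected_cost(scores, labels, t_asymmetric, c_fn=10, c_fp=2))
```

On this toy data the cheapest cutoff drops from 0.8 with equal costs to 0.3 once a missed bad costs five times a wrongly rejected good. If the score were a calibrated default probability, the standard decision-theoretic cutoff would be c(FP) / (c(FP) + c(FN)), about 0.17 for these costs; the toy scores are not calibrated, so the empirical sweep lands elsewhere.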

reference

Metrics and decision views compared

View	Main question	Useful when	Main limitation
Accuracy	How many decisions are correct overall?	Balanced classes	Misleading under imbalance
Recall / Sensitivity	How many bads are captured?	Missing bads is costly	Can explode false positives
Precision	How many flagged cases are truly bad?	Intervention capacity is limited	Can miss many bads
F1 score	How to balance precision and recall?	Symmetric emphasis on capture and quality	Still not cost-aware
PR Curve	How does capture trade off against alert quality?	Rare-event problems	Threshold-free but still not cost-based
Expected cost	Which cutoff minimizes business loss?	When costs are known or approximated	Depends on cost assumptions

deeper concepts

Concepts every validator should keep

imbalance

Rare events change metric meaning

Once the bad class becomes rare, many familiar metrics stop carrying the intuitive meaning they had in balanced settings.

ranking vs action

The score is not the policy

A model score orders cases. A threshold turns that ordering into a business action. Those should not be conflated.

capacity

Operational limits matter

A theoretically attractive threshold may be useless if the business cannot review, reject, or manually inspect that many cases.
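A capacity limit can be expressed directly as a cutoff: if the team can only work a fixed number of cases, the threshold is the score of the last case that still fits. The sketch below assumes invented scores and a hypothetical helper `capacity_threshold`; tied scores at the cutoff could push the flagged count above capacity and would need a tie-break rule in practice.

```python
def capacity_threshold(scores, capacity):
    """Lowest score that is still flagged when only `capacity` cases
    (the highest-scoring ones) can be reviewed."""
    if capacity <= 0:
        raise ValueError("capacity must be a positive number of cases")
    ranked = sorted(scores, reverse=True)
    return ranked[min(capacity, len(ranked)) - 1]

# Invented toy scores for 20 accounts (higher = riskier).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.60, 0.55, 0.45, 0.40,
          0.35, 0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, 0.02]

threshold = capacity_threshold(scores, capacity=4)
n_flagged = sum(1 for s in scores if s >= threshold)

print("capacity-implied threshold:", threshold)
print("cases flagged:", n_flagged)
```

Comparing this capacity-implied cutoff with the one that looks best on a metric or cost curve shows quickly whether the preferred threshold is operationally feasible.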

cost

Not all errors are equal

False approval of a bad account and false rejection of a good account rarely have symmetric business consequences.

strategy

The optimal threshold can change over time

If risk appetite, funding constraints, collection capacity, or macro conditions shift, the threshold may need to shift too.

governance

The cutoff must be defended explicitly

A production threshold should be documented as a policy choice tied to business objectives, not as an arbitrary technical default.

summary

What to leave this page with

Imbalanced learning is not mainly about fixing the model. It is about evaluating and using the model correctly when the important class is rare.

The useful order is: first understand why imbalance breaks naive accuracy, then inspect threshold effects on precision and recall, then use PR logic for rare-event interpretation, then choose the operational cutoff through expected cost and business capacity.

Once that structure is clear, threshold choice stops looking like a small technical setting and starts looking like a core decision policy.