notes · 16

Bayesian Approaches

Bayesian methods become most useful exactly where classical methods become most fragile: sparse defaults, low-default portfolios, weak statistical power, and a real need to incorporate prior knowledge instead of pretending the dataset is speaking alone.

Start with the frequentist vs Bayesian shift, then move into Beta-Binomial updating, prior sensitivity, and low-default portfolio estimation. The goal is to understand Bayesian PD not as an academic alternative, but as a practical tool for stabilising inference when data is thin.


the paradigm shift

Frequentist vs Bayesian thinking

Frequentist view

The parameter is fixed but unknown. Data is random. In PD estimation, the classical point estimate is just the observed default rate d / n, where d is the number of observed defaults among n exposures.

That is fine when default data is rich. It becomes unstable when defaults are rare. With 0 defaults, the maximum-likelihood estimate (MLE) becomes 0%, which is almost never a sensible business conclusion.

PD_MLE = d / n

Bayesian view

The parameter itself is uncertain and is described with a probability distribution. You start with a prior belief, observe new data, and update to a posterior belief.

That means 0 defaults does not force PD to 0%. Prior information prevents the estimate from collapsing into nonsense when data is thin.

Posterior ∝ Likelihood × Prior
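
The contrast between the two views can be checked numerically. The sketch below applies Posterior ∝ Likelihood × Prior on a grid of candidate PD values. The portfolio (100 exposures, 0 defaults), the flat prior, and the function names are illustrative assumptions, not part of the original notes.

```python
# Grid approximation of "Posterior ∝ Likelihood × Prior" for a single PD.
# Assumed inputs: 100 exposures, 0 defaults, flat prior on [0, 1].

def grid_posterior(d, n, prior_fn, grid_size=10_001):
    """Return (grid, normalised posterior weights) on an evenly spaced PD grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Binomial likelihood up to a constant; the binomial coefficient
    # does not depend on theta, so it cancels in the normalisation.
    unnormalised = [
        (theta ** d) * ((1.0 - theta) ** (n - d)) * prior_fn(theta)
        for theta in grid
    ]
    total = sum(unnormalised)
    return grid, [w / total for w in unnormalised]


def flat_prior(theta):
    """Flat prior: every PD value is equally plausible before seeing data."""
    return 1.0


d, n = 0, 100
grid, weights = grid_posterior(d, n, flat_prior)

mle = d / n
posterior_mean = sum(theta * w for theta, w in zip(grid, weights))

print(f"MLE:            {mle:.4%}")
print(f"Posterior mean: {posterior_mean:.4%}")
```

The MLE is exactly zero, while the posterior mean stays strictly positive because the prior still places mass on non-zero PD values.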

LDP motivation: sovereign, bank, and large-corporate portfolios often do not generate enough default observations for purely data-driven estimation. Bayesian methods are attractive precisely because they stabilise inference under sparse evidence.

the update rule

Prior → data → posterior

Prior

What you believed about PD before seeing this portfolio’s observed defaults.

The prior can come from external data, historical experience, regulatory benchmarks, rating studies, or expert judgement.

Likelihood

The evidence supplied by the observed data. In a default / non-default setting, this is usually the Binomial likelihood.

This is where the actual sample speaks.

Posterior

The updated distribution after combining prior belief with observed evidence.

It is a full distribution, not just a point estimate, which is why credible intervals come naturally.

P(θ | data) = P(data | θ) × P(θ) / P(data)

With Beta prior + Binomial likelihood:
Beta(α, β) + Binomial(d, n) → Beta(α + d, β + n − d)
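
The conjugate update above is a two-line calculation. The following is a minimal sketch, assuming a small helper class named BetaDist (a hypothetical name) and the input values shown in the interactive panel further down (α = β = 1, n = 100, d = 2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaDist:
    """Beta(alpha, beta) distribution used as prior or posterior for a PD."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(prior: BetaDist, defaults: int, n: int) -> BetaDist:
    """Conjugate update: Beta(a, b) + Binomial(d, n) -> Beta(a + d, b + n - d)."""
    if not 0 <= defaults <= n:
        raise ValueError("defaults must lie between 0 and n")
    return BetaDist(prior.alpha + defaults, prior.beta + n - defaults)


prior = BetaDist(1.0, 1.0)                    # flat prior
posterior = update(prior, defaults=2, n=100)  # Beta(3, 99)

print(posterior, f"posterior mean = {posterior.mean:.4%}")
```

Defaults are added to α and non-defaults to β, which is why the two parameters are often read as pseudo-counts.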

learning sequence

A useful order for learning Bayesian PD

Start with the small-sample problem

Bayesian methods make the most sense once you see why d / n becomes unstable or meaningless under sparse defaults.

Then understand the prior as information, not magic

The prior is not a trick. It is a disciplined way to express pre-existing knowledge that the modeler is already using implicitly.

Then learn conjugate updating

Beta-Binomial is the cleanest entry point because the posterior has a simple closed form and the mechanics are fully visible.

Then stress the prior itself

Bayesian modelling is only credible if results are robust across reasonable priors and if prior strength is documented transparently.

interactive · beta-binomial

Watch the posterior update in real time

Beta-Binomial is the standard conjugate setup for PD estimation. Change the prior, add observed defaults, and watch the posterior mean, credible interval, and shrinkage behaviour move.

[Interactive chart: prior, likelihood (scaled), and posterior densities, with the posterior compared against the classical estimate. Inputs: prior α and β (both shown at 1), n exposures (shown at 100), and d defaults (shown at 2). Reported outputs: prior mean, prior strength, MLE, posterior mean, posterior mode, posterior standard deviation, 95% credible interval, 95% frequentist confidence interval, and data weight.]

Prior: Beta(α, β)
Posterior: Beta(α + d, β + n − d)
Posterior mean = (α + d) / (α + β + n)

Shrinkage: the posterior mean is a weighted compromise between the prior mean and the observed default rate. When n is small, the prior matters a lot. When n grows, the data takes over.
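
The weighted compromise can be made explicit: the posterior mean equals w × (d / n) + (1 − w) × prior mean, with data weight w = n / (α + β + n). The sketch below checks that identity for a hypothetical Beta(2, 98) prior (mean 2%) and an assumed observed default rate of 10%. All values and names are illustrative.

```python
def shrinkage_view(alpha, beta, d, n):
    """Decompose the Beta-Binomial posterior mean into prior and data parts."""
    prior_mean = alpha / (alpha + beta)
    mle = d / n
    data_weight = n / (alpha + beta + n)  # share of the estimate driven by data
    weighted = data_weight * mle + (1.0 - data_weight) * prior_mean
    direct = (alpha + d) / (alpha + beta + n)  # closed-form posterior mean
    return {
        "n": n,
        "data_weight": data_weight,
        "weighted": weighted,
        "direct": direct,
    }


# Same observed default rate (10%) at increasing sample sizes.
rows = [shrinkage_view(2.0, 98.0, d=n // 10, n=n) for n in (10, 100, 1_000, 10_000)]

for row in rows:
    print(
        f"n={row['n']:>6}  data weight={row['data_weight']:.3f}  "
        f"posterior mean={row['direct']:.4%}"
    )
```

As n grows, the data weight approaches 1 and the posterior mean moves from the prior mean towards the observed rate.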

interactive · prior sensitivity

How much does the prior really matter?

Prior sensitivity is not optional. It is the discipline that prevents Bayesian modelling from becoming a black box for injecting preferred answers.

[Interactive chart: three posteriors from the same data under uninformative, moderate, and strong priors. Input: observed sample size n (shown at 50).]

Main lesson: when the sample is small, reasonable prior choices can move the posterior materially. When the sample is large, the posteriors converge. That convergence itself is evidence that the model is data-driven rather than prior-driven.

Validation expectation: if the result changes radically under plausible prior alternatives, you do not have a stable estimate; you have a prior-sensitive estimate that must be disclosed as such.
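
A basic sensitivity check is to run the same data through several priors and compare the spread of the results. The sketch below does this for three hypothetical priors labelled to match the chart above. The prior parameters and the default counts are assumptions chosen for illustration, not the values used by the original page.

```python
# Hypothetical priors; the "moderate" and "strong" priors share a 2% mean
# but differ in strength (alpha + beta).
PRIORS = {
    "uninformative": (1.0, 1.0),
    "moderate": (2.0, 98.0),
    "strong": (20.0, 980.0),
}


def posterior_means(d, n, priors=PRIORS):
    """Posterior mean (alpha + d) / (alpha + beta + n) under each prior."""
    return {name: (a + d) / (a + b + n) for name, (a, b) in priors.items()}


def spread(means):
    """Range of posterior means across priors: a crude sensitivity measure."""
    return max(means.values()) - min(means.values())


small = posterior_means(d=2, n=50)       # thin data, 4% observed rate
large = posterior_means(d=200, n=5_000)  # rich data, same observed rate

for label, means in (("n=50", small), ("n=5000", large)):
    detail = ", ".join(f"{k}={v:.4%}" for k, v in means.items())
    print(f"{label}: {detail}  (spread {spread(means):.4%})")
```

A range of posterior means is only one possible summary; a fuller check would also compare credible intervals across priors.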

interactive · low-default portfolios

Bayesian updating across years in LDPs

Low-default portfolios are where Bayesian approaches stop being a statistical preference and become a practical necessity. The model updates year by year, with each posterior becoming the next prior.

[Interactive chart: multi-year posterior mean with 95% credible band against the yearly MLE, plus posterior evolution by year. Inputs: n per year (shown at 150) and observation years (shown at 5). Reported outputs: final posterior mean, final 95% CI, total defaults, total exposures, pooled MLE, and CI width reduction.]

Zero-default case: classical estimation gives PD = 0%. Bayesian estimation does not. That single fact is one of the strongest intuitive arguments for Bayesian LDP methods.

Sequential monitoring: each year’s posterior becomes the next year’s prior. In that sense, Bayesian updating is naturally aligned with ongoing validation and monitoring cycles.
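
Under the Beta-Binomial model, updating year by year gives the same posterior as pooling all years into one update. The sketch below illustrates this with 150 exposures per year over 5 years. The starting prior Beta(1, 199) and the yearly default counts are hypothetical, and it assumes each year's observations are independent draws from the same PD.

```python
def update(alpha, beta, defaults, n):
    """One conjugate step: Beta(a, b) -> Beta(a + d, b + n - d)."""
    return alpha + defaults, beta + n - defaults


prior = (1.0, 199.0)               # hypothetical prior, mean 0.5%
yearly_defaults = [0, 1, 0, 0, 1]  # hypothetical default counts per year
n_per_year = 150

# Sequential route: each year's posterior is the next year's prior.
alpha, beta = prior
history = []
for year, d in enumerate(yearly_defaults, start=1):
    alpha, beta = update(alpha, beta, d, n_per_year)
    history.append((year, alpha, beta, alpha / (alpha + beta)))

# Batch route: pool all years and update once.
batch = update(*prior, sum(yearly_defaults), n_per_year * len(yearly_defaults))

for year, a, b, mean in history:
    print(f"year {year}: Beta({a:g}, {b:g})  posterior mean = {mean:.4%}")
print("sequential equals batch:", (alpha, beta) == batch)
```

The equivalence depends on the stated independence and constant-PD assumptions. If the PD drifts over time, the yearly posteriors should not simply be chained.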

reference

Bayesian vs frequentist PD estimation

Aspect	Frequentist	Bayesian	LDP relevance
Point estimate	d / n	Posterior mean (α + d) / (α + β + n)	Bayesian avoids collapse to 0%
Uncertainty	Confidence interval	Credible interval	Bayesian interval is directly interpretable
0 defaults	PD = 0%	Prior-pulled non-zero PD	Major advantage
Prior information	Not explicit	Formally included	Useful with external evidence
Small samples	Unstable	Stabilised	Key for LDPs
Interpretation	Long-run sampling logic	Probability statement on parameter	Often easier for decision-makers

deeper concepts

Concepts every validator should keep

conjugacy

Why Beta-Binomial matters

It keeps the updating algebra transparent. That makes it ideal for teaching, documenting, and defending LDP estimation logic.

prior elicitation

The prior must come from somewhere real

External studies, historical analogues, expert judgement, and regulatory benchmarks are all possible sources, but each one must be justified.

prior strength

Equivalent sample size matters

α + β can be read as the prior’s effective sample size. Large prior strength means new data moves the posterior slowly.
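
One way to make prior strength concrete is to build the prior from a target mean and an equivalent sample size, then see how far the same data moves it. This is a minimal sketch; the function names, the 2% prior mean, the two strengths, and the observed data are illustrative assumptions.

```python
def prior_from_mean_and_strength(mean, strength):
    """Return (alpha, beta) with alpha / (alpha + beta) = mean and alpha + beta = strength."""
    if not 0.0 < mean < 1.0 or strength <= 0.0:
        raise ValueError("mean must be in (0, 1) and strength must be positive")
    return mean * strength, (1.0 - mean) * strength


def posterior_mean(alpha, beta, d, n):
    return (alpha + d) / (alpha + beta + n)


weak = prior_from_mean_and_strength(0.02, 10)       # worth about 10 observations
strong = prior_from_mean_and_strength(0.02, 1_000)  # worth about 1,000 observations

d, n = 5, 100  # observed default rate of 5%
weak_post = posterior_mean(*weak, d, n)
strong_post = posterior_mean(*strong, d, n)

print(f"weak prior   -> posterior mean {weak_post:.4%}")
print(f"strong prior -> posterior mean {strong_post:.4%}")
```

Both priors start at the same mean, but the strong prior stays much closer to it after seeing the same 100 observations.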

point estimate choice

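Mean, mode, and median are not identical

For the right-skewed posteriors typical of low PDs, the mode typically sits below the median, which sits below the mean, so the choice of summary changes the reported PD. Posterior mean is common, but conservative frameworks may focus on upper credible bounds rather than central estimates.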
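
The candidate summaries can be compared on one posterior. The sketch below uses Beta(3, 99), an assumed example posterior, and approximates quantiles by simple numerical integration so that it needs only the standard library. The helper names are hypothetical, and a statistics library would normally be used for the quantile step.

```python
import math


def beta_log_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x)


def beta_quantile(p, a, b, steps=100_000):
    """Approximate p-quantile via midpoint-rule integration of the density.

    Intended for a, b >= 1, where the density is bounded.
    """
    h = 1.0 / steps
    cumulative = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        cumulative += math.exp(beta_log_pdf(x, a, b)) * h
        if cumulative >= p:
            return x
    return 1.0


a, b = 3.0, 99.0                      # assumed example posterior
mean = a / (a + b)
mode = (a - 1.0) / (a + b - 2.0)      # closed form, valid for a, b > 1
median = beta_quantile(0.50, a, b)
upper_95 = beta_quantile(0.95, a, b)  # one-sided 95% upper credible bound

print(f"mode={mode:.4%}  median={median:.4%}  mean={mean:.4%}  upper 95%={upper_95:.4%}")
```

The upper bound sits well above all three central estimates, which is why conservative frameworks that use it produce materially higher PDs.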

posterior predictive

Backtesting can also be Bayesian

Instead of only testing realised defaults against a fixed PD, you can evaluate observed outcomes against the posterior predictive distribution.
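
With a Beta posterior, the posterior predictive distribution for next period's default count is Beta-Binomial, which has a closed form. The sketch below computes it and the probability of seeing at least a given number of defaults. The posterior Beta(3, 99), the 150 future exposures, and the observed count of 10 are hypothetical.

```python
import math


def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def predictive_pmf(k, m, alpha, beta):
    """P(K = k) for K defaults among m future exposures, PD ~ Beta(alpha, beta)."""
    log_choose = math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
    return math.exp(log_choose + log_beta(alpha + k, beta + m - k) - log_beta(alpha, beta))


def tail_probability(observed, m, alpha, beta):
    """P(K >= observed): how surprising the realised count is under the model."""
    return sum(predictive_pmf(k, m, alpha, beta) for k in range(observed, m + 1))


alpha, beta = 3.0, 99.0  # assumed posterior
m = 150                  # assumed number of exposures next period

pmf = [predictive_pmf(k, m, alpha, beta) for k in range(m + 1)]
expected_defaults = sum(k * p for k, p in enumerate(pmf))
tail = tail_probability(10, m, alpha, beta)

print(f"expected defaults: {expected_defaults:.2f}")
print(f"P(K >= 10): {tail:.4f}")
```

Unlike a test against a fixed PD, this tail probability reflects both sampling noise and the remaining uncertainty about the PD itself.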

regulatory use

Bayesian does not mean unregulated

The method is acceptable for LDP contexts precisely because the prior is documented, sensitivity-tested, and not allowed to dominate without justification.

summary

What to leave this page with

Bayesian PD estimation is most valuable when the data is weakest. It replaces brittle point estimates with disciplined updating under uncertainty.

The useful order is: first understand why sparse defaults break classical intuition, then learn prior-likelihood-posterior updating, then study prior sensitivity, then apply the logic to low-default portfolios and year-by-year monitoring.

Once that structure is clear, Bayesian methods stop looking exotic and start looking like a practical answer to a very specific statistical problem.