notes · 16

Bayesian Approaches

Bayesian methods become most useful exactly where classical methods become most fragile: sparse defaults, low-default portfolios, weak statistical power, and a real need to incorporate prior knowledge instead of pretending the dataset is speaking alone.

Start with the frequentist vs Bayesian shift, then move into Beta-Binomial updating, prior sensitivity, and low-default portfolio estimation. The goal is to understand Bayesian PD not as an academic alternative, but as a practical tool for stabilising inference when data is thin.

Frequentist vs Bayesian thinking

Frequentist view

The parameter is fixed but unknown; the data is random. In PD estimation, the classical point estimate is simply the observed default rate d / n, where d is the number of defaults among n obligors.

That works when default data is plentiful. It becomes unstable when defaults are rare: with 0 defaults, the maximum-likelihood estimate (MLE) is exactly 0%, which is almost never a sensible business conclusion.

$\mathrm{PD}_{\mathrm{MLE}} = d / n$
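A minimal sketch of the classical estimate makes the fragility concrete. `pd_mle` is a hypothetical helper name and the portfolio size of 250 obligors is made up for illustration. With so few obligors, a single default moves the estimate from 0% to 0.4%, and zero defaults give exactly 0%.

```python
def pd_mle(defaults: int, obligors: int) -> float:
    """Maximum-likelihood PD estimate: the observed default rate d / n."""
    if obligors <= 0:
        raise ValueError("obligors must be positive")
    if not 0 <= defaults <= obligors:
        raise ValueError("defaults must lie in [0, obligors]")
    return defaults / obligors

# Hypothetical low-default sample of 250 obligors: each extra default shifts PD by 0.4 points.
for d in (0, 1, 2):
    print(f"d={d}, n=250 -> PD_MLE = {pd_mle(d, 250):.2%}")
```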

Bayesian view

The parameter itself is uncertain and is described with a probability distribution. You start with a prior belief, observe new data, and update to a posterior belief.

As a result, observing 0 defaults does not force the PD estimate to 0%. Prior information keeps the estimate from collapsing to an implausible value when data is thin.

Posterior ∝ Likelihood × Prior
Low-default portfolio (LDP) motivation: sovereign, bank, and large-corporate portfolios often do not generate enough default observations for purely data-driven estimation. Bayesian methods are attractive precisely because they stabilise inference under sparse evidence.

Prior → data → posterior

Prior

What you believed about PD before seeing this portfolio’s observed defaults.

The prior can come from external data, historical experience, regulatory benchmarks, rating studies, or expert judgement.

Likelihood

The evidence supplied by the observed data. In a default / non-default setting, this is usually the Binomial likelihood.

This is where the actual sample speaks.

Posterior

The updated distribution after combining prior belief with observed evidence.

It is a full distribution, not just a point estimate, which is why credible intervals come naturally.
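Because the posterior is a full distribution, an interval can be read straight off its quantiles. The sketch below assumes a Beta-shaped posterior, which is what the conjugate Beta-Binomial setup produces. It approximates an equal-tailed credible interval from Monte Carlo draws using only the Python standard library (`random.Random.betavariate`). The helper name `credible_interval` and the posterior Beta(1, 599) are hypothetical.

```python
import random

def credible_interval(alpha: float, beta: float, level: float = 0.90,
                      draws: int = 100_000, seed: int = 42) -> tuple[float, float]:
    """Equal-tailed credible interval for a Beta(alpha, beta) posterior,
    approximated from Monte Carlo draws (standard library only)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * (draws - 1))]
    upper = samples[int((1.0 - tail) * (draws - 1))]
    return lower, upper

# Hypothetical posterior for PD: Beta(1, 599).
low, high = credible_interval(1.0, 599.0)
print(f"90% credible interval for PD: [{low:.4%}, {high:.4%}]")
```

The result reads directly as "given the prior and the data, PD lies in this range with 90% probability". A confidence interval does not support that reading.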

P(θ | data) = P(data | θ) × P(θ) / P(data)
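Bayes' rule can be applied numerically before using any closed form. The sketch evaluates P(data | θ) × P(θ) on a grid of θ values and divides by a numerical approximation of P(data). It then checks that the posterior mean agrees with the conjugate result (α + d) / (α + β + n). The helper names, the grid size, and the inputs (a Beta(2, 198) prior, 1 default among 300 obligors) are illustrative assumptions.

```python
import math

def log_beta_pdf(theta: float, a: float, b: float) -> float:
    """Log density of a Beta(a, b) prior at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log1p(-theta)

def log_binomial_likelihood(theta: float, d: int, n: int) -> float:
    """Log of P(data | theta) for d defaults among n obligors."""
    return math.log(math.comb(n, d)) + d * math.log(theta) + (n - d) * math.log1p(-theta)

def grid_posterior_mean(a: float, b: float, d: int, n: int, points: int = 20_000) -> float:
    """Approximate the posterior mean by applying Bayes' rule on a grid of theta values."""
    step = 1.0 / points
    grid = [(i + 0.5) * step for i in range(points)]  # midpoints avoid theta = 0 or 1
    # Numerator of Bayes' rule: P(data | theta) * P(theta) at each grid point.
    numerator = [math.exp(log_binomial_likelihood(t, d, n) + log_beta_pdf(t, a, b)) for t in grid]
    evidence = sum(numerator) * step                 # approximates P(data)
    posterior = [v / evidence for v in numerator]    # posterior density P(theta | data)
    return sum(t * p for t, p in zip(grid, posterior)) * step

# Hypothetical inputs: Beta(2, 198) prior (mean 1%), 1 default among 300 obligors.
print(f"grid posterior mean       : {grid_posterior_mean(2, 198, 1, 300):.5f}")
print(f"closed form (a+d)/(a+b+n) : {(2 + 1) / (2 + 198 + 300):.5f}")
```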

With a Beta prior and a Binomial likelihood (d defaults observed among n obligors):
Beta(α, β) prior × Binomial likelihood → Beta(α + d, β + n − d) posterior
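The conjugate update is just two additions, which is why it is easy to document and audit. Below is a minimal sketch. `BetaDist` and `conjugate_update` are hypothetical names, and the Beta(1, 99) prior (mean 1%) and the default-free sample of 400 obligors are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaDist:
    """Beta(alpha, beta) belief about PD; used for both priors and posteriors."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

def conjugate_update(prior: BetaDist, defaults: int, obligors: int) -> BetaDist:
    """Beta(alpha, beta) prior + d defaults among n obligors -> Beta(alpha + d, beta + n - d)."""
    if not 0 <= defaults <= obligors:
        raise ValueError("need 0 <= defaults <= obligors")
    return BetaDist(prior.alpha + defaults, prior.beta + obligors - defaults)

# Hypothetical example: prior mean 1%, then 0 defaults among 400 obligors.
posterior = conjugate_update(BetaDist(1, 99), defaults=0, obligors=400)
print(posterior, f"posterior mean = {posterior.mean:.2%}")
```

Zero observed defaults still leave a non-zero posterior mean, because the prior contributes α to the numerator.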

A useful order for learning Bayesian PD

01

Start with the small-sample problem

Bayesian methods make the most sense once you see why d / n becomes unstable or meaningless under sparse defaults.

02

Then understand the prior as information, not magic

The prior is not a trick. It is a disciplined way to state pre-existing knowledge that the modeller is already using implicitly.

03

Then learn conjugate updating

Beta-Binomial is the cleanest entry point because the posterior has a simple closed form and the mechanics are fully visible.

04

Then stress the prior itself

Bayesian modelling is only credible if results are robust across reasonable priors and if prior strength is documented transparently.
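Step 01's small-sample problem can be quantified with two closed-form expressions under a Binomial(n, PD) model. The first is the chance of observing no defaults at all, (1 − PD)^n. The second is the standard deviation of d / n relative to the true PD. The helper names and the portfolio (true PD 0.2%, 500 obligors) are hypothetical.

```python
import math

def zero_default_probability(true_pd: float, obligors: int) -> float:
    """P(d = 0) when defaults follow Binomial(n, PD): (1 - PD) ** n."""
    return (1.0 - true_pd) ** obligors

def mle_relative_std(true_pd: float, obligors: int) -> float:
    """Standard deviation of d / n divided by the true PD (coefficient of variation)."""
    return math.sqrt(true_pd * (1.0 - true_pd) / obligors) / true_pd

# Hypothetical portfolio: true PD of 0.2% and 500 obligors.
print(f"P(observe 0 defaults) = {zero_default_probability(0.002, 500):.1%}")
print(f"std(d / n) / PD       = {mle_relative_std(0.002, 500):.2f}")
```

In this setting, more than a third of such portfolios would report a 0% MLE. The sampling noise of d / n is about as large as the PD itself.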

How the posterior updates

Beta-Binomial is the standard conjugate setup for PD estimation. Changing the prior or adding observed defaults shifts the posterior mean and credible interval, and changes how strongly the estimate is shrunk toward the prior.
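The shrinkage behaviour has an exact algebraic form. The posterior mean (α + d) / (α + β + n) equals w × prior mean + (1 − w) × d / n, with weight w = (α + β) / (α + β + n) on the prior. The sketch below computes both. `shrinkage` is a hypothetical helper name, and the Beta(2, 398) prior and the (d, n) pairs are made up.

```python
def shrinkage(alpha: float, beta: float, defaults: int, obligors: int) -> tuple[float, float]:
    """Return (weight on prior, posterior mean) for a Beta-Binomial update.

    The posterior mean (alpha + d) / (alpha + beta + n) equals
    w * prior_mean + (1 - w) * (d / n) with w = (alpha + beta) / (alpha + beta + n).
    """
    strength = alpha + beta
    w = strength / (strength + obligors)
    prior_mean = alpha / strength
    observed_rate = defaults / obligors
    return w, w * prior_mean + (1.0 - w) * observed_rate

# Hypothetical prior: Beta(2, 398), i.e. mean 0.5% with strength 400.
for d, n in [(0, 100), (1, 1_000), (10, 10_000)]:
    w, post_mean = shrinkage(2, 398, d, n)
    print(f"d={d:>2}, n={n:>6}: weight on prior = {w:.2f}, posterior mean = {post_mean:.3%}")
```

As n grows, the weight on the prior falls and the posterior mean moves toward the observed default rate.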

How much does the prior really matter?

Prior sensitivity is not optional. It is the discipline that prevents Bayesian modelling from becoming a black box for injecting preferred answers.
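A sensitivity check can be as simple as running the same data through several candidate priors and reporting the spread. In the sketch below, the four priors and the data (0 defaults among 500 obligors) are hypothetical choices for illustration, not recommended defaults. The 95% upper bound is a Monte Carlo approximation using only the Python standard library.

```python
import random

# Hypothetical candidate priors, chosen only to illustrate the spread of results.
CANDIDATE_PRIORS = {
    "uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "weak benchmark Beta(1, 199)": (1.0, 199.0),
    "strong benchmark Beta(5, 995)": (5.0, 995.0),
}

def upper_credible_bound(alpha: float, beta: float, level: float = 0.95,
                         draws: int = 50_000, seed: int = 7) -> float:
    """Monte Carlo approximation of the `level` quantile of Beta(alpha, beta)."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    return samples[int(level * (draws - 1))]

def sensitivity_table(defaults: int, obligors: int) -> list[tuple[str, float, float]]:
    """Posterior mean and 95% upper bound of PD under each candidate prior."""
    rows = []
    for name, (a, b) in CANDIDATE_PRIORS.items():
        post_a, post_b = a + defaults, b + obligors - defaults  # conjugate update
        rows.append((name, post_a / (post_a + post_b), upper_credible_bound(post_a, post_b)))
    return rows

# Same hypothetical data for every prior: 0 defaults among 500 obligors.
for name, mean, upper in sensitivity_table(defaults=0, obligors=500):
    print(f"{name:<30} posterior mean {mean:.3%}   95% upper bound {upper:.3%}")
```

With this little data, the posterior means differ by more than a factor of three across priors. That spread is exactly what a sensitivity section should document.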

Bayesian updating across years in LDPs

Low-default portfolios are where Bayesian approaches stop being a statistical preference and become a practical necessity. The model updates year by year, with each posterior becoming the next prior.
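The year-by-year chain can be sketched with the conjugate update applied in a loop. The sketch assumes the PD is stable across years and that every year gets equal weight (no down-weighting of older data). Under those assumptions, updating sequentially gives the same final posterior as pooling all years into one update. `sequential_update`, the Beta(1, 199) starting prior and the four-year history are hypothetical.

```python
def sequential_update(alpha: float, beta: float,
                      yearly: list[tuple[int, int]]) -> list[tuple[float, float]]:
    """Apply one conjugate update per (defaults, obligors) year; each posterior
    becomes the next year's prior. Returns the full trajectory of (alpha, beta)."""
    path = [(alpha, beta)]
    for defaults, obligors in yearly:
        alpha, beta = alpha + defaults, beta + obligors - defaults
        path.append((alpha, beta))
    return path

# Hypothetical four-year history of (defaults, obligors).
history = [(0, 420), (1, 450), (0, 480), (0, 510)]
path = sequential_update(1, 199, history)
for year, (a, b) in enumerate(path):
    label = "prior" if year == 0 else f"after year {year}"
    print(f"{label:<13}: Beta({a}, {b}), mean = {a / (a + b):.3%}")

# Pooling all years into a single update gives the same final posterior.
total_d = sum(d for d, _ in history)
total_n = sum(n for _, n in history)
assert path[-1] == (1 + total_d, 199 + total_n - total_d)
```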

Bayesian vs frequentist PD estimation

Aspect | Frequentist | Bayesian | LDP relevance
Point estimate | d / n | (α + d) / (α + β + n) | Bayesian avoids collapse to 0%
Uncertainty | Confidence interval | Credible interval | Bayesian interval is directly interpretable
0 defaults | PD = 0% | Prior-pulled non-zero PD | Major advantage
Prior information | Not explicit | Formally included | Useful with external evidence
Small samples | Unstable | Stabilised | Key for LDPs
Interpretation | Long-run sampling logic | Probability statement on parameter | Often easier for decision-makers
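The "0 defaults" and "Uncertainty" rows can be compared numerically, because for d = 0 both upper bounds have closed forms. The exact one-sided Clopper-Pearson bound solves (1 − p)^n = 1 − γ. Under a uniform Beta(1, 1) prior, the posterior is Beta(1, n + 1), whose CDF is 1 − (1 − p)^(n + 1). The uniform prior, the function names and n = 300 are illustrative assumptions.

```python
def cp_upper_zero_defaults(obligors: int, confidence: float = 0.95) -> float:
    """Exact one-sided (Clopper-Pearson) upper bound for d = 0.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / obligors)

def bayes_upper_zero_defaults(obligors: int, confidence: float = 0.95) -> float:
    """Upper credible bound for d = 0 under a uniform Beta(1, 1) prior.

    The posterior is Beta(1, n + 1) with CDF 1 - (1 - p) ** (n + 1),
    so the bound solves (1 - p) ** (n + 1) = 1 - confidence.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / (obligors + 1))

n = 300  # hypothetical default-free portfolio
print(f"PD_MLE                          : {0 / n:.3%}")
print(f"95% Clopper-Pearson upper bound : {cp_upper_zero_defaults(n):.3%}")
print(f"95% Bayesian upper bound (flat) : {bayes_upper_zero_defaults(n):.3%}")
```

With a flat prior, the two bounds are numerically close. The practical difference lies in interpretation: only the Bayesian bound is a probability statement about PD itself.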

Concepts every validator should keep

conjugacy

Why Beta-Binomial matters

It keeps the updating algebra transparent. That makes it ideal for teaching, documenting, and defending LDP estimation logic.

prior elicitation

The prior must come from somewhere real

External studies, historical analogues, expert judgement, and regulatory benchmarks are all possible sources, but each one must be justified.

prior strength

Equivalent sample size matters

α + β can be read as the prior’s effective sample size. Large prior strength means new data moves the posterior slowly.

point estimate choice

Mean, mode, median are not identical

Posterior mean is common, but conservative frameworks may focus on upper credible bounds rather than central estimates.

posterior predictive

Backtesting can also be Bayesian

Instead of only testing realised defaults against a fixed PD, you can evaluate observed outcomes against the posterior predictive distribution.

regulatory use

Bayesian does not mean unregulated

The method is acceptable for LDP contexts precisely because the prior is documented, sensitivity-tested, and not allowed to dominate without justification.
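The prior strength, point estimate choice, and posterior predictive concepts above fit in one short sketch. It assumes a Beta prior built by matching a benchmark PD (the prior mean) and an equivalent sample size (α + β). `beta_prior_from_benchmark`, `posterior_summaries`, `predictive_pmf` and `predictive_tail` are hypothetical helper names, and the benchmark and backtest numbers are made up. The median and upper bound are Monte Carlo approximations using only the Python standard library. The predictive uses the Beta-Binomial probability C(m, k) B(k + α, m − k + β) / B(α, β).

```python
import math
import random

def beta_prior_from_benchmark(benchmark_pd: float, equivalent_sample_size: float) -> tuple[float, float]:
    """Hypothetical helper: Beta prior with mean = benchmark PD and alpha + beta = equivalent sample size."""
    if not 0.0 < benchmark_pd < 1.0 or equivalent_sample_size <= 0:
        raise ValueError("need 0 < benchmark_pd < 1 and a positive equivalent sample size")
    return benchmark_pd * equivalent_sample_size, (1.0 - benchmark_pd) * equivalent_sample_size

def posterior_summaries(alpha: float, beta: float, upper_level: float = 0.95,
                        draws: int = 100_000, seed: int = 11) -> dict[str, float]:
    """Mode, median, mean and an upper credible bound of a Beta(alpha, beta) posterior."""
    mean = alpha / (alpha + beta)
    if alpha > 1 and beta > 1:
        mode = (alpha - 1) / (alpha + beta - 2)
    elif alpha <= 1 < beta:
        mode = 0.0  # density is highest at PD = 0 in this case
    else:
        mode = float("nan")  # shapes not relevant for low PDs are left undefined here
    rng = random.Random(seed)  # Monte Carlo for median and upper bound
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    return {"mode": mode, "median": samples[draws // 2], "mean": mean,
            "upper": samples[int(upper_level * (draws - 1))]}

def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def predictive_pmf(k: int, m: int, alpha: float, beta: float) -> float:
    """Beta-Binomial posterior predictive probability of k defaults among m obligors."""
    return math.exp(math.log(math.comb(m, k)) + log_beta(k + alpha, m - k + beta) - log_beta(alpha, beta))

def predictive_tail(k_observed: int, m: int, alpha: float, beta: float) -> float:
    """P(K >= k_observed) under the posterior predictive distribution."""
    return max(0.0, 1.0 - sum(predictive_pmf(k, m, alpha, beta) for k in range(k_observed)))

# Prior strength: same hypothetical benchmark PD (0.5%), weak vs strong prior, then 0 defaults in 500.
for strength in (200, 2_000):
    a, b = beta_prior_from_benchmark(0.005, strength)
    print(f"strength {strength:>5}: posterior mean = {a / (a + b + 500):.3%}")

# Point estimates for the weak-prior posterior Beta(1, 699).
a0, b0 = beta_prior_from_benchmark(0.005, 200)
a1, b1 = a0, b0 + 500
print(posterior_summaries(a1, b1))
# Bayesian backtest of a hypothetical next year: 4 defaults among 500 obligors.
print(f"P(K >= 4) = {predictive_tail(4, 500, a1, b1):.3f}")
```

The mode, median, mean and upper bound differ noticeably for the same skewed posterior, so the chosen point estimate needs to be stated explicitly. A small predictive tail probability would flag next year's default count as surprising under the current posterior.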

What to leave this page with

Bayesian PD estimation is most valuable when the data is weakest. It replaces brittle point estimates with disciplined updating under uncertainty.

The useful order is: first understand why sparse defaults break classical intuition, then learn prior-likelihood-posterior updating, then study prior sensitivity, then apply the logic to low-default portfolios and year-by-year monitoring.

Once that structure is clear, Bayesian methods stop looking exotic and start looking like a practical answer to a very specific statistical problem.