notes · 08

Distributions

A distribution is one of the first places where probability becomes visual. This page is built to help you move from definition to intuition: what a distribution is, how families differ, how parameters change shape, and why those choices matter in modelling.

Start from the idea, then move through families, then use the explorer. The goal is not memorising formulas — it is building shape intuition.

What a distribution actually tells you

A distribution is the full description of uncertainty for a variable. It tells you what values are possible, how likely they are, and what overall shape those possibilities create.

The simple version

Imagine observing the same random process again and again. A distribution is the shape that repeated outcomes settle into.

It is not just a single number. It is the entire structure behind the numbers: centre, spread, asymmetry, tails, and support.

Distribution = the geometry of uncertainty.

In practice, this means distributions help you answer questions like: where do outcomes usually land, how variable are they, and how much mass sits in extreme outcomes?

Why this matters before modelling

Before fitting a model, you should ask what kind of variable you are dealing with. Is it bounded between 0 and 1? Is it strictly positive? Is it a count? Is it symmetric or heavily skewed?

Distributional thinking is what stops modelling from becoming mechanical. It makes you ask whether the mathematical family matches the data-generating behaviour.

Continuous vs discrete families

The first useful distinction is not Normal vs Binomial. It is continuous vs discrete. That single split already tells you what kind of object you are modelling.

Continuous distributions

Values move across an interval or the real line. We usually describe them with a probability density function (PDF).
Examples: Normal, Log-Normal, Beta, Exponential, Student-t, Uniform.

Good for measurements, rates, proportions, times, losses, positive severities, and anything that is not naturally a count.

Discrete distributions

Values are countable: 0, 1, 2, 3, … We usually describe them with a probability mass function (PMF).
Examples: Bernoulli, Binomial, Poisson, Geometric, Negative Binomial.

Good for counts, events, successes across trials, and rare-event frequency questions.

How to read the shape of a distribution

Before naming the family, learn to look at shape. These are the first six things worth checking.

1

Centre

Where does the mass live? Mean, median, and mode are three different ways of describing the typical region.

2

Spread

How far do values wander from the centre? Variance and standard deviation make the width visible.

3

Skew

Is one side longer or heavier than the other? Right-skew and left-skew already narrow the list of plausible families.

4

Tails

How much probability sits far from the centre? Heavy tails matter because models often fail in the extremes.

5

Support

What values are even possible? Support is often the simplest clue: counts, only positive values, or bounded values.

6

Parameter sensitivity

Small parameter changes can produce very different shapes. Understanding that movement is the real point of the explorer below.
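The checks above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the helper name `shape_summary` and the two simulated samples are assumptions made for the example, and the moments use the simple divide-by-n form.

```python
import math
import random
import statistics

def shape_summary(values):
    """Return rough centre, spread, skew, tail and support summaries."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population (divide-by-n) central moments keep the formulas simple.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,                          # centre
        "median": statistics.median(values),   # centre, robust to skew
        "std": std,                            # spread
        "skewness": m3 / std ** 3,             # asymmetry
        "kurtosis": m4 / m2 ** 2,              # tail weight, 3 for a Normal
        "min": min(values),                    # observed support
        "max": max(values),
    }

rng = random.Random(0)  # fixed seed so the example is repeatable
symmetric = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
right_skewed = [rng.lognormvariate(0.0, 0.5) for _ in range(20_000)]

sym_summary = shape_summary(symmetric)
skew_summary = shape_summary(right_skewed)
```

In the skewed sample the mean sits above the median and the minimum stays positive, which is exactly the kind of clue that narrows the list of plausible families.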

A useful way to learn distributions

01

Start with support

Ask what values are possible. Real line? Positive only? 0 to 1? Integer counts? This immediately rules families in or out.

02

Then look at shape

Check symmetry, skew, spread, and tail behaviour. This is where Normal, Log-Normal, Beta, and Student-t begin to separate.

03

Then touch the parameters

Use sliders. Watch what happens when you move μ, σ, α, β, λ, or ν. Intuition comes from moving the shape, not only reading the formula.

04

Only then care about formulas

Formulas matter, but they become much easier once you already know what the family is trying to do geometrically.
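The "start with support" step can be sketched as a coarse filter. The function `candidate_families` below is a hypothetical helper invented for this illustration; it only looks at the observed range, which is evidence about the support rather than proof of it, so treat its output as a shortlist to examine and not as a decision.

```python
def candidate_families(values):
    """Suggest families whose support is compatible with the observed values.

    A deliberately rough heuristic: it rules families in or out by support
    only and says nothing about shape, tails, or fit quality.
    """
    all_integers = all(float(v).is_integer() for v in values)
    lo, hi = min(values), max(values)
    if all_integers and lo >= 0:
        return ["Binomial", "Poisson"]          # non-negative integer counts
    if lo >= 0 and hi <= 1:
        return ["Beta", "Uniform"]              # bounded between 0 and 1
    if lo >= 0:
        families = ["Exponential"]              # support includes zero
        if lo > 0:
            families.insert(0, "Log-Normal")    # strictly positive only
        return families
    return ["Normal", "Student-t"]              # whole real line

counts = candidate_families([0, 3, 1, 7])
proportions = candidate_families([0.2, 0.9, 0.5])
positives = candidate_families([1.5, 20.0])
signed = candidate_families([-1.2, 0.4])
```

Shape, parameters, and formulas then refine whatever this first cut leaves on the table.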

Move the parameters and watch the shape change

Use the tabs below as a guided progression. Start with Normal, then move to positive skew (Log-Normal), bounded space (Beta), waiting-time logic (Exponential), heavy tails (Student-t), discrete counts (Binomial / Poisson), and finally the flat baseline (Uniform).

Normal is the cleanest starting point: symmetric, centred, and easy to standardise. Use it to understand what “mean moves location” and “standard deviation changes width” really look like.

Normal parameters

μ (mean): 0
σ (std dev): 1

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

What to notice

Mean: 0
Variance: 1
Skewness: 0
Kurtosis: 3
Move μ left and right: the whole bell shifts. Increase σ: the bell spreads and flattens.
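A small sketch makes the two movements concrete. It evaluates the Normal density by hand with the standard library; the function name `normal_pdf` and the chosen parameter values are assumptions for the example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

peak_standard = normal_pdf(0.0)             # peak of the standard bell, about 0.399
peak_shifted = normal_pdf(2.0, mu=2.0)      # moving mu relocates the peak, same height
peak_wide = normal_pdf(0.0, sigma=2.0)      # doubling sigma halves the peak height
```

The peak height is 1 / (σ√(2π)), so a wider bell has to be a lower bell: the total area stays at 1.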

Log-Normal parameters

μ (log-mean): 0
σ (log-std): 0.5

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right), \quad x > 0$$

What to notice

Mean: 1.133
Median: 1.000
Skewness: positive
This is a “positive only” family. As σ grows, the right tail stretches hard.
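The mean and median shown for μ = 0, σ = 0.5 follow from the standard Log-Normal closed forms. The sketch below recomputes them; the helper name `lognormal_summary` is an assumption for the example.

```python
import math

def lognormal_summary(mu, sigma):
    """Closed-form summaries of a Log-Normal with log-scale parameters mu, sigma."""
    s2 = sigma ** 2
    return {
        "mean": math.exp(mu + s2 / 2),
        "median": math.exp(mu),
        "skewness": (math.exp(s2) + 2) * math.sqrt(math.exp(s2) - 1),
    }

narrow = lognormal_summary(0.0, 0.5)   # mean about 1.133, median 1.0
wide = lognormal_summary(0.0, 1.0)     # same median, much longer right tail
```

The median does not move with σ, but the mean and skewness do, which is the right tail stretching.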

Beta parameters

α (alpha): 2
β (beta): 5

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \quad x \in [0,1]$$

What to notice

Mean: 0.286
Mode: 0.200
Variance: 0.026
Beta is one of the best examples of “bounded shape flexibility.” It can be flat, U-shaped, left-skewed, right-skewed, or mound-shaped.
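The three summary values for α = 2, β = 5 can be recomputed from the usual Beta formulas. This is a sketch with an assumed helper name, `beta_summary`; the mode formula only applies when both parameters exceed 1, so the helper returns `None` otherwise.

```python
def beta_summary(alpha, beta):
    """Mean, mode and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    # An interior mode exists only when both parameters are above 1.
    mode = (alpha - 1) / (total - 2) if alpha > 1 and beta > 1 else None
    return {"mean": mean, "mode": mode, "variance": variance}

skewed = beta_summary(2, 5)   # mean about 0.286, mode 0.2, variance about 0.026
flat = beta_summary(1, 1)     # the flat case: no single interior mode
```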

Exponential parameters

λ (rate): 1

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

What to notice

Mean: 1.000
Variance: 1.000
Memoryless: yes
As λ rises, the curve collapses toward zero. This is a good family for “waiting time” intuition.
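"Memoryless" means that having already waited s units does not change the chance of waiting t more. A minimal check, using the survival function P(X > t) = e^(−λt); the function names and the chosen values of s and t are assumptions for the example.

```python
import math

def survival(t, rate):
    """P(X > t) for an Exponential with the given rate."""
    return math.exp(-rate * t)

def conditional_survival(s, t, rate):
    """P(X > s + t | X > s)."""
    return survival(s + t, rate) / survival(s, rate)

rate = 1.0
fresh_wait = survival(2.0, rate)                     # wait more than 2 from the start
after_waiting = conditional_survival(5.0, 2.0, rate) # wait 2 more after already waiting 5
```

The two probabilities agree for any s, which is the property that makes the family natural for constant-hazard waiting times.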

Student-t parameters

ν (degrees of freedom): 3
Lower ν = heavier tails. Higher ν = closer to Normal.

What to notice

Mean: 0 (ν > 1)
Variance: 3.000 (ν / (ν − 2), defined for ν > 2)
Normal limit: as ν → ∞
Student-t is a good lesson in tail risk: the centre can look familiar while extremes behave very differently.
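To put a number on the tail difference, the sketch below compares two-sided tail probabilities for a standard Normal and a Student-t with ν = 3. It relies on the closed-form CDF that exists for exactly three degrees of freedom, so it is not a general Student-t implementation; the function names are assumptions for the example.

```python
import math
from statistics import NormalDist

def t3_cdf(t):
    """CDF of a Student-t with exactly 3 degrees of freedom (closed form)."""
    x = t / math.sqrt(3.0)
    return 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi

def two_sided_tail(cdf, threshold):
    """P(|X| > threshold) for a distribution symmetric around zero."""
    return 2.0 * (1.0 - cdf(threshold))

normal_tail = two_sided_tail(NormalDist().cdf, 3.0)             # roughly 0.0027
t3_tail = two_sided_tail(t3_cdf, 3.0)                           # roughly 0.058
t3_tail_scaled = two_sided_tail(t3_cdf, 3.0 * math.sqrt(3.0))   # roughly 0.014
tail_ratio = t3_tail / normal_tail
```

Part of the gap at a raw threshold of 3 is scale, since the ν = 3 variance is 3 rather than 1. The last comparison sets the threshold at three standard deviations of the t itself, and the tail is still about five times heavier than the Normal's.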

Binomial parameters

n (trials): 20
p (probability): 0.3

$$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$$

What to notice

Mean: 6.0
Variance: 4.2
Std Dev: 2.05
This is the cleanest way to think about “number of successes out of n attempts.”
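The summary values for n = 20, p = 0.3 can be recovered directly from the PMF rather than from the shortcut formulas np and np(1 − p). A short sketch, with an assumed function name `binomial_pmf`:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 20, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total_mass = sum(pmf)                                           # masses add up to 1
mean = sum(k * q for k, q in enumerate(pmf))                    # equals n * p
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))  # equals n * p * (1 - p)
most_likely = max(range(n + 1), key=lambda k: pmf[k])           # the single likeliest count
```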

Poisson parameters

λ (expected count): 5

$$P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

What to notice

Mean: 5.0
Variance: 5.0
Mean = Variance: always
Poisson is the rare-event count family. It becomes more symmetric as λ grows.
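Both claims can be checked numerically by summing over the PMF. The sketch below truncates the infinite sum at an assumed cut-off `k_max`, which is far enough out for the rates used here; the function names are also assumptions for the example.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k), computed on the log scale to avoid overflow for large k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_moments(lam, k_max=400):
    """Mean, variance and skewness from a truncated sum over k = 0..k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    third = sum((k - mean) ** 3 * poisson_pmf(k, lam) for k in ks)
    return mean, variance, third / variance ** 1.5

mean_5, var_5, skew_5 = poisson_moments(5.0)      # mean and variance both 5
mean_50, var_50, skew_50 = poisson_moments(50.0)  # skewness shrinks as the rate grows
```

The skewness equals 1/√λ, so it drops from about 0.45 at λ = 5 to about 0.14 at λ = 50.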

Uniform parameters

a (min): 0
b (max): 5

$$f(x) = \frac{1}{b-a}, \quad a \le x \le b$$

What to notice

Mean: 2.500
Variance: 2.083
Entropy: flat
Uniform is useful as a baseline: equal weight across an interval, no preference inside the support.
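The values shown for a = 0, b = 5 come from the midpoint and the (b − a)² / 12 variance formula. A minimal sketch, with an assumed helper name `uniform_summary`:

```python
def uniform_summary(a, b):
    """Density height, mean and variance of a Uniform on [a, b]."""
    width = b - a
    return {
        "density": 1 / width,         # constant height inside the support
        "mean": (a + b) / 2,          # the midpoint
        "variance": width ** 2 / 12,
    }

baseline = uniform_summary(0, 5)   # density 0.2, mean 2.5, variance about 2.083
```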

Overlay multiple families on one chart

Comparison makes intuition sharper. Turn families on and off to compare symmetry, boundedness, skew, and tail thickness.

Where distributions show up in risk and modelling

Distributions are not just exam material. They are modelling choices. They shape what kinds of outcomes your model can represent.

Distribution | Type | Support | Shape intuition | Typical use
Normal | Continuous | (−∞, +∞) | Symmetric, bell-shaped, light tails | Standardisation, latent-variable thinking, CLT approximations
Log-Normal | Continuous | (0, +∞) | Positive, right-skewed | Positive severity-type variables, skewed magnitudes
Beta | Continuous | [0, 1] | Flexible bounded shape | Recovery rates, proportions, bounded probabilities
Exponential | Continuous | [0, +∞) | Fast decay, memoryless | Waiting time intuition, hazard-style reasoning
Student-t | Continuous | (−∞, +∞) | Heavier tails than Normal | Small-sample caution, extreme outcome sensitivity
Binomial | Discrete | {0, 1, …, n} | Count of successes | Simple default-count thinking across fixed trials
Poisson | Discrete | {0, 1, 2, …} | Rare-event frequency | Event counts, low-frequency environments
Uniform | Continuous | [a, b] | Flat support, no preferred region | Baseline simulation logic, simple uncertainty bounds

Key concepts worth keeping

PDF vs PMF

Density vs mass

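Continuous variables use density. Discrete variables assign mass to exact outcomes. A density value is not itself a probability: probabilities come from integrating the density over an interval. Same modelling idea, different mathematical object.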
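One way to see the difference is that a density can exceed 1 while a mass never can. The sketch below uses a narrow Normal (σ = 0.1) and a Binomial with n = 20, p = 0.3; both are illustrative choices, not values taken from a particular dataset.

```python
import math
from statistics import NormalDist

# Continuous case: the density height is not a probability.
narrow = NormalDist(mu=0.0, sigma=0.1)
density_at_centre = narrow.pdf(0.0)                     # about 3.99, larger than 1
prob_near_centre = narrow.cdf(0.1) - narrow.cdf(-0.1)   # area over an interval, about 0.683

# Discrete case: each value of the PMF is a probability in its own right.
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

masses = [binomial_pmf(k, 20, 0.3) for k in range(21)]
total_mass = sum(masses)                                # adds up to 1
```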

CDF

Cumulative probability

The CDF asks how much probability has accumulated by the time you reach x. It is the “at or below x” view.
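A short sketch of the "at or below x" view for the Normal, written with the error function from the standard library. The function names are assumptions for the example; the key step is that the probability of an interval is a difference of two CDF values.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a Normal with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(a, b, mu=0.0, sigma=1.0):
    """P(a < X <= b) as a difference of accumulated probability."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

half_way = normal_cdf(0.0)                           # half the mass lies at or below the mean
within_one_sigma = interval_probability(-1.0, 1.0)   # roughly 0.683
```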

Support

Possible values matter

The support of the variable often tells you more than the formula. Counts, bounded variables, and positive-only variables need different families.

Moments

Mean, variance, skew, tails

These are compact summaries of shape. They help compare families without reading every point on the curve.

MLE

Fitting a family

Maximum likelihood estimation asks which parameter values make the observed data most plausible under the assumed family.
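For the Exponential family the maximum likelihood estimate has a closed form, λ̂ = 1 / (sample mean), which makes it a convenient illustration. The sketch below simulates data with an assumed true rate of 2 and checks that the log-likelihood is lower on either side of the estimate; the function names and the sample size are choices made for the example.

```python
import math
import random

def exponential_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. Exponential data: n*log(rate) - rate*sum(data)."""
    return len(data) * math.log(rate) - rate * sum(data)

def exponential_mle(data):
    """Closed-form maximiser of the log-likelihood above."""
    return len(data) / sum(data)

rng = random.Random(1)   # fixed seed so the example is repeatable
true_rate = 2.0          # assumed value used only to simulate data
sample = [rng.expovariate(true_rate) for _ in range(10_000)]

rate_hat = exponential_mle(sample)
best = exponential_log_likelihood(rate_hat, sample)
lower = exponential_log_likelihood(0.9 * rate_hat, sample)
higher = exponential_log_likelihood(1.1 * rate_hat, sample)
```

Families without a closed form follow the same logic, with a numerical optimiser replacing the one-line formula.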

Model risk

Wrong family, wrong story

If the distributional assumption is wrong, tail behaviour, calibration, uncertainty estimates, and decisions built on them can all drift off-course.
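A deliberately simple illustration: suppose the true variable is Log-Normal with log-scale parameters μ = 0, σ = 1 (an assumption for this sketch), and someone models it as a Normal with the same mean and standard deviation. The centre and spread match, yet the tail and the support do not.

```python
import math
from statistics import NormalDist

mu, sigma = 0.0, 1.0                 # log-scale parameters of the assumed true Log-Normal
z99 = NormalDist().inv_cdf(0.99)     # standard Normal 99th percentile

# Exact quantities under the true Log-Normal.
true_q99 = math.exp(mu + sigma * z99)
true_mean = math.exp(mu + sigma ** 2 / 2)
true_sd = math.sqrt((math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2))

# The wrong story: a Normal matched on mean and standard deviation.
wrong_model = NormalDist(true_mean, true_sd)
wrong_q99 = wrong_model.inv_cdf(0.99)      # understates the extreme quantile
mass_below_zero = wrong_model.cdf(0.0)     # probability placed on impossible negative values
```

Here the matched Normal puts the 99th percentile near 6.7 when the true value is about 10.2, and assigns more than a fifth of its probability to values below zero.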

What to leave this page with

A distribution is not just a formula. It is a claim about how uncertainty is structured.

The most useful learning path is: first understand support, then read shape, then move parameters, then connect the family to a modelling use case.

Once that becomes intuitive, the formulas stop looking abstract — they start looking like compressed descriptions of behaviour.