notes · 13

Parametric & Non-Parametric Tests

Once hypothesis testing is clear, the next practical question is which test to use. This page is about that decision: t-test, ANOVA, Chi-Square, Mann-Whitney, and the logic for choosing between parametric and non-parametric tools.

Start with the distinction, then the test-selection flow, then work through the main tests one by one. The goal is not memorising names — it is learning which question each test actually answers.

Parametric vs non-parametric

The real distinction is not “old tests vs new tests.” It is “tests that rely on stronger assumptions” versus “tests that sacrifice power in exchange for robustness.”

Parametric tests

Parametric tests assume a particular structure for the data, usually some combination of Normality, equal variances, continuity, and independence.

When those assumptions are acceptable, they are usually more powerful because they use more information from the data.

Typical examples: t-test, ANOVA, Pearson correlation, z-test.

They are often the first choice in clean experimental data, but they become fragile if shape assumptions are badly violated.

Non-parametric tests

Non-parametric tests avoid strong distributional assumptions. They often work on ranks or frequencies instead of raw values.

They are less efficient when parametric assumptions truly hold, but safer when the data is skewed, bounded, heavy-tailed, ordinal, or contaminated by outliers.

Typical examples: Mann-Whitney U, Kruskal-Wallis, Chi-Square, Spearman.
Risk reality: PD, LGD, recovery, utilisation, and delinquency data are often skewed, bounded, spiky, or discrete. In many validation settings, a non-parametric confirmation should be treated as standard practice rather than an optional extra.

Which test should I use?

The cleanest way to choose a test is to ask a small number of structural questions: how many groups, what type of variable, and how strong the assumptions look.

01
How many groups or categories?
2 groups → t-test / Mann-Whitney
3+ groups → ANOVA / Kruskal-Wallis
Association table → Chi-Square
02
What type of variable?
Continuous → mean- or rank-based tests
Categorical → count- or frequency-based tests
03
Are parametric assumptions plausible?
Roughly yes → parametric route
No / unsure → non-parametric route
04
Independent or paired?
Independent → independent t / MW-U
Paired → paired t / Wilcoxon
Variable screening example: suppose you compare DTI (debt-to-income ratio) between defaulted and non-defaulted borrowers. That is two groups and a continuous variable. If the shape looks acceptable, try an independent t-test. If DTI is heavily skewed, confirm with Mann-Whitney.
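The four questions above can be written as a small lookup function. This is a minimal sketch: `suggest_test` and its arguments are hypothetical names, and the `assumptions_ok` flag stands in for the judgement call in step 03 rather than automating it.

```python
def suggest_test(n_groups: int, variable: str, assumptions_ok: bool, paired: bool = False) -> str:
    """Walk the selection questions in order and return a test name (teaching sketch only)."""
    if variable == "categorical":
        # Counts in a contingency table: association, not means or ranks
        return "Chi-Square test of independence"
    if variable != "continuous":
        raise ValueError("variable must be 'continuous' or 'categorical'")
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        if paired:
            # Same units measured twice: the test works on within-pair differences
            return "Paired t-test" if assumptions_ok else "Wilcoxon signed-rank"
        return "Independent t-test" if assumptions_ok else "Mann-Whitney U"
    if paired:
        # Repeated measures across 3+ conditions are not covered by this flow
        raise ValueError("paired designs with 3+ groups are outside this flow")
    return "One-way ANOVA" if assumptions_ok else "Kruskal-Wallis"
```

For the DTI example with a heavily skewed variable, `suggest_test(2, "continuous", assumptions_ok=False)` returns "Mann-Whitney U".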

A useful way to learn the family of tests

01

Start with the question, not the formula

Every test is answering a structural question: different means, different distributions, or association between categories.

02

Then inspect assumptions

Normality, equal variances, and sample size decide whether parametric tools are justified or whether rank-based alternatives are safer.

03

Then separate significance from effect size

A variable with a tiny p-value but a negligible effect size is not an important predictor. Validation work should report both.

04

Then think about business relevance

A statistically detectable difference is not automatically useful. Ask whether the result changes segmentation, ranking, or modelling decisions.

Independent samples t-test

The t-test asks whether the difference between two group means is large relative to the variability inside those groups, that is, relative to the standard error of that difference.
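A minimal standard-library sketch of the quantities behind an independent t-test. It assumes the Welch form, which does not require equal variances (the classic Student form pools them instead). For a p-value you would normally call `scipy.stats.ttest_ind(a, b, equal_var=False)`; Cohen's d is included because the t statistic alone grows with sample size. The helper names and the data are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)   # squared standard error of mean(a)
    se2_b = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

def cohens_d(a, b):
    """Mean difference expressed in units of the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Toy numbers, not real borrower data
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t, df = welch_t(group_a, group_b)
d = cohens_d(group_a, group_b)
```

Here t = -2.0 with 8 degrees of freedom and d ≈ -1.26: the t statistic divides the mean gap by its standard error, while d expresses the same gap in standard-deviation units and does not shrink or grow with sample size.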

One-way ANOVA

ANOVA asks whether at least one group mean differs from the others. It does not tell you which one — only whether the group structure matters.

Chi-Square test of independence

Chi-Square is for categorical association. It asks whether the observed contingency table is too far from what independence would imply.
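Under independence, each expected count is row total × column total / grand total, and the statistic sums (observed - expected)² / expected over all cells. The sketch below assumes the table is a list of rows and also returns Cramér's V and the smallest expected count, since small expected counts are the usual reason to switch to Fisher's exact test. `chi_square_independence` is a hypothetical name. `scipy.stats.chi2_contingency` gives the p-value, but it applies Yates' continuity correction to 2×2 tables by default, so its statistic differs from this uncorrected one unless `correction=False` is passed.

```python
from math import sqrt

def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, Cramér's V and smallest expected count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count implied by independence
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    cramers_v = sqrt(chi2 / (n * (min(r, c) - 1)))       # effect size on a 0..1 scale
    return chi2, df, cramers_v, min_expected

# Toy counts: rows = predictor bucket, columns = default / non-default
chi2, df, v, min_exp = chi_square_independence([[10, 20], [20, 10]])
```

With every expected count equal to 15, this table gives χ² ≈ 6.67 on 1 degree of freedom and Cramér's V ≈ 0.33.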

Mann-Whitney U

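Mann-Whitney is the rank-based alternative to the two-sample t-test. It is useful when distributions are skewed, heavy-tailed, bounded, or clearly non-Normal. Strictly, it tests whether values in one group tend to be larger than values in the other; it reduces to a comparison of medians only when the two distributions have the same shape.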
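Mann-Whitney works on the ranks of the pooled sample. The sketch below assumes average ranks for ties and a large-sample Normal approximation with no tie or continuity correction, so treat its p-value as rough, especially for small samples; `scipy.stats.mannwhitneyu(a, b, alternative="two-sided")` handles those details. The rank-biserial correlation is returned as the matching effect size. `average_ranks` and `mann_whitney_u` are hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1   # average 0-based position, shifted to 1-based
        start = end + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U for a, z, two-sided p via Normal approximation, rank-biserial correlation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2    # pairs where a beats b, ties counted as 1/2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)     # no tie or continuity correction
    z = (u1 - mean_u) / sd_u
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    rank_biserial = 2 * u1 / (n1 * n2) - 1         # +1: a always larger, -1: b always larger
    return u1, z, p_two_sided, rank_biserial

u1, z, p, r_rb = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Every value in the second group beats every value in the first, so U = 0 and the rank-biserial correlation is -1. Because only ranks enter the calculation, replacing 6 with 6000 would leave U unchanged, which is exactly why the test tolerates outliers.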

Test selection cheat sheet

| Scenario | Parametric | Non-parametric | Data type | Validation use |
|---|---|---|---|---|
| 2 groups | Independent t-test | Mann-Whitney U | Continuous | Default vs non-default comparison |
| Paired comparison | Paired t-test | Wilcoxon signed-rank | Continuous | Before vs after comparison on the same sample |
| 3+ groups | One-way ANOVA | Kruskal-Wallis | Continuous | Grades, segments, collateral types |
| Categorical association | n/a | Chi-Square | Categorical | Categorical predictor vs default |
| Small 2×2 table | n/a | Fisher's exact | Categorical | Rare-event contingency tables |
| Linear / monotonic association | Pearson | Spearman | Continuous / ranked | Predictor relationships, dependence checks |
| Distribution equality | n/a | KS test | Continuous | Score distribution comparison |
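The KS test in the last row compares whole distributions rather than means or ranks. Its statistic is the largest vertical gap between the two empirical CDFs, which the short sketch below computes directly; `ks_statistic` is a hypothetical name, and `scipy.stats.ks_2samp(a, b)` returns the same two-sided statistic together with a p-value.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: largest vertical gap between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed values, so checking those suffices
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

For example, `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` is 0.5: at x = 2, half of the first sample lies at or below x and none of the second does.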

Concepts every validator should keep

assumptions

Test the assumptions before the test

Normality, equal variances, and cell counts are not side details. They determine whether the chosen test is even interpretable.
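Two quick numeric checks that often precede a t-test or ANOVA: sample skewness for shape, and the ratio of group variances for the equal-variance assumption. This is a sketch under stated assumptions: the helper names are hypothetical, the code deliberately imposes no cut-offs (any threshold is a judgement call), and formal tests are available as `scipy.stats.shapiro` for Normality and `scipy.stats.levene` for equal variances.

```python
from statistics import mean, stdev, variance

def sample_skewness(x):
    """Adjusted Fisher-Pearson skewness (G1): 0 for a symmetric sample, > 0 for a long right tail."""
    n, m, s = len(x), mean(x), stdev(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)

def variance_ratio(a, b):
    """Larger sample variance over smaller; values near 1 support the equal-variance assumption."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)
```

A single large value is enough to move skewness a long way: `sample_skewness([1, 1, 1, 10])` is 2.0, while `sample_skewness([1, 2, 3, 4, 5])` is 0.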

effect size

P-value is not enough

Report effect sizes such as Cohen's d, η², Cramér's V, or rank-biserial correlation. Significance without materiality is weak validation evidence.

multiple testing

False positives multiply fast

If many predictors are screened, some will appear significant by chance. Statistical filtering should be combined with domain logic.
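The arithmetic behind "false positives multiply fast": if m independent tests are each run at level α and no predictor truly matters, the chance of at least one significant result is 1 - (1 - α)^m. The sketch assumes independent tests, which screened predictors rarely are, so read it as an illustration. Bonferroni, shown at the end, is one common correction among several; it becomes conservative when the tests are correlated.

```python
def family_wise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests when every null is true."""
    return 1 - (1 - alpha) ** m

for m in (1, 10, 20, 50):
    print(m, round(family_wise_error(0.05, m), 3))
# 1 0.05
# 10 0.401
# 20 0.642
# 50 0.923

# Bonferroni: test each of m predictors at alpha / m to keep the family-wise rate at or below alpha
m = 20
per_test_alpha = 0.05 / m
```

Screening 20 unrelated predictors at 5% gives roughly a 64% chance of at least one spurious "significant" result.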

post-hoc logic

ANOVA is only the gatekeeper

ANOVA tells you that differences exist, not where they sit. For graded systems, post-hoc pairwise comparisons are needed to see which grades actually differ.

robustness

Non-parametric does not mean weaker thinking

It often means more realistic thinking when the variable is skewed, bounded, or structurally messy.

validation style

Use tests as evidence, not as authority

Good validation combines significance, effect size, stability, model context, and business meaning. No single test is the whole answer.

What to leave this page with

Parametric tests are efficient when their assumptions are credible. Non-parametric tests are safer when the data is messy, skewed, or structurally non-Normal.

The right workflow is: define the question, inspect assumptions, choose the test family, then interpret significance together with effect size.

Once that habit is in place, test selection stops being memorisation and becomes structured judgement.