Parametric & Non-Parametric Tests
Once hypothesis testing is clear, the next practical question is which test to use. This page is about that decision: t-test, ANOVA, Chi-Square, Mann-Whitney, and the logic for choosing between parametric and non-parametric tools.
Parametric vs non-parametric
The real distinction is not “old tests vs new tests.” It is “tests that rely on stronger assumptions” versus “tests that sacrifice power in exchange for robustness.”
Parametric tests
Parametric tests assume a particular distributional structure for the data, usually some combination of Normality, equal variances, and continuity. They also assume independent observations, although most non-parametric tests require that too.
When those assumptions hold, at least approximately, they are usually more powerful because they use the actual values rather than only their ranks or counts.
They are often the first choice for clean experimental data, but they become fragile when shape assumptions are badly violated, especially in small samples.
Non-parametric tests
Non-parametric tests avoid strong distributional assumptions. They often work on ranks or frequencies instead of raw values.
They are less efficient when parametric assumptions truly hold, but safer when the data is skewed, bounded, heavy-tailed, ordinal, or contaminated by outliers.
Which test should I use?
The cleanest way to choose a test is to ask a small number of structural questions: what type of variable you have, how many groups you are comparing, whether the observations are paired, and whether the parametric assumptions look plausible.
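Categorical variable → count/frequency-based tests (Chi-Square for an association table)
Paired measurements on the same units → paired t-test / Wilcoxon signed-rank
2 independent groups → independent t-test / Mann-Whitney U
3+ groups → ANOVA / Kruskal-Wallis
Parametric assumptions doubtful, or unsure → take the non-parametric option in each pair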
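The questions above can be sketched as a small lookup function. The function name, its parameters, and the string labels are hypothetical; it encodes only the branches listed on this page and is meant as a reminder of the logic, not a replacement for judgement.

```python
def choose_test(outcome: str, groups: int, paired: bool = False,
                assumptions_ok: bool = True) -> str:
    """Map the structural questions to a candidate test (illustrative only).

    outcome: "categorical" or "continuous" (hypothetical labels).
    groups: number of groups being compared.
    paired: True if the same units are measured under both conditions.
    assumptions_ok: True if Normality / equal variances look plausible.
    """
    if outcome == "categorical":
        # Count data: association between categories in a contingency table.
        return "Chi-Square (Fisher's exact for small 2x2 tables)"
    if outcome != "continuous":
        raise ValueError(f"unknown outcome type: {outcome!r}")
    if groups < 2:
        raise ValueError("need at least two groups to compare")
    if paired:
        # Repeated measures on 3+ conditions are outside this page's scope.
        if groups != 2:
            raise ValueError("this sketch only covers two paired conditions")
        return "paired t-test" if assumptions_ok else "Wilcoxon signed-rank"
    if groups == 2:
        return "independent t-test" if assumptions_ok else "Mann-Whitney U"
    # Three or more independent groups.
    return "one-way ANOVA" if assumptions_ok else "Kruskal-Wallis"


print(choose_test("continuous", groups=2, assumptions_ok=False))  # Mann-Whitney U
print(choose_test("continuous", groups=4))                        # one-way ANOVA
```

Checking the variable type first mirrors the order of the questions: once the outcome is categorical, group counts and Normality stop being the deciding factors.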
A useful way to learn the family of tests
Start with the question, not the formula
Every test is answering a structural question: different means, different distributions, or association between categories.
Then inspect assumptions
Normality, equal variances, and sample size decide whether parametric tools are justified or whether rank-based alternatives are safer.
Then separate significance from effect size
A predictor with a tiny p-value but a negligible effect size is not an important predictor. Validation work should report both.
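A quick way to see why both are needed: hold a negligible effect fixed and let the sample grow. This sketch assumes two equal-sized groups and uses the large-sample Normal approximation z = d·√(n/2) for a standardised mean difference d; the function name is hypothetical and this is not a substitute for running the actual test.

```python
from math import erfc, sqrt

def approx_p_from_effect(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a standardised mean difference d.

    Large-sample Normal approximation for two equal groups of size n:
    z = d * sqrt(n / 2), and the two-sided p-value is erfc(|z| / sqrt(2)).
    """
    z = d * sqrt(n_per_group / 2)
    return erfc(abs(z) / sqrt(2))

# The same negligible effect (d = 0.02) at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}   d = 0.02   p = {approx_p_from_effect(0.02, n):.2g}")
```

At 100 observations per group the effect is invisible, while at a million per group the p-value is vanishingly small, yet d never changed. On large portfolios, significance alone says little about materiality.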
Then think about business relevance
A statistically detectable difference is not automatically useful. Ask whether the result changes segmentation, ranking, or modelling decisions.
Independent samples t-test
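The independent samples t-test asks whether the difference between two group means is large relative to the variability inside those groups, that is, larger than sampling noise would plausibly produce. The classic (Student) version assumes roughly Normal data and equal variances; Welch's variant drops the equal-variance assumption.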
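A minimal sketch of the Student (pooled-variance) statistic, with Cohen's d computed from the same pooled standard deviation. The function name is hypothetical, and the p-value step is left out because it needs the t distribution.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Student's independent-samples t statistic (equal variances assumed).

    Returns (t, degrees of freedom, Cohen's d using the pooled SD).
    """
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    sp = sqrt(sp2)
    diff = mean(a) - mean(b)
    t = diff / (sp * sqrt(1 / n1 + 1 / n2))  # signal divided by its standard error
    d = diff / sp                             # effect size: does not grow with n
    return t, n1 + n2 - 2, d


t, df, d = pooled_t_test([2, 4, 6], [5, 7, 9])
print(f"t = {t:.3f}, df = {df}, Cohen's d = {d:.2f}")
```

The same mean difference appears in both t and d, but only t is scaled by sample size. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` gives Welch's version together with a p-value.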
One-way ANOVA
ANOVA asks whether at least one group mean differs from the others. It does not tell you which one; it only tells you whether the group structure matters.
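To make the "does the group structure matter" question concrete, here is a minimal standard-library sketch of the one-way ANOVA F statistic, reporting η² alongside it. The function name is hypothetical; the p-value needs the F distribution, which `scipy.stats.f_oneway` provides.

```python
from statistics import mean

def one_way_anova(*groups: list[float]) -> tuple[float, int, int, float]:
    """One-way ANOVA F statistic plus eta squared.

    Returns (F, df_between, df_within, eta_squared).
    """
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    f = (ss_between / df_b) / (ss_within / df_w)
    eta_sq = ss_between / (ss_between + ss_within)  # share of variance explained
    return f, df_b, df_w, eta_sq


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # → (27.0, 2, 6, 0.9)
```

A large F says the groups are not all alike; it says nothing about which pairs differ, which is why post-hoc comparisons follow.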
Chi-Square test of independence
Chi-Square is for categorical association. It asks whether the observed contingency table is too far from what independence would imply.
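A minimal sketch of how "too far from independence" is measured: expected counts come from the row and column totals, and the statistic sums the squared gaps scaled by those expected counts. It also returns Cramér's V and the smallest expected count, since a common rule of thumb asks for expected counts of at least 5. The function name is hypothetical.

```python
from math import sqrt

def chi_square_independence(table: list[list[float]]) -> tuple[float, int, float, float]:
    """Pearson chi-square statistic for an r x c contingency table.

    Returns (chi2, degrees of freedom, Cramér's V, smallest expected count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2, min_expected = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    cramers_v = sqrt(chi2 / (n * (min(r, c) - 1)))  # 0 = no association, 1 = perfect
    return chi2, df, cramers_v, min_expected


chi2, df, v, min_e = chi_square_independence([[10, 20], [30, 40]])
print(f"chi2 = {chi2:.3f}, df = {df}, Cramér's V = {v:.3f}, min expected = {min_e}")
```

This is the uncorrected statistic. `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its value will differ for this example.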
Mann-Whitney U
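Mann-Whitney U is the rank-based alternative to the two-sample t-test. It is useful when distributions are skewed, heavy-tailed, bounded, or clearly non-Normal. Strictly, it tests whether values in one group tend to be larger than values in the other, not whether the means differ; it reads as a comparison of medians only if both distributions have the same shape.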
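A minimal sketch of the U statistic computed from pooled ranks (ties share their average rank), plus the rank-biserial correlation as an effect size. Function names are hypothetical, the sign convention for the correlation is this sketch's choice, and no p-value is computed.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (U for sample a, rank-biserial correlation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])                 # rank sum of sample a
    u_a = r1 - n1 * (n1 + 1) / 2
    # Convention here: +1 means every a exceeds every b, -1 means the reverse.
    rank_biserial = 2 * u_a / (n1 * n2) - 1
    return u_a, rank_biserial


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(mann_whitney_u([1, 2, 3], [4, 5, 600]))  # extreme outlier, same result
```

Replacing 6 with 600 leaves U unchanged because only the ordering matters, which is exactly the robustness that makes the test attractive for skewed or outlier-prone variables.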
Test selection cheat sheet
| Scenario | Parametric | Non-parametric | Data type | Validation use |
|---|---|---|---|---|
| 2 groups | Independent t-test | Mann-Whitney U | Continuous | Default vs non-default comparison |
| Paired comparison | Paired t-test | Wilcoxon signed-rank | Continuous | Before vs after comparison on the same sample |
| 3+ groups | One-way ANOVA | Kruskal-Wallis | Continuous | Grades, segments, collateral types |
| Categorical association | — | Chi-Square | Categorical | Categorical predictor vs default |
| Small 2×2 table | — | Fisher's exact | Categorical | Rare-event contingency tables |
| Linear / monotonic association | Pearson (linear) | Spearman (monotonic) | Continuous / ranked | Predictor relationships, dependence checks |
| Distribution equality | — | KS test | Continuous | Score distribution comparison |
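The Pearson / Spearman row is a good example of the rank-based idea: Spearman's coefficient is Pearson's correlation applied to ranks, so it rewards any monotonic relationship, not only a straight line. A minimal sketch with hypothetical function names (the tie-aware ranking is quadratic and only meant for small illustrations):

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: strength of linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values: list[float]) -> list[float]:
    """Average ranks: ties share the mean of their 1-based sorted positions."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + (sorted_vals.count(v) + 1) / 2 for v in values]

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson on ranks, i.e. monotonic association."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # perfectly monotonic, but curved
print(f"Pearson = {pearson(x, y):.3f}, Spearman = {spearman(x, y):.3f}")
```

Here Spearman is exactly 1 while Pearson falls short of 1, because the relationship is monotonic but not linear.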
Concepts every validator should keep
Test the assumptions before the test
Normality, equal variances, and cell counts are not side details. They determine whether the chosen test is even interpretable.
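For the equal-variance assumption, one widely used check is Levene's test in its Brown-Forsythe form: replace every observation by its absolute distance from its group median, then run a one-way ANOVA F on those distances. This is a minimal sketch of the statistic only (the function name is hypothetical); `scipy.stats.levene`, whose default `center='median'` is this variant, also returns the p-value.

```python
from statistics import mean, median

def brown_forsythe(*groups: list[float]) -> float:
    """Brown-Forsythe statistic for equality of variances.

    One-way ANOVA F computed on absolute deviations from each group's
    median; large values suggest the group spreads differ.
    """
    # Replace each observation by its distance from its own group median.
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    all_devs = [z for g in devs for z in g]
    grand = mean(all_devs)
    k, n = len(devs), len(all_devs)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in devs)
    ss_within = sum((z - mean(g)) ** 2 for g in devs for z in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(brown_forsythe([1, 2, 3], [2, 4, 6]))      # mild spread difference
print(brown_forsythe([9, 10, 11], [0, 10, 20]))  # much wider second group
```

The statistic is compared with an F distribution on (k − 1, n − k) degrees of freedom. Using medians rather than means keeps the check itself reasonably robust to skew.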
P-value is not enough
Report effect sizes such as Cohen's d, η², Cramér's V, or rank-biserial correlation. Significance without materiality is weak validation evidence.
False positives multiply fast
If many predictors are screened, some will appear significant by chance. Statistical filtering should be combined with domain logic.
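The arithmetic behind "multiply fast", plus one common correction. This page does not prescribe a method; Holm-Bonferroni is used here as a swapped-in example, and the family-wise formula assumes the tests are independent and every null is true. Function names are hypothetical (`statsmodels.stats.multitest.multipletests` with `method="holm"` offers a library version).

```python
def family_wise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent null tests."""
    return 1 - (1 - alpha) ** m

def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted p monotone
        adjusted[i] = running_max
    return adjusted


print(round(family_wise_error(0.05, 20), 3))
print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

Screening 20 truly irrelevant predictors at α = 0.05 gives roughly a 64% chance of at least one spurious "hit". After the Holm adjustment, only two of the four example p-values stay below 0.05.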
ANOVA is only the gatekeeper
ANOVA tells you that differences exist, not where they sit. For graded systems, post-hoc testing matters.
Non-parametric does not mean weaker thinking
It often means more realistic thinking when the variable is skewed, bounded, or structurally messy.
Use tests as evidence, not as authority
Good validation combines significance, effect size, stability, model context, and business meaning. No single test is the whole answer.
What to leave this page with
Parametric tests are efficient when their assumptions are credible. Non-parametric tests are safer when the data is messy, skewed, or structurally non-Normal.
The right workflow is: define the question, inspect assumptions, choose the test family, then interpret significance together with effect size.
Once that habit is in place, test selection stops being memorisation and becomes structured judgement.