Distribution Metrics

Metrics that directly quantify how different two probability distributions are.

Wasserstein Distance

Feature Name: wasserstein_distance

For 1D distributions: $$ W_1(P, Q) = \int_{-\infty}^{\infty} |F_P(x) - F_Q(x)| \, dx $$

For two samples of equal size n: $$ W_1 = \frac{1}{n} \sum_{i=1}^{n} |x_{(i)} - y_{(i)}| $$

where x₍ᵢ₎ and y₍ᵢ₎ are the i-th smallest values of each sample (the sorted samples).
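The sorted-sample formula can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation behind `wasserstein_distance`; the function name `wasserstein_1d` and the sample values are hypothetical, and the sketch assumes both samples have the same size.

```python
def wasserstein_1d(x, y):
    """Empirical W_1 between two equal-size 1D samples (sorted pairing)."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    if not x:
        raise ValueError("samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    # Pair the i-th smallest value of x with the i-th smallest value of y.
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


pre = [1.0, 2.0, 3.0, 4.0]
post = [3.5, 1.5, 4.5, 2.5]  # pre shifted by 0.5, in shuffled order
print(wasserstein_1d(pre, post))  # 0.5
```

Because the samples are sorted before pairing, the order in which observations arrive does not matter.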

Also Known As

  • Earth Mover's Distance (EMD)
  • Kantorovich-Rubinstein metric

Properties

  • True metric (satisfies triangle inequality)
  • Sensitive to location shifts
  • Well-defined even when distributions don't overlap
  • Units match data units

Interpretation

  • W = 0: Identical distributions
  • For a pure mean shift (same shape, different location): W = |μ₁ - μ₂|
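The mean-shift result follows from the quantile form of W₁, a standard identity for 1D distributions that this page does not state explicitly: $$ W_1(P, Q) = \int_0^1 |F_P^{-1}(p) - F_Q^{-1}(p)| \, dp $$

If Q is P shifted by a constant δ = μ₂ - μ₁, then every quantile moves by the same amount, $F_Q^{-1}(p) = F_P^{-1}(p) + \delta$, so: $$ W_1(P, Q) = \int_0^1 |\delta| \, dp = |\delta| = |\mu_1 - \mu_2| $$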

Why Useful

Wasserstein captures the "total amount of change" between distributions. Unlike the Kolmogorov-Smirnov (KS) statistic, which only looks at the maximum gap between the two CDFs, it accumulates the gap over the entire range of values.

Used by: meta_stacking_7models


ECDF Distance

Feature Name: ecdf_distance

\[ d_{\text{ECDF}} = \frac{1}{k} \sum_{i=1}^{k} |F_1(t_i) - F_2(t_i)| \]

where F₁ and F₂ are the empirical CDFs of the two samples and tᵢ are k evenly spaced evaluation points.
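A minimal sketch of this computation follows. The page does not say which interval the tᵢ span, so the grid here runs from the pooled minimum to the pooled maximum of both samples; that choice, the default `k=100`, and the helper names are assumptions made for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a callable F(t)."""
    s = sorted(sample)
    n = len(s)
    # bisect_right counts how many observations are <= t.
    return lambda t: bisect_right(s, t) / n


def ecdf_distance(x, y, k=100):
    """Mean absolute ECDF gap over k evenly spaced points (assumed grid)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    step = (hi - lo) / (k - 1)
    grid = [lo + i * step for i in range(k)]
    f1, f2 = ecdf(x), ecdf(y)
    return sum(abs(f1(t) - f2(t)) for t in grid) / k


a = [0.0, 1.0, 2.0, 3.0]
b = [10.0, 11.0, 12.0, 13.0]
print(ecdf_distance(a, b, k=5))
```

The result always lies between 0 and 1, since each term is a difference of two probabilities.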

Properties

  • Computationally simpler than exact Wasserstein
  • Approximates the L1 distance between the two CDFs: multiplying by the width of the evaluation range gives a Riemann-sum estimate of W₁

Overlap Coefficient

Feature Name: overlap_coeff

\[ \text{OVL} = \int \min(f_1(x), f_2(x)) \, dx \]

For histograms on shared bins of width Δx, with h₁ᵢ and h₂ᵢ the density-normalized bin heights: $$ \text{OVL} = \sum_i \min(h_{1i}, h_{2i}) \times \Delta x $$
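The histogram form can be sketched as follows. The bin count, the use of the pooled sample range for the bin edges, and the function names are assumptions for illustration; the page does not specify how `overlap_coeff` bins the data.

```python
def density_histogram(sample, lo, hi, bins):
    """Density-normalized histogram on [lo, hi]; returns (heights, bin_width)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in sample:
        if lo <= v <= hi:
            # Values equal to hi fall into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    n = len(sample)
    return [c / (n * width) for c in counts], width


def overlap_coeff(x, y, bins=20):
    """Histogram estimate of OVL using shared bins over the pooled range."""
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    if hi == lo:
        return 1.0  # every observation is the same value
    h1, dx = density_histogram(x, lo, hi, bins)
    h2, _ = density_histogram(y, lo, hi, bins)
    return sum(min(a, b) for a, b in zip(h1, h2)) * dx


same = [0.0, 1.0, 2.0, 3.0]
far = [10.0, 11.0, 12.0, 13.0]
print(overlap_coeff(same, same))  # close to 1
print(overlap_coeff(same, far))   # 0.0
```

The estimate depends on the bin count: very wide bins overstate the overlap, while very narrow bins with few observations understate it.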

Interpretation

| Value | Meaning |
|-------|---------|
| OVL = 1 | Identical distributions |
| OVL = 0 | No overlap (completely separated) |
| OVL = 0.5 | Half overlap |

Why Useful

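Directly related to classification difficulty. With equal numbers of pre and post observations, the lowest achievable error rate when assigning an observation to pre or post from its value alone is OVL/2. If OVL = 0.8, even the best rule misclassifies about 40% of observations, not far from the 50% of a coin flip.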


Quantile Shift

Feature Name: quantile_shift

\[ QS = \frac{1}{k} \sum_{i=1}^{k} |F_{\text{post}}^{-1}(p_i) - F_{\text{pre}}^{-1}(p_i)| \]

where F⁻¹(p) is the quantile function and pᵢ are the k probability levels at which it is evaluated.
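A minimal sketch of the quantile-shift computation is shown below. The page does not specify the probability levels or the quantile estimator, so this example assumes evenly spaced levels pᵢ = i/(k+1) and linear interpolation between order statistics; both choices and the function names are hypothetical.

```python
def quantile(sorted_sample, p):
    """Quantile by linear interpolation between order statistics."""
    n = len(sorted_sample)
    pos = p * (n - 1)
    i = int(pos)
    if i + 1 >= n:
        return sorted_sample[-1]
    frac = pos - i
    return sorted_sample[i] + frac * (sorted_sample[i + 1] - sorted_sample[i])


def quantile_shift(pre, post, k=9):
    """Mean absolute difference between post and pre quantiles at k levels."""
    levels = [(i + 1) / (k + 1) for i in range(k)]  # assumed: 0.1, ..., 0.9 for k=9
    a, b = sorted(pre), sorted(post)
    return sum(abs(quantile(b, p) - quantile(a, p)) for p in levels) / k


pre = [1.0, 2.0, 3.0, 4.0, 5.0]
post = [v + 2.0 for v in pre]  # every value shifted by 2
print(quantile_shift(pre, post))  # approximately 2.0
```

Unlike the sorted-sample Wasserstein formula, this version does not require the two samples to have the same size, because both are reduced to the same k quantiles.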

Interpretation

  • QS = 0: No shift in any quantiles
  • Large QS: Substantial distributional shift
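  • Related to Wasserstein distance: with evenly spaced pᵢ, QS approximates W₁, which equals the integral of the absolute quantile difference over p in (0, 1)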

Used by: meta_stacking_7models