Non-Parametric Stats

when your data don’t meet the assumptions

WARNING - LOW POWER ZONE

  • If your data meet (or approximate) the assumptions of parametric tests, those tests are generally more powerful than their non-parametric equivalents
  • Monte-Carlo (MC) techniques are also often more powerful than non-parametrics
  • However, non-parametrics are simpler to use than MC
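
One way to get a feel for the power difference is a small simulation. The sketch below is illustrative only: the sample size, effect size, number of simulations, and seed are assumptions chosen for the example, not values from these slides. It draws normally distributed data (so the t-test assumptions hold), runs both tests on the same samples, and estimates power as the proportion of rejections.

```r
# Hypothetical simulation settings (assumptions for illustration)
set.seed(1)
n_sims <- 1000   # number of simulated experiments
n      <- 10     # observations per group
shift  <- 1      # true difference in means, in SD units
alpha  <- 0.05   # significance level

# For each simulated experiment, record both p-values on the same data
p_values <- replicate(n_sims, {
  g1 <- rnorm(n, mean = 0)
  g2 <- rnorm(n, mean = shift)
  c(t        = t.test(g1, g2)$p.value,
    wilcoxon = wilcox.test(g1, g2)$p.value)
})

# Estimated power: proportion of experiments in which the null is rejected
power <- rowMeans(p_values < alpha)
power
```

With normal data the t-test would be expected to reject at least about as often as the rank-based test; the gap can change or reverse for heavy-tailed or skewed data.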

Rank-Order Statistics

  • Many non-parametrics are known as rank-order tests, because they work by ranking observations and analyzing these ranks, rather than the data themselves.
  • Ranking continuous values discards a lot of information: only the order of the observations is kept, not the distances between them.
  • We will talk about non-parametrics in relation to their parametric equivalents.

Non-Parametric Regression

  • Non-parametric regression techniques exist but are not commonly used.
  • There are, however, several non-parametric correlation techniques that are widely used.

Spearman’s Rho

X and Y values are ranked separately, and Pearson’s product-moment coefficient (\(r\)) is computed on these ranks.

x <- c(0.9, 6.8, 3.2, 2.4, 1.2, 1.1)
y <- c(0.1, 4.5, 5.4, 1.5, 1.9, 4.1)

Spearman’s Rho

rank_x <- rank(x)
rank_y <- rank(y)
rank_x
## [1] 1 6 5 4 3 2
rank_y
## [1] 1 5 6 2 3 4

Spearman’s Rho

cor(x, y, method = "pearson")
## [1] 0.5590485
cor(x, y, method = "spearman")
## [1] 0.7142857
cor(rank_x, rank_y, method = "pearson")
## [1] 0.7142857
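
When there are no tied values, the same number can also be obtained from the rank differences with the textbook shortcut \(\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}\). The sketch below reuses the `x` and `y` vectors from the example; the names `d` and `rho_shortcut` are introduced here for illustration. The shortcut is only exact without ties, which holds for these data.

```r
x <- c(0.9, 6.8, 3.2, 2.4, 1.2, 1.1)
y <- c(0.1, 4.5, 5.4, 1.5, 1.9, 4.1)

# Difference between the rank of each x value and the rank of its paired y value
d <- rank(x) - rank(y)
n <- length(x)

# Shortcut formula for Spearman's rho (valid only when there are no ties)
rho_shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho_shortcut
## [1] 0.7142857

# Agrees with the built-in calculation
all.equal(rho_shortcut, cor(x, y, method = "spearman"))
## [1] TRUE
```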

Kendall’s Tau

An alternative to Spearman’s rho.

  • Rank observations
  • Examine each pair of observations and determine whether they match or not: a pair matches (is concordant) when the two observations are ordered the same way on both variables
  • Compute \(\tau\)
  • \[\tau = \frac{(\text{number of matched pairs}) - (\text{number of non-matched pairs})}{\frac{1}{2}n(n-1)}\]
  • Note: the denominator is the total number of pairwise comparisons.

Kendall’s Tau

cor(var1, var2, method="kendall")
## [1] 0.8222222
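
The pair-counting steps can be sketched by hand. Since `var1` and `var2` are not defined in these slides, the example below reuses the `x` and `y` vectors from the Spearman example instead, so its result differs from the output above. The names `pairs`, `matched`, and `non_matched` are introduced for illustration, and the code assumes there are no ties.

```r
x <- c(0.9, 6.8, 3.2, 2.4, 1.2, 1.1)
y <- c(0.1, 4.5, 5.4, 1.5, 1.9, 4.1)
n <- length(x)

# All n(n-1)/2 pairs of observations, one pair per column
pairs <- combn(n, 2)

# Direction of the difference within each pair, separately for x and y
dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])
dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])

# A pair matches when x and y order the two observations the same way
matched     <- sum(dx * dy > 0)
non_matched <- sum(dx * dy < 0)

tau <- (matched - non_matched) / (0.5 * n * (n - 1))
tau
## [1] 0.4666667

# Agrees with the built-in calculation when there are no ties
all.equal(tau, cor(x, y, method = "kendall"))
## [1] TRUE
```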

Non-Parametric t-test

Mann-Whitney U, also known as the Wilcoxon Rank-Sum

  • rank observations, ignoring group
  • sum the ranks belonging to each group
  • calculate the test statistic
  • \[U = R - \frac{n(n+1)}{2}\]
  • \(R\) is the sum of the ranks in a group, and \(n\) is that group’s sample size
  • do this for both groups, and take the smaller value as the test statistic
  • compare to the known distribution of \(U\) under the null hypothesis

Mann-Whitney U / Wilcoxon Rank-Sum

x <- rnorm(10, mean=5)
y <- rnorm(10, mean=7)
wilcox.test(x,y)
## 
##  Wilcoxon rank sum exact test
## 
## data:  x and y
## W = 9, p-value = 0.00105
## alternative hypothesis: true location shift is not equal to 0
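
The ranking steps can be reproduced by hand. The sketch below uses freshly simulated data with an arbitrary seed (an assumption for reproducibility, so the numbers will not match the output above), and the names `a`, `b`, `U_a`, and `U_b` are introduced for illustration. One detail worth knowing: the `W` reported by `wilcox.test` is the \(U\) value for the first sample, which is not necessarily the smaller of the two.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
a <- rnorm(10, mean = 5)
b <- rnorm(10, mean = 7)
n_a <- length(a)
n_b <- length(b)

# Rank all observations together, ignoring group
r <- rank(c(a, b))

# Sum the ranks belonging to each group
R_a <- sum(r[seq_len(n_a)])
R_b <- sum(r[n_a + seq_len(n_b)])

# U for each group: rank sum minus the smallest possible rank sum
U_a <- R_a - n_a * (n_a + 1) / 2
U_b <- R_b - n_b * (n_b + 1) / 2

# Conventional test statistic: the smaller of the two
U <- min(U_a, U_b)

# wilcox.test reports U for its first argument
W <- unname(wilcox.test(a, b)$statistic)
c(U_a = U_a, U_b = U_b, U = U, W = W)
```

The two values always sum to `n_a * n_b`, so either one determines the other and the p-value does not depend on which is reported.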

Non-Parametric ANOVA

Kruskal-Wallis

  • Rank all observations
  • Calculate the average rank within each group
  • Compare the average rank within each group to the overall average rank, using a weighted sum-of-squares technique
  • Compute the p-value of the test statistic using a chi-square approximation

Kruskal-Wallis

kruskal.test(var~group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  var by group
## Kruskal-Wallis chi-squared = 2.2946, df = 1, p-value = 0.1298
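
The weighted sum-of-squares calculation can be sketched by hand. The data below are simulated for illustration (three hypothetical groups and an arbitrary seed, not the `var` and `group` used above), and the code assumes no tied values, in which case the statistic is \(H = \frac{12}{N(N+1)}\sum_i n_i\left(\bar{R}_i - \frac{N+1}{2}\right)^2\).

```r
set.seed(2)  # arbitrary seed, for reproducibility only
values <- c(rnorm(8, mean = 5), rnorm(8, mean = 6), rnorm(8, mean = 7))
grp    <- factor(rep(c("a", "b", "c"), each = 8))

# Rank all observations together
r <- rank(values)
N <- length(values)

# Group sizes and average rank within each group
n_i         <- tapply(r, grp, length)
mean_rank_i <- tapply(r, grp, mean)

# Weighted sum of squared deviations from the overall average rank, (N + 1) / 2
H <- 12 / (N * (N + 1)) * sum(n_i * (mean_rank_i - (N + 1) / 2)^2)

# Chi-square approximation with (number of groups - 1) degrees of freedom
p <- pchisq(H, df = nlevels(grp) - 1, lower.tail = FALSE)

# Compare with the built-in test
kw <- kruskal.test(values ~ grp)
c(H = H, p = p)
```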

Goodness of Fit Test

Kolmogorov-Smirnov Test

A non-parametric test to determine whether two distributions differ.

It compares an empirical CDF with a theoretical CDF (one-sample test) or with a second empirical CDF (two-sample test).

KS-Test

  • The single largest deviation of the empirical CDF from the theoretical CDF is the KS statistic. This is used to compute a p-value.
  • Can be used for any distribution, not just the normal distribution.

KS-Test in R

ks.test(rnorm(100), "punif")
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  rnorm(100)
## D = 0.42597, p-value = 3.469e-16
## alternative hypothesis: two-sided
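
The statistic `D` can be computed by hand as the largest gap between the empirical CDF and the theoretical CDF. Because the empirical CDF is a step function, the gap has to be checked just before and just after each jump. The sketch below uses an arbitrary seed, so its value will not match the output above; the names `z`, `F_theo`, and `D_manual` are introduced for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
z <- sort(rnorm(100))
n <- length(z)
i <- seq_len(n)

# Theoretical CDF (uniform on [0, 1]) evaluated at the sorted observations
F_theo <- punif(z)

# Empirical CDF is i/n just after each observation and (i - 1)/n just before it
D_manual <- max(pmax(i / n - F_theo, F_theo - (i - 1) / n))

# Compare with the built-in statistic
D_builtin <- unname(ks.test(z, "punif")$statistic)
all.equal(D_manual, D_builtin)
```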

KS-Test in R

ks.test(rnorm(100)^2, "pnorm")
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  rnorm(100)^2
## D = 0.50003, p-value < 2.2e-16
## alternative hypothesis: two-sided

Two Sample KS-Test in R

ks.test(runif(100), rnorm(100))
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  runif(100) and rnorm(100)
## D = 0.5, p-value = 2.778e-11
## alternative hypothesis: two-sided
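
In the two-sample case there is no theoretical CDF, so `D` is the largest gap between the two empirical CDFs. A minimal sketch, assuming no ties and using an arbitrary seed (so the value will not match the output above); `a`, `b`, and `grid` are names introduced for illustration.

```r
set.seed(3)  # arbitrary seed, for reproducibility only
a <- runif(100)
b <- rnorm(100)

# Both empirical CDFs only change at observed values, so checking the
# pooled sample is enough to find the largest gap
grid <- sort(c(a, b))
D_manual <- max(abs(ecdf(a)(grid) - ecdf(b)(grid)))

# Compare with the built-in statistic
D_builtin <- unname(ks.test(a, b)$statistic)
all.equal(D_manual, D_builtin)
```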