NormalityTest - Testing for Normal/Gaussian Distribution

Mohamad

October 25, 2016 17:55

Returns the p-value of the normality test (i.e., whether a data set is well-modeled by a normal distribution).

Syntax

NormalityTest(X, Method, Return_type, $\alpha$)

X

is the input data sample (one/two-dimensional array of cells (e.g., rows or columns)).

Method

is the statistical test to perform (1 = Jarque-Bera, 2 = Shapiro-Wilk, 3 = Chi-Square (Doornik and Hansen)).

Return_type

is a switch to select the return output (1 = P-Value (default), 2 = Test Statistic, 3 = Critical Value).

$\alpha$

is the statistical significance of the test (i.e., alpha). If missing or omitted, an alpha value of 5% is assumed.

The sample data may include missing values (e.g., a time series as a result of a lag or difference operator).
The Shapiro-Wilk test is best suited to samples of fewer than 5000 observations.
The Jarque-Bera test becomes more powerful as the number of observations increases.
The null hypothesis of the test is that the data are drawn from a normal distribution: $$H_{o}: x \sim N(.)$$ $$H_{1}: x \not\sim N(.)$$ Where:
- $H_{o}$ is the null hypothesis.
- $H_{1}$ is the alternate hypothesis.
- $N(.)$ is the normal probability distribution function.
The Jarque-Bera test is a goodness-of-fit measure of departure from normality based on the sample kurtosis and skewness. The test is named after Carlos M. Jarque and Anil K. Bera. The test statistic JB is defined as: $$\mathit{JB} = \frac{n}{6} \left( S^2 + \frac{K^2}{4} \right)$$ Where:
- $S$ is the sample skewness.
- $K$ is the sample excess kurtosis.
- $n$ is the number of non-missing values in the data sample.
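The JB formula above can be sketched in a few lines of Python. This is only an illustration, not the function's actual implementation: the helper name `jarque_bera_statistic` is hypothetical, and the sketch assumes the biased (divide-by-$n$) central moments for $S$ and $K$, which may differ from the estimators used by NormalityTest. Missing values (here `None` or NaN) are dropped before $n$ is counted.

```python
import math

def jarque_bera_statistic(data):
    """Return (JB, n) for a sample, skipping missing values (None or NaN)."""
    # Keep only the non-missing observations; n counts these values only.
    values = [x for x in data
              if x is not None and not (isinstance(x, float) and math.isnan(x))]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two non-missing observations")

    mean = sum(values) / n
    # Biased (divide-by-n) central moments: an assumption of this sketch.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")

    skewness = m3 / m2 ** 1.5              # S
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # K (excess over the normal value of 3)
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    return jb, n

# Two missing entries are ignored, so n is 5 rather than 7.
jb, n = jarque_bera_statistic([1, 2, None, 3, 4, float("nan"), 5])
print(f"n = {n}, JB = {jb:.4f}")
```

Because both $S$ and $K$ are squared, any departure from symmetry or from normal tail weight pushes JB upward, which is why only large values count as evidence against normality.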
The Jarque-Bera $\mathit{JB}$ statistic has an asymptotic chi-square distribution with two degrees of freedom and can be used to test the null hypothesis that the data is from a normal distribution. $$\mathit{JB} \sim \chi_{\nu=2}^2()$$ Where:
- $\chi_{\nu}^2()$ is the Chi-square probability distribution function.
- $\nu$ is the degrees of freedom for the Chi-square distribution.
This is a one-sided (i.e., one-tailed) test, so the computed p-value should be compared with the full significance level ($\alpha$).
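As a rough illustration of how the p-value and critical value follow from the $\chi_{\nu=2}^2$ distribution, the sketch below uses the closed form of its upper-tail probability, $P(\chi_{2}^2 > x) = e^{-x/2}$. The function names are hypothetical and are not part of NormalityTest. The result is an asymptotic approximation and may be inaccurate for small samples.

```python
import math

def jb_p_value(jb):
    """Upper-tail p-value of JB under a chi-square distribution with 2 d.o.f."""
    # For nu = 2 the chi-square survival function has the closed form exp(-x/2).
    return math.exp(-jb / 2.0)

def jb_critical_value(alpha=0.05):
    """Value that JB must exceed to reject normality at significance alpha."""
    # Inverting exp(-x/2) = alpha gives x = -2 * ln(alpha).
    return -2.0 * math.log(alpha)

def rejects_normality(jb, alpha=0.05):
    """One-tailed decision: compare the p-value with the whole alpha."""
    return jb_p_value(jb) < alpha

for jb in (0.35, 7.2):
    print(jb, round(jb_p_value(jb), 4), rejects_normality(jb))
print("critical value at 5%:", round(jb_critical_value(0.05), 4))
```

Comparing the p-value with $\alpha$ and comparing JB with the critical value are equivalent decisions; the two forms correspond to the p-value and critical-value outputs selected by Return_type.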

Jarque, C. M. and Bera, A. K. (1980). "Efficient tests for normality, homoscedasticity and serial independence of regression residuals". Economics Letters 6 (3): 255-259.
Ljung, G. M. and Box, G. E. P. (1978). "On a measure of lack of fit in time series models". Biometrika 65: 297-303.
Enders, W. (1995). "Applied econometric time series". John Wiley & Sons, p. 86-87.
Shapiro, S. S. and Wilk, M. B. (1965). "An analysis of variance test for normality (complete samples)". Biometrika 52 (3 and 4): 591-611.