Module 2 - Descriptive Statistics

In module 1, we examined the MSFT daily closing prices time series to explore common issues associated with real-time series data: stationarity, homogeneity, outliers, missing values, and concentration of values. We concluded the module with a transformation of the input data using logarithmic transformation.

In this module, we will conduct a few computations to summarize the sample statistical distribution in an attempt to understand the unknown population distribution.

NumXL comes with scores of functions to compute various summary statistics functions, including robust functions like Quantile, IQR, etc. Furthermore, NumXL includes a wide range of statistical tests to verify the significance of the computed summary statistics.

Illustration

For our purposes here, we’ll use the NumXL toolbar and summary statistics wizards. Using the NumXL tab, click DESC STAT.

Invoking the summary statistics wizard in NumXL 1.63 Toolbar.

Next, the Descriptive Statistics dialog box will pop up. Fill in the fields with your (log-transformed) data location, series time order, options, and location for the results to appear on your worksheet.

Descriptive Statistics Dialog or Wizard. Select the cells range for your input data, and select the different stats available (on the LHS in the red square), then click OK.

The Descriptive Statistics dialog box will print out the selected statistics and tests (along with the formulas) into your worksheet.

Output table generated by descriptive statistics wizard using the daily log-price for Microsoft stock between Jan, 2000 and Jan 2009.

In sum, one may conclude that the underlying distribution has the following properties:

  • Mean is significantly different from zero
  • Density (mass) distribution is significantly positively skewed
  • Density distribution has fat-tails
  • Half of the observation values fall between 3.09 and 3.32

Although the median is smaller than average, the distribution is positively skewed, leading us to believe that the distribution has the right fat-tails.

The quartiles (Q1, Q3) inscribe 50% of the values in the sample. The inter-quartile range (IQR) can be used to characterize the data when there may be extremities that skew the data; the interquartile range is a relatively robust statistic (also sometimes called "resistant") compared to the range and standard deviation.

We’ll revisit the statistical tests for white-noise, normality, and arch effect in module 5, but for now, we can ignore the right-most table.

Recall the MSFT log-price histogram in module 1 (see below):

A histogram plot showing the relative frequency for different values in MSFT log-price time.


Tutorial Video


Support Files

Comments

Please sign in to leave a comment.

Was this article helpful?
0 out of 0 found this helpful