Hello and welcome to the second tutorial in our ongoing series on empirical distributions. To watch our previous video on the histogram, click the annotation or the link in the description box.
In this tutorial, we'll discuss the empirical distribution function, or EDF, and then use NumXL to carry out our analysis.
The empirical distribution function estimates the true underlying cumulative distribution function of the population that generated the points in the sample. For our example, we'll use a data set of 29 randomly generated values from the Gaussian distribution.
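To make the definition concrete, here is a minimal Python sketch. It is not NumXL's implementation. It treats the EDF at a point x as the fraction of observations less than or equal to x. The function name `edf` and the random seed are illustrative assumptions, and the 29 values are freshly generated here rather than taken from the tutorial's worksheet.

```python
import random
from bisect import bisect_right

def edf(sample):
    """Return a function F where F(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right gives the count of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return F

# 29 pseudo-random draws from a standard Gaussian (seed chosen arbitrarily)
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(29)]

F = edf(sample)
print(F(0.0))          # share of the sample at or below zero
print(F(max(sample)))  # 1.0, since every observation is <= the maximum
```

The function jumps by 1/n at each observation, which is why the plot of an EDF looks like a staircase.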
Before we get going, let's organize our input data. We'll place the values of the sample data in a separate column. Note that the sample may contain one or more missing values.
Now we're ready to construct our EDF plot. Select an empty cell in your worksheet where you want the output table to be generated, then find the descriptive statistics icon in the NumXL tab and click the empirical distribution function option in the drop-down menu.
The EDF wizard pops up. By default, the output cells range is set to the currently selected cell in your worksheet, and the graph cells range is set to the seven cells to the right of that cell.
You can include the column headings when selecting the cell ranges. The labels will be used in the output tables.
Once the data is selected, the Options and Missing Values tabs become enabled. Let's check out the Options tab.
The Overlay Normal Distribution option is selected by default. It instructs the wizard to generate a second curve for the Gaussian distribution for comparison purposes. Leave this feature checked.
In the missing values tab you can select how you want to handle missing values in your data. By default any missing value in the data will be excluded from the analysis. This treatment is a good approach for our purposes so let's leave it unchanged.
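Outside Excel, the default treatment described here (excluding missing values before the analysis) can be sketched in a few lines of Python. Representing an empty cell as `None` or `NaN` is an assumption made for illustration, and the helper name `drop_missing` is hypothetical. The wizard's other missing-value options are not covered.

```python
import math

def drop_missing(values):
    """Keep only real numbers; skip None and NaN entries."""
    return [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]

# Made-up sample with two gaps, for illustration only
raw = [1.2, None, -0.4, float("nan"), 0.7]
clean = drop_missing(raw)
print(clean)  # [1.2, -0.4, 0.7]
```

Note that the sample size used in the EDF is then the number of remaining observations, not the number of cells in the input range.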
Now click OK to generate the output tables.
When examining the results of the EDF function, it's important to note a couple of things. First, the function sorts the observations in ascending order in column D.
Also, the X bar and Y bar columns carry no special statistical meaning; they are merely computed to help us generate a stepwise graph in Excel.
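To see why helper columns like these are useful, consider how a staircase can be drawn with an ordinary line chart: each observation is listed twice, once at the previous height and once at the new height. The Python sketch below shows one way to build such coordinates. The exact layout NumXL uses for its helper columns is not stated in this tutorial, so treat the function `step_points` as a hypothetical illustration rather than a description of the add-in.

```python
def step_points(sample):
    """Build x/y coordinates that trace the EDF staircase when joined by lines."""
    ordered = sorted(sample)
    n = len(ordered)
    xs, ys = [], []
    for i, x in enumerate(ordered, start=1):
        # Two points per observation: the riser goes from (i-1)/n up to i/n
        xs += [x, x]
        ys += [(i - 1) / n, i / n]
    return xs, ys

xs, ys = step_points([3.0, 1.0, 2.0, 4.0])
print(xs)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
print(ys)  # [0.0, 0.25, 0.25, 0.5, 0.5, 0.75, 0.75, 1.0]
```

Connecting consecutive points gives alternating vertical and horizontal segments, which is the stepwise shape of the EDF.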
Finally, the equivalent cumulative distribution function, or CDF, of the normal distribution is computed in the second column.
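The comparison between the EDF and a normal CDF can be reproduced with Python's standard library. This sketch assumes the normal curve is fitted with the sample mean and sample standard deviation. The tutorial does not say which parameters NumXL uses, so that choice is an assumption, and the data values are made up for illustration.

```python
from statistics import NormalDist, mean, stdev

# Made-up observations, for illustration only
sample = [-1.3, -0.2, 0.1, 0.4, 0.9, 1.6]

# Assumption: fit the normal curve with the sample mean and standard deviation
fitted = NormalDist(mu=mean(sample), sigma=stdev(sample))

n = len(sample)
rows = []
for i, x in enumerate(sorted(sample), start=1):
    # Each row holds: observation, EDF value, fitted normal CDF value
    rows.append((x, i / n, fitted.cdf(x)))

for x, edf_value, normal_cdf in rows:
    print(f"{x:6.2f}  {edf_value:5.3f}  {normal_cdf:5.3f}")
```

If the sample really comes from a Gaussian distribution, the two right-hand columns should track each other closely, with the gap tending to shrink as the sample grows.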
That is it for now, thank you for watching!