Kernel Density Estimation (KDE) in Excel

Mohamad

July 18, 2013 15:44

In this tutorial, we'll continue trying to infer the probability density function of a random variable, but we'll use another method, called kernel density estimation.

Video script

Scene 1:

Hello and welcome to the third tutorial in our ongoing series on empirical distribution. To watch your previous video on the histogram click the annotation or the link in the description box.

In this tutorial we'll continue trying to infer the probability density function of random variable, but we'll use another method called kernel density estimation. Kernel density estimates, or KDE are closely related to histograms, but there are far more superior as far as accuracy and continuity.

For our sample data we'll use 50 random values of the normal distribution generated with the help of the Excel analysis pack. We've already plotted the histogram for our reference. Now we're ready to construct your KDE plot.

First select the empty cell in your worksheet where you wish for the output table to be generated, then click on the descriptive statistics icon in anomic cell tab and select kernel density estimation from the drop down menu.

Scene 2:

The KDE wizard pops up. By default the output cells range is set to the current selected cell in your worksheet and the graph cells range is set to the 7 cells right of that cell.

Remember that you can include the column headings when selecting the cells ranges, the labels will be using the output tables.

Scene 3:

Once the data is selected the options and missing values tabs become enabled.

Scene 4:

In NumXL software in Excel we can compute the KDE values for different kernel functions such as, Gaussian uniform or triangular as well as their bandwidth value.

In the options tab the Gaussian kernel function is selected automatically, leave this option unchanged. Leave the optimal bandwidth option checked as well. The KDE function will use the silverman estimate for the bandwidth.

Let's set the output table size to 7 from the default of 5, and finally let's leave the overlay normal distribution checked. This option instructs the wizard to generate a second curve for the Gaussian distribution for comparison purposes.

Scene 5:

Now over in the missing values tab you can select how you want to handle missing values in your data. By default any missing value within the observations will be excluded from the analysis. This treatment is a good approach for our purposes, so let's leave it unchanged. Click OK to generate the output table.

Scene 6:

When examining the results of the KDE function it's important to note a couple of things, the values of all X's are sorted in the ascending order, and the summary statistics in the first row are computed merely to facilitate the calculation of the overlay Gaussian distribution function.

Note that the KDE curve which is blue follows the Gaussian densities orange curve very closely.

Scene 7:

That is it for now, thank you for watching

Related articles

Comments