Estimación de Densidad Kernel (KDE o EDK) en Excel

Mohamad

18 de julio de 2013 15:44

En este tutorial continuaremos tratando de inferir la función de probabilidad de densidad de una variable aleatoria, pero usaremos otro método llamado estimación de densidad de Kernel.

Guion de Video

Scene 1:

Hello and welcome to the third tutorial in our ongoing series on empirical distribution. To watch your previous video on the histogram click the annotation or the link in the description box.

In this tutorial, we'll continue trying to infer the probability density function of random variables, but we'll use another method called kernel density estimation. Kernel density estimates or KDE are closely related to histograms, but they are far more superior as far as accuracy and continuity.

For our sample data, we'll use 50 random values of the normal distribution generated with the help of the Excel analysis pack. We've already plotted the histogram for our reference. Now we're ready to construct your KDE plot.

First, select the empty cell in your worksheet where you wish for the output table to be generated, then click on the descriptive statistics icon in the anomic cell tab and select kernel density estimation from the drop-down menu.

Scene 2:

The KDE wizard pops up. By default, the output cells range is set to the currently selected cell in your worksheet and the graph cells range is set to the 7 cells right of that cell.

Remember that you can include the column headings when selecting the cells ranges, the labels will be using the output tables.

Scene 3:

Once the data is selected the options and missing values tabs become enabled.

Scene 4:

In NumXL software, in Excel, we can compute the KDE values for different kernel functions such as Gaussian uniform or triangular as well as their bandwidth value.

In the options tab, the Gaussian kernel function is selected automatically, leave this option unchanged. Leave the optimal bandwidth option checked as well. The KDE function will use the Silverman estimate for the bandwidth.

Let's set the output table size to 7 from the default of 5, and finally let's leave the overlay normal distribution checked. This option instructs the wizard to generate a second curve for the Gaussian distribution for comparison purposes.

Scene 5:

Now over in the missing values tab, you can select how you want to handle missing values in your data. By default, any missing value within the observations will be excluded from the analysis. This treatment is a good approach for our purposes, so let's leave it unchanged. Click OK to generate the output table.

Scene 6:

When examining the results of the KDE function it's important to note a couple of things, the values of all X's are sorted in ascending order, and the summary statistics in the first row are computed merely to facilitate the calculation of the overlay Gaussian distribution function.

Note that the KDE curve which is blue follows the Gaussian densities orange curve very closely.

Scene 7:

That is it for now, thank you for watching

For more information on kernel density estimation, visit our KDE tutorial

Artículos relacionados

Comentarios