KDE - Kernel Density Estimation

Calculates the kernel density estimate (KDE) of the sample data at a given target value.

 

Syntax

KDE(X, target, h, kernel)

X is the input data series (a one- or two-dimensional array of cells, e.g. rows or columns).

target is the value at which to evaluate the estimated probability density function (pdf).

h is the smoothing parameter (bandwidth) of the kernel density estimator. If missing, the KDE function calculates an optimal value.

kernel is a switch to select the kernel function (1 = Gaussian (default), 2 = Uniform, 3 = Triangular, 4 = Biweight (Quartic), 5 = Triweight, 6 = Epanechnikov).

Value   Description
1       Gaussian kernel (default)
2       Uniform kernel
3       Triangular kernel
4       Biweight (Quartic) kernel
5       Triweight kernel
6       Epanechnikov kernel
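
For reference, the six kernel options can be written out using their standard textbook definitions. The Python sketch below is illustrative only: the function name `kernel_value` is hypothetical, and it assumes the conventional scaling in which kernels 2 through 6 are supported on [-1, 1]. This page does not state the exact scaling NumXL uses.

```python
import math


def kernel_value(kernel: int, u: float) -> float:
    """Evaluate kernel number `kernel` (1..6, as in the table above) at u.

    Textbook definitions; each one is symmetric and integrates to one.
    """
    if kernel not in (1, 2, 3, 4, 5, 6):
        raise ValueError("kernel must be an integer from 1 to 6")
    if kernel == 1:
        # Gaussian: standard normal density, unbounded support
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    if abs(u) > 1.0:
        # Kernels 2..6 are zero outside [-1, 1] (assumed scaling)
        return 0.0
    if kernel == 2:
        return 0.5                                 # Uniform
    if kernel == 3:
        return 1.0 - abs(u)                        # Triangular
    if kernel == 4:
        return 15.0 / 16.0 * (1.0 - u * u) ** 2    # Biweight (Quartic)
    if kernel == 5:
        return 35.0 / 32.0 * (1.0 - u * u) ** 3    # Triweight
    return 0.75 * (1.0 - u * u)                    # Epanechnikov
```

Calling `kernel_value(6, 0.0)` evaluates the Epanechnikov kernel at its peak, and any of kernels 2 through 6 returns zero once `abs(u)` exceeds 1.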
 

Remarks

  1. In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
  2. Let $\{x_1, \dots, x_N\}$ be an independent and identically distributed (iid) sample drawn from some distribution with an unknown density $f$. The kernel density estimator is defined as follows:

    $$\hat f(x)=\frac{1}{Nh}\sum_{i=1}^{N} K\left(\frac{x-x_i}{h}\right)$$
    Where:
    • $K()$ is the kernel function - a symmetric (but not necessarily positive) function that integrates to one
    • $h$ is a smoothing parameter called the bandwidth
    • $N$ is the number of observations in the sample
  3. The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate.
  4. If a Gaussian kernel is used and the underlying density being estimated is Gaussian, then it can be shown that the optimal choice of bandwidth ($h$) is:

    $$h_{opt}=\hat\sigma\times \sqrt[5]{\frac{4}{3N}}\approx \frac{1.06\,\hat\sigma}{\sqrt[5]{N}}$$
    $$\hat\sigma=\min\left(s,\frac{IQR}{1.34}\right)$$
    Where:
    • $s$ is the sample standard deviation.
    • $IQR$ is the interquartile range of the sample.
    This approximation is termed the normal distribution approximation, Gaussian approximation or Silverman's rule of thumb.
  5. The KDE function uses Silverman's rule of thumb to estimate the optimal bandwidth.
  6. KDE does not assume that the underlying probability density function (pdf) is normal; rather, it selects the $h$ that would be optimal if the pdf were normal.
  7. KDE currently supports only a single, fixed bandwidth for the whole sample.
  8. The input data series may include missing values (e.g. #N/A, #VALUE!, #NUM!, empty cell), but they will not be included in the calculations.
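
The remarks above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not NumXL's implementation: the names `silverman_bandwidth` and `kde_gaussian` are hypothetical, only the Gaussian kernel is covered, and the quartiles are computed with the "inclusive" interpolation method of the standard library, which may differ from the quantile convention NumXL uses.

```python
import math
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth: sigma_hat * (4 / (3N)) ** (1/5)."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation
    # Quartile convention is an assumption (linear interpolation, inclusive)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    sigma_hat = min(s, (q3 - q1) / 1.34)
    return sigma_hat * (4.0 / (3.0 * n)) ** 0.2


def kde_gaussian(values, target, h=None):
    """Gaussian kernel density estimate at `target`.

    Non-numeric entries (None, text such as "#N/A") and NaN are skipped,
    mirroring the missing-value remark above.
    """
    clean = [v for v in values
             if isinstance(v, (int, float)) and not math.isnan(v)]
    if h is None:
        h = silverman_bandwidth(clean)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(clean)
    total = sum(math.exp(-0.5 * ((target - x) / h) ** 2) for x in clean)
    return total / (n * h * math.sqrt(2.0 * math.pi))


# Usage: missing entries are ignored; bandwidth defaults to the rule of thumb
sample = [0.24, None, -1.28, 1.10, float("nan"), -0.69]
print(kde_gaussian(sample, 0.5))
```

Passing an explicit `h` overrides the rule of thumb, which makes it easy to see how strongly the bandwidth shapes the estimate at a given target.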

Examples

Example 1:

 
      A            B
 1    Date         Data
 2    1/1/2008     #N/A
 3    1/2/2008     -1.28
 4    1/3/2008     0.24
 5    1/4/2008     1.28
 6    1/5/2008     1.20
 7    1/6/2008     1.73
 8    1/7/2008     -2.18
 9    1/8/2008     -0.23
10    1/9/2008     1.10
11    1/10/2008    -1.09
12    1/11/2008    -0.69
13    1/12/2008    -1.69
14    1/13/2008    -1.85
15    1/14/2008    -0.98
16    1/15/2008    -0.77
17    1/16/2008    -0.30
18    1/17/2008    -1.28
19    1/18/2008    0.24
20    1/19/2008    1.28
21    1/20/2008    1.20
22    1/21/2008    1.73
23    1/22/2008    -2.18
24    1/23/2008    -0.23
25    1/24/2008    1.10
26    1/25/2008    -1.09
27    1/26/2008    -0.69
28    1/27/2008    -1.69
29    1/28/2008    -1.85
30    1/29/2008    -0.98


  Formula                    Description (Result)
  =KDE($B$2:$B$29,0.5,,1)    Density estimate at 0.5 using the Gaussian kernel and the default bandwidth (0.165)
