KDE - Kernel Density Estimation

Calculates the kernel density estimation (KDE) of the sample data.

Syntax

KDE(X, target, h, kernel)

X
is the input data series (a one- or two-dimensional array of cells, e.g., rows or columns).
target
is the target value at which to evaluate the estimated density (PDF).
h
is the smoothing parameter (bandwidth) of the kernel density estimator. If omitted, the KDE function estimates an optimal value using Silverman's rule of thumb (see Remarks).
kernel
is a switch to select the kernel function (1 = Gaussian (default), 2 = Uniform, 3 = Triangular, 4 = Biweight (Quartic), 5 = Triweight, 6 = Epanechnikov).
Value Kernel
1 Gaussian kernel function (default).
2 Uniform kernel.
3 Triangular kernel.
4 Biweight or Quartic kernel.
5 Triweight kernel.
6 Epanechnikov kernel.
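For reference, the six kernel choices can be sketched in Python as below. This is a minimal sketch assuming the standard textbook forms of each kernel; the article does not state NumXL's exact definitions, and the `KERNELS` lookup table is a hypothetical mirror of the `kernel` switch values, not part of any NumXL API.

```python
import math

# Standard textbook kernel forms (assumed; NumXL's exact definitions are not
# given in this article). Each is symmetric and integrates to one; all but
# the Gaussian are zero outside |u| <= 1.

def gaussian(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def uniform(u: float) -> float:
    return 0.5 if abs(u) <= 1.0 else 0.0

def triangular(u: float) -> float:
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0

def biweight(u: float) -> float:  # also called the quartic kernel
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def triweight(u: float) -> float:
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

# Hypothetical lookup table mirroring the `kernel` switch values 1-6.
KERNELS = {
    1: gaussian,
    2: uniform,
    3: triangular,
    4: biweight,
    5: triweight,
    6: epanechnikov,
}

if __name__ == "__main__":
    # Midpoint-rule check that each kernel integrates to (about) one.
    step = 1e-3
    grid = [-10.0 + (k + 0.5) * step for k in range(20000)]
    for code, kern in KERNELS.items():
        area = sum(kern(u) for u in grid) * step
        print(code, kern.__name__, round(area, 4))
```

The compact-support kernels (2 through 6) ignore observations farther than one bandwidth from the target, while the Gaussian kernel gives every observation some weight.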

Warning

The KDE() function is deprecated as of version 1.68; use the NxKDE function instead.

Remarks

  1. In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
  2. Let $\{x_i\}$ be an independent and identically distributed (i.i.d.) sample of size $N$ drawn from some distribution with an unknown density $f()$. The kernel density estimator is defined as follows:
    $$\hat f(x)=\frac{1}{Nh}\sum_{i=1}^{N} K\left(\frac{x-x_i}{h}\right)$$
    Where:
    • $K()$ is the kernel function - a symmetric (but not necessarily positive) function that integrates to one.
    • $h$ is a smoothing parameter called the bandwidth.
  3. The bandwidth of the kernel is a free parameter that has a strong influence on the resulting estimate.
  4. If a Gaussian kernel is used and the underlying density being estimated is Gaussian, it can be shown that the optimal choice of bandwidth ($h$) is:
    $$h_{opt}=\hat\sigma\times\sqrt[5]{\frac{4}{3N}}\approx\frac{1.06\,\hat\sigma}{\sqrt[5]{N}}$$ $$\hat\sigma=\min\left(s,\frac{\mathrm{IQR}}{1.34}\right)$$
    Where:
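    • $s$ is the sample standard deviation.
    • $\mathrm{IQR}$ is the sample interquartile range.
    • $N$ is the number of (non-missing) observations in the sample.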
    This approximation is termed the normal distribution approximation, Gaussian approximation, or Silverman's rule of thumb.
  5. When $h$ is omitted, the KDE function uses Silverman's rule of thumb to estimate the optimal bandwidth.
  6. KDE does not assume that the underlying probability density function (PDF) is normal; rather, it selects the $h$ that would be optimal if the PDF were normal.
  7. KDE currently supports only a fixed bandwidth across the whole sample.
  8. The input data series may include missing values (e.g., #N/A, #VALUE!, #NUM!, empty cell); these are excluded from the calculations.
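The estimator in remark 2, the Silverman bandwidth in remarks 4 and 5, and the missing-value handling in remark 8 can be combined into a short Python sketch. It is an illustration under stated assumptions, not NumXL's implementation: `kde`, `clean` and `silverman_bandwidth` are hypothetical names, only the default Gaussian kernel is shown, `None`/NaN stand in for Excel error values and empty cells, the quartiles use Excel-style inclusive interpolation (NumXL's quartile method is not stated), and the fallback to $s$ when the IQR is zero is an added assumption.

```python
import math
import statistics

def gaussian(u: float) -> float:
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def clean(values):
    """Drop missing entries. None, NaN, infinities and non-numeric items
    stand in for #N/A, #VALUE!, #NUM! and empty cells."""
    return [float(v) for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)
            and math.isfinite(v)]

def silverman_bandwidth(data):
    """h_opt = sigma_hat * (4 / (3N))^(1/5), sigma_hat = min(s, IQR / 1.34)."""
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation
    # Assumption: Excel-style inclusive quartiles.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: fall back to s when the IQR is zero.
    sigma_hat = min(s, iqr / 1.34) if iqr > 0 else s
    return sigma_hat * (4.0 / (3.0 * n)) ** 0.2

def kde(x, target, h=None):
    """Gaussian-kernel estimate f_hat(target) = 1/(N h) * sum K((target - x_i) / h)."""
    data = clean(x)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two non-missing observations")
    if h is None:
        h = silverman_bandwidth(data)  # rule of thumb when h is omitted
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(gaussian((target - xi) / h) for xi in data) / (n * h)

if __name__ == "__main__":
    sample = [1.2, None, 2.3, 2.9, float("nan"), 3.1, 4.8, 5.0]
    print(kde(sample, 3.0))         # bandwidth from Silverman's rule
    print(kde(sample, 3.0, h=0.5))  # user-supplied bandwidth
```

Swapping `gaussian` for one of the compact-support kernels changes only the weighting; the bandwidth rule above is derived for the Gaussian case, so it is only a rough guide for the other kernels.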
