Calculates the kernel density estimate (KDE) of the sample data at a given target value.
Syntax
KDE(X, target, h, kernel)
- X
- is the input data series (one/two-dimensional array of cells (e.g., rows or columns)).
- target
- is the target value to compute the underlying CDF for.
- h
- is the smoothing parameter (bandwidth) of the kernel density estimator. If missing, the KDE function calculates an optimal value using Silverman's rule of thumb (see Remarks).
- kernel
- is a switch to select the kernel function (1 = Gaussian (default), 2 = Uniform, 3 = Triangular, 4 = Biweight (Quartic), 5 = Triweight, 6 = Epanechnikov).
| Value | Kernel |
|---|---|
| 1 | Gaussian kernel function (default) |
| 2 | Uniform kernel |
| 3 | Triangular kernel |
| 4 | Biweight or Quartic kernel |
| 5 | Triweight kernel |
| 6 | Epanechnikov kernel |
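For orientation, the six kernels can be sketched in Python using their standard textbook definitions. This is an illustration only: the function name `kernel` and its `kind` argument are hypothetical, and the source does not confirm that the add-in uses exactly these scalings.

```python
import math

def kernel(u, kind=1):
    """Evaluate a standard kernel K(u); `kind` mirrors the switch values above."""
    a = abs(u)
    if kind == 1:   # Gaussian: supported on the whole real line
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    if kind not in (2, 3, 4, 5, 6):
        raise ValueError("kind must be an integer from 1 to 6")
    if a > 1.0:     # the remaining kernels are zero outside [-1, 1]
        return 0.0
    if kind == 2:   # Uniform
        return 0.5
    if kind == 3:   # Triangular
        return 1.0 - a
    if kind == 4:   # Biweight (quartic)
        return 15.0 / 16.0 * (1.0 - u * u) ** 2
    if kind == 5:   # Triweight
        return 35.0 / 32.0 * (1.0 - u * u) ** 3
    return 0.75 * (1.0 - u * u)  # kind == 6, Epanechnikov
```

Each of these functions is symmetric about zero and integrates to one, which is what the estimator requires of a kernel.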
Warning
The KDE() function is deprecated as of version 1.68; use the NxKDE function instead.
Remarks
- In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
- Let $\{x_i\}$ be an independent and identically distributed (i.i.d.) sample drawn from some distribution with an unknown density $f()$. The kernel density estimator is defined as follows:
$$\hat f(x)=\frac{1}{Nh}\sum_{i=1}^{N} K\left(\frac{x-x_i}{h}\right)$$
Where:
- $K()$ is the kernel function - a symmetric (but not necessarily positive) function that integrates to one.
- $h$ is a smoothing parameter called the bandwidth.
- $N$ is the number of observations in the sample.
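The estimator above can be sketched in Python as follows. This is a minimal illustration of the formula, not the add-in's implementation; the names `kde_at` and `gaussian` are hypothetical, and a Gaussian kernel is assumed by default.

```python
import math

def gaussian(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_at(sample, target, h, K=gaussian):
    """Evaluate f_hat(target) = 1/(N*h) * sum_i K((target - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    return sum(K((target - x) / h) for x in sample) / (n * h)

# Usage: the same sample evaluated with a narrow and a wide bandwidth.
sample = [1.2, 1.9, 2.4, 3.1, 3.3]
narrow = kde_at(sample, 2.0, h=0.1)
wide = kde_at(sample, 2.0, h=1.0)
```

Evaluating the same target with different values of `h` shows how strongly the bandwidth shapes the result: a small `h` follows individual observations closely, while a large `h` smooths them together.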
- The bandwidth of the kernel is a free parameter that exhibits a strong influence on the resulting estimate.
- If a Gaussian-based kernel is used, and the underlying density being estimated is Gaussian, then it can be shown that the optimal choice of bandwidth ($h$) is:
$$h_{opt}=\hat\sigma\times \sqrt[5]{\frac{4}{3N}}\approx \frac{1.06\,\hat\sigma}{\sqrt[5]{N}}$$
$$\hat\sigma=\min\left(s,\frac{IQR}{1.34}\right)$$
Where:
- $s$ is the sample standard deviation.
- $IQR$ is the interquartile range of the sample.
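The bandwidth rule above can be computed as in the sketch below. The function name `rule_of_thumb_bandwidth` is hypothetical, and the quartile convention is an assumption: the source does not say how the add-in computes the IQR, so the default method of Python's `statistics.quantiles` is used here.

```python
import statistics

def rule_of_thumb_bandwidth(sample):
    """h_opt = sigma_hat * (4 / (3 N))^(1/5), with sigma_hat = min(s, IQR / 1.34)."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    s = statistics.stdev(sample)                   # sample standard deviation
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartiles (assumed convention)
    sigma_hat = min(s, (q3 - q1) / 1.34)
    return sigma_hat * (4.0 / (3.0 * n)) ** 0.2

# Usage: an outlier inflates s, so the IQR-based scale is the one selected.
h = rule_of_thumb_bandwidth([1, 2, 3, 4, 5, 6, 7, 100])
```

Taking the minimum of the two scale estimates keeps a few extreme observations from inflating the bandwidth.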
- When $h$ is omitted, the KDE function uses Silverman's rule of thumb to estimate the optimal bandwidth.
- KDE does not assume that the underlying probability density function (PDF) is normal; rather, it selects the $h$ that would be optimal if the PDF were normal.
- KDE currently supports a fixed bandwidth throughout the sample.
- The input data series may include missing values (e.g., #N/A, #VALUE!, #NUM!, empty cell), but they will not be included in the calculations.
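The handling of missing values described above can be mimicked outside the spreadsheet with a small filter. This sketch assumes a hypothetical representation in which cells arrive as numbers, `None` or an empty string for empty cells, NaN, or Excel error strings; the name `drop_missing` is illustrative only.

```python
import math

# Error markers named in the remark above, as they might appear when exported as text.
EXCEL_ERRORS = {"#N/A", "#VALUE!", "#NUM!"}

def drop_missing(cells):
    """Return only the usable numeric observations, in their original order."""
    clean = []
    for c in cells:
        if c is None or c == "" or c in EXCEL_ERRORS:
            continue  # empty cell or Excel error value
        if isinstance(c, float) and math.isnan(c):
            continue  # NaN placeholder for a missing number
        clean.append(float(c))
    return clean

# Usage: only the two valid observations remain.
observations = drop_missing([1.0, "#N/A", None, 2, "", float("nan"), "#NUM!"])
```

The sample size $N$ used in the estimator would then be the length of the filtered list, not the number of cells in the original range.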