Warning
The KDE() function is deprecated as of version 1.68; use the NxKDE(.) function instead.
Calculates the kernel density estimate (KDE) of the sample data.
Syntax
KDE(X, target, h, kernel)
- X
- is the input data series (a one- or two-dimensional array of cells, e.g. rows or columns).
- target
- is the target value at which the density estimate is computed.
- h
- is the smoothing parameter (bandwidth) of the kernel density estimator. If missing, the KDE function calculates an optimal value.
- kernel
- is a switch to select the kernel function (1 = Gaussian (default), 2 = Uniform, 3 = Triangular, 4 = Biweight (Quartic), 5 = Triweight, 6 = Epanechnikov).
Order | Description |
---|---|
1 | Gaussian kernel function (default) |
2 | Uniform kernel |
3 | Triangular kernel |
4 | Biweight or Quartic kernel |
5 | Triweight kernel |
6 | Epanechnikov kernel |
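As a rough illustration of what each switch value selects, the sketch below writes out the usual textbook form of the six kernels in Python. It assumes the add-in's kernels match these standard definitions, which this page does not confirm; the names `KERNELS` and `integrate` are hypothetical helpers introduced only for this example.

```python
import math

# Textbook kernel definitions, keyed by the switch values 1..6.
# Assumption: the add-in's kernels match these standard forms.
KERNELS = {
    1: lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),        # Gaussian
    2: lambda u: 0.5 if abs(u) <= 1 else 0.0,                               # Uniform
    3: lambda u: 1.0 - abs(u) if abs(u) <= 1 else 0.0,                      # Triangular
    4: lambda u: 15.0 / 16.0 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,    # Biweight (Quartic)
    5: lambda u: 35.0 / 32.0 * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,    # Triweight
    6: lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,                # Epanechnikov
}


def integrate(kernel, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule integral, used only to check that a kernel integrates to one."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * width) for i in range(steps)) * width


# Each kernel should integrate to (approximately) one.
for switch, kernel in KERNELS.items():
    print(switch, round(integrate(kernel), 4))
```

All kernels except the Gaussian are zero outside the interval [-1, 1], so only observations within one bandwidth of the target contribute to the estimate when those kernels are selected.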
Remarks
- In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
- Let $\{x_i\}$, $i=1,\dots,N$, be an iid sample drawn from a distribution with an unknown density $f()$. The kernel density estimator is defined as follows:
$$\hat f(x)=\frac{1}{Nh}\sum_{i=1}^N K\left(\frac{x-x_i}{h}\right)$$
Where:
- $K()$ is the kernel function - a symmetric (but not necessarily positive) function that integrates to one
- $h$ is a smoothing parameter called the bandwidth
- The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate.
- If a Gaussian kernel is used and the underlying density being estimated is Gaussian, it can be shown that the optimal choice of bandwidth ($h$) is:
$$h_{opt}=\hat\sigma\times \sqrt[5]{\frac{4}{3N}}\approx \frac{1.06\,\hat\sigma}{\sqrt[5]{N}}$$
$$\hat\sigma=\min\left(s,\frac{IQR}{1.34}\right)$$
Where:
- $s$ is the sample standard deviation.
- $IQR$ is the interquartile range of the sample.
- The KDE function uses Silverman's rule of thumb to estimate the optimal bandwidth.
- KDE does not assume that the underlying probability density function (pdf) is normal; rather, it selects the $h$ that would be optimal if the pdf were normal.
- KDE currently supports a fixed bandwidth throughout the sample.
- The input data series may include missing values (e.g. #N/A, #VALUE!, #NUM!, empty cell); these are excluded from the calculations.
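The remarks above can be sketched in Python as follows. This is a minimal illustration, not the add-in's implementation: the function names (`clean`, `rule_of_thumb_bandwidth`, `kde`) are hypothetical, the sample values are made up, missing cells are modeled as `None` or NaN, and the quartile convention used for the IQR is an assumption, since this page does not say how the add-in computes it.

```python
import math
import statistics


def clean(values):
    """Drop missing entries (None or NaN), mirroring how missing cells are skipped."""
    return [float(v) for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))]


def rule_of_thumb_bandwidth(sample):
    """h = sigma_hat * (4 / (3N))**(1/5), with sigma_hat = min(s, IQR / 1.34)."""
    n = len(sample)
    s = statistics.stdev(sample)                     # sample standard deviation
    q1, _, q3 = statistics.quantiles(sample, n=4)    # quartile convention is an assumption
    sigma_hat = min(s, (q3 - q1) / 1.34)
    return sigma_hat * (4.0 / (3.0 * n)) ** 0.2


def gaussian_kernel(u):
    """Standard normal density, the default kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(values, target, h=None, kernel=gaussian_kernel):
    """Evaluate f_hat(target) = 1/(N h) * sum K((target - x_i) / h)."""
    sample = clean(values)
    if h is None:                                    # bandwidth omitted: use the rule of thumb
        h = rule_of_thumb_bandwidth(sample)
    n = len(sample)
    return sum(kernel((target - x) / h) for x in sample) / (n * h)


# Made-up sample with two missing entries, which are ignored.
data = [0.1, 0.4, None, 0.7, 1.2, float("nan"), 1.5, 2.3]
print(kde(data, 0.5))
```

A single bandwidth is applied to every observation, which matches the fixed-bandwidth behavior described above.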
Examples
Example 1:
Formula | Description (Result) |
---|---|
=KDE($B$2:$B$29,0.5,,1) | KDE (0.165) |
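In this example the h argument is omitted, so the bandwidth comes from the rule of thumb. Assuming all 28 cells in the range B2:B29 hold valid numbers (so that $N=28$), the multiplier applied to $\hat\sigma$ works out approximately as:

$$h_{opt}=\hat\sigma\times\sqrt[5]{\frac{4}{3\times 28}}\approx 0.544\,\hat\sigma$$

If some cells are missing, $N$ is smaller and the multiplier is slightly larger.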