NxKDE - Calculates the Kernel Density Estimate

Mohamad

January 19, 2013 05:56

Calculates the kernel density estimate (KDE) of the sample data.

Syntax

NxKDE(X, Lo, Hi, Transform, $\lambda$, Kernel, H, Optimization, Return, Target)

The NxKDE function syntax has the following arguments:

X

is the input data series (one or two-dimensional array of cells (e.g., rows or columns)).

Lo

is the x-domain lower bound. If missing, no lower bound is assumed ($-\infty$).

Hi

is the x-domain upper bound. If missing, no upper bound is assumed ($\infty$).

Transform

is a switch to select the data prior-transform method (0 = none (default), 1 = logit, 2 = probit, 3 = complementary log-log, 4 = log, 5 = power).

Value	Method
0	None / reflective (Silverman).
1	Logit transform.
2	Probit (a.k.a. normit) transform.
3	Complementary log-log transform.
4	Log transform.
5	Power (i.e., Box-Cox) transform.
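The transforms listed above can be sketched in Python as follows. This is only an illustration of the standard textbook definitions, not NumXL's implementation. The helper names are hypothetical. The rescaling of bounded data to the unit interval before the logit, probit, and complementary log-log transforms is an assumption, since this article does not say how NxKDE maps the data.

```python
import math
from statistics import NormalDist

def unit_scale(x, lo, hi):
    """Map x from the open interval (lo, hi) to (0, 1). Assumed pre-step."""
    if not lo < x < hi:
        raise ValueError("x must lie strictly inside (lo, hi)")
    return (x - lo) / (hi - lo)

def logit(p):
    """Logit transform of a value in (0, 1)."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Probit (normit) transform: inverse standard normal CDF."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log transform of a value in (0, 1)."""
    return math.log(-math.log(1.0 - p))

def box_cox(y, lam):
    """Power (Box-Cox) transform of a positive value; lam = 0 gives the log."""
    if y <= 0:
        raise ValueError("y must be positive")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

p = unit_scale(7.5, 0.0, 10.0)
print(logit(p), probit(p), cloglog(p))
print(box_cox(4.0, 0.5), box_cox(4.0, 0.0))
```

Each transform maps a bounded domain onto the whole real line, so a density estimated on the transformed data no longer places mass outside the bounds once it is mapped back.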

$\lambda$

is the parameter of the power (Box-Cox) transform.

Kernel

is a switch to select the kernel function (0 = Gaussian (default), 1 = uniform, 2 = triangular, 3 = biweight (quartic), 4 = triweight, 5 = Epanechnikov, 6 = cosine).

Value	Description
0	Gaussian kernel function (default).
1	Uniform kernel.
2	Triangular kernel.
3	Biweight or quartic kernel.
4	Triweight kernel.
5	Epanechnikov kernel.
6	Cosine kernel.
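For reference, the kernel functions in the table can be written out as below. These are the usual textbook definitions on the standardized argument $u$, keyed by the same codes as the Kernel switch. The dictionary name and the function names are hypothetical, and NumXL's internal scaling conventions may differ.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def uniform(u):
    return 0.5 if abs(u) <= 1.0 else 0.0

def triangular(u):
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0

def biweight(u):
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def triweight(u):
    return (35.0 / 32.0) * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def cosine(u):
    return (math.pi / 4.0) * math.cos(math.pi * u / 2.0) if abs(u) <= 1.0 else 0.0

# Keys mirror the Kernel switch values in the table above.
KERNELS = {
    0: gaussian,
    1: uniform,
    2: triangular,
    3: biweight,
    4: triweight,
    5: epanechnikov,
    6: cosine,
}

for code, kernel in KERNELS.items():
    print(code, kernel.__name__, round(kernel(0.0), 4))
```

Every kernel here is symmetric about zero and integrates to one. All except the Gaussian vanish outside $[-1, 1]$.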

H

is the smoothing parameter (bandwidth) of the kernel density estimator. If missing and Optimization is not 0 (none), the NxKDE function calculates an optimal value.

Optimization

is a switch to select the kernel bandwidth optimization method (0 = none (default), 1 = Silverman, 2 = direct plug-in, 3 = unbiased cross-validation).

Value	Method
0	None (default).
1	Silverman's rule of thumb.
2	Direct Plug-in (Sheather & Jones).
3	Unbiased cross-validation.
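As an illustration of option 1, the sketch below computes one common form of Silverman's rule of thumb for a Gaussian kernel, $h = 0.9 \min(\hat\sigma, \mathrm{IQR}/1.34)\, n^{-1/5}$. Whether NxKDE uses exactly this variant is an assumption, and the function name is hypothetical.

```python
import statistics

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth: 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    sigma = statistics.stdev(sample)                 # sample standard deviation
    q1, _, q3 = statistics.quantiles(sample, n=4)    # quartile cut points
    iqr_scale = (q3 - q1) / 1.34
    # Fall back to sigma alone if the interquartile range collapses to zero.
    spread = min(sigma, iqr_scale) if iqr_scale > 0 else sigma
    return 0.9 * spread * n ** (-0.2)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(round(silverman_bandwidth(sample), 4))
```

The rule is cheap to compute, but it is tuned for roughly normal data and tends to oversmooth multimodal samples. The plug-in and cross-validation options address that case.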

Return

is a number that determines the type of return value: 0 (or missing) = PDF, 1 = CDF, 2 = inverse CDF, 3 = bandwidth.

Value	Description
0 or omitted	Probability Density Function (PDF).
1	Cumulative Distribution Function (CDF).
2	Inverse Cumulative Distribution Function (inv. CDF).
3	Bandwidth.
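To show how the CDF and inverse CDF return types relate to the density, here is a minimal sketch for the Gaussian kernel only. The CDF is the average of normal CDFs centered at the data points, and the inverse CDF is found by bisection. The function names and the bisection approach are assumptions for illustration, not NumXL's method.

```python
import math

def kde_cdf(x, sample, h):
    """CDF of a Gaussian-kernel KDE: mean of Phi((x - x_i) / h)."""
    n = len(sample)
    scale = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in sample) / n

def kde_inv_cdf(p, sample, h, iterations=200):
    """Invert kde_cdf by bisection; p must lie strictly inside (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in the open interval (0, 1)")
    lo = min(sample) - 10.0 * h
    hi = max(sample) + 10.0 * h
    # Widen the bracket until it contains the requested quantile.
    while kde_cdf(lo, sample, h) > p:
        lo -= 10.0 * h
    while kde_cdf(hi, sample, h) < p:
        hi += 10.0 * h
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, sample, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [1.2, 1.9, 2.4, 2.5, 3.1, 4.0]
median = kde_inv_cdf(0.5, sample, 0.5)
print(round(median, 4), round(kde_cdf(median, sample, 0.5), 4))
```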

Target

is the desired x-value(s) to calculate for (a single value or a one-dimensional array of cells (e.g., rows or columns)).

Remarks

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
Let $\{x_i\}$ be an independent and identically distributed (i.i.d.) sample of size $n$ drawn from some distribution with an unknown density $f()$. The kernel density estimator is defined as follows:
$$\hat f(x)=\frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right)$$
Where:
- $K()$ is the kernel function – a symmetric (but not necessarily positive) function that integrates to one.
- $h$ is the smoothing parameter called the bandwidth.
The bandwidth of the kernel is a free parameter that exhibits a strong influence on the resulting estimate.
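The estimator above translates almost directly into code. The sketch below evaluates it at a single point with a Gaussian kernel and prints the result for several bandwidths to show how strongly $h$ affects the estimate. The function names and the sample values are made up for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, a symmetric kernel that integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_pdf(x, sample, h, kernel=gaussian_kernel):
    """Evaluate (1 / (n h)) * sum_i K((x - x_i) / h) at a single point x."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

sample = [1.2, 1.9, 2.4, 2.5, 3.1, 4.0]
for h in (0.2, 0.5, 1.0):
    # Small h gives a spiky estimate; large h gives a flatter one.
    print(h, round(kde_pdf(2.5, sample, h), 4))
```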
The domain lower and upper bound arguments are optional, but if they are given, the input data is checked against those bounds. A #NUM! error is returned if any data point violates the bounds.
The log and power transforms work with a single bound, while the remaining transforms (logit, probit, and complementary log-log) work with two bounds.
If both lower and upper bounds are specified, but the transform function is either log or power, the NxKDE(.) function returns #VALUE!.
The none/reflection method does not transform the input data; instead, it applies a reflection correction to the estimate near the domain endpoints.
The NxKDE(.) function returns a PDF value of zero for any x-value outside the specified x-domain.
The NxKDE(.) function returns a CDF value of zero (0) for any x-value smaller than the x-domain lower bound and one (1) for any x-value greater than the x-domain upper bound.
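A common way to handle the domain endpoints without transforming the data is the reflection method described by Silverman (1986). The density mass that would leak past a bound is folded back into the domain. The sketch below uses a single reflection at each supplied bound and a Gaussian kernel. It illustrates the general technique only, and how NxKDE implements its boundary treatment is an assumption. The function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_pdf(x, sample, h):
    """Plain (unbounded) kernel density estimate at x."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

def reflected_kde_pdf(x, sample, h, lo=None, hi=None):
    """KDE with one reflection at each supplied bound; zero outside the domain."""
    if lo is not None and x < lo:
        return 0.0
    if hi is not None and x > hi:
        return 0.0
    total = kde_pdf(x, sample, h)
    if lo is not None:
        total += kde_pdf(2.0 * lo - x, sample, h)  # mass mirrored at the lower bound
    if hi is not None:
        total += kde_pdf(2.0 * hi - x, sample, h)  # mass mirrored at the upper bound
    return total

sample = [0.1, 0.3, 0.5, 1.2]
print(round(kde_pdf(0.0, sample, 0.4), 4))
print(round(reflected_kde_pdf(0.0, sample, 0.4, lo=0.0), 4))
print(reflected_kde_pdf(-0.5, sample, 0.4, lo=0.0))
```

With a single bound, the reflected estimate integrates to one over the domain. With two bounds, a single reflection per side is only approximately normalized, and the approximation is good when $h$ is small relative to the width of the domain.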
For the inverse CDF return type, the NxKDE(.) function returns #VALUE! if the target value is not in the $(0, 1)$ interval.
NxKDE supports a fixed bandwidth throughout the sample.
The input data series may include missing values (e.g., #N/A, #VALUE!, #NUM!, or empty cell). The NxKDE(.) function excludes all such values from the calculations.
The NxKDE(.) supports three bandwidth optimization methods. Except for the direct plug-in (DPI) method, the user can use any supported kernel function.
The direct plug-in (DPI) method requires a kernel function with at least six (6) non-zero derivatives, continuous and square-integrable. This excludes uniform, triangular, biweight, and Epanechnikov kernels.
The NxKDE(.) returns #VALUE! if the DPI optimization is turned on and one of the following kernels is selected: uniform, triangular, quartic, or Epanechnikov.
For performance reasons, we recommend calculating the optimal bandwidth (optimization on) in a separate step. After that, use the computed optimal bandwidth in all subsequent NxKDE(.) calls, but with the optimization off.

Status

The NxKDE(.) function is available starting with version 1.68 CAMEL.

Files Examples

References

Park, B.U.; Marron, J.S. (1990). "Comparison of data-driven bandwidth selectors". Journal of the American Statistical Association. 85 (409): 66–72.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall/CRC. p. 45. ISBN 978-0-412-24620-3.
Jones, M.C.; Marron, J.S.; Sheather, S.J. (1996). "A brief survey of bandwidth selection for density estimation". Journal of the American Statistical Association. 91 (433): 401–407.
Sheather, S.J.; Jones, M.C. (1991). "A reliable data-based bandwidth selection method for kernel density estimation". Journal of the Royal Statistical Society, Series B. 53: 683–690.
Zucchini, W. (2003). Applied Smoothing Techniques, Part 1: Kernel Density Estimation.
