Calculates the kernel density estimate (KDE) of the sample data.

## Syntax

**NxKDE**(**X**, Lo, Hi, Transform, $\lambda$, Kernel, H, Optimization, Return, **Target**)

The NxKDE function syntax has the following arguments:

**X**- is the input data series (one or two-dimensional array of cells (e.g., rows or columns)).
**Lo**- is the x-domain lower bound. If missing, no lower bound is assumed ($-\infty$).
**Hi**- is the x-domain upper bound. If missing, no upper bound is assumed ($\infty$).
**Transform**- is a switch to select the data prior-transform method (0 = none (default), 1 = logit, 2 = probit, 3 = complementary log-log, 4 = log, 5 = power).

| Value | Method |
|---|---|
| 0 | None / reflective (Silverman). |
| 1 | Logit transform. |
| 2 | Probit (a.k.a. normit) transform. |
| 3 | Complementary log-log transform. |
| 4 | Log transform. |
| 5 | Power (i.e., Box-Cox) transform. |

**$\lambda$**- is the power transform smoothing parameter.
**Kernel**- is a switch to select the kernel function (0 = Gaussian (default), 1 = uniform, 2 = triangular, 3 = biweight (quartic), 4 = triweight, 5 = Epanechnikov, 6 = cosine).

| Value | Description |
|---|---|
| 0 | Gaussian kernel function (default). |
| 1 | Uniform kernel. |
| 2 | Triangular kernel. |
| 3 | Biweight or quartic kernel. |
| 4 | Triweight kernel. |
| 5 | Epanechnikov kernel. |
| 6 | Cosine kernel. |

**H**- is the smoothing parameter (bandwidth) of the kernel density estimator. If missing, and optimization is not “None,” the NxKDE function calculates an optimal value.
**Optimization**- is a switch to select the kernel bandwidth optimization method (0 = none (default), 1 = Silverman, 2 = direct plug-in, 3 = unbiased cross-validation).

| Value | Method |
|---|---|
| 0 | None (default). |
| 1 | Silverman's rule of thumb. |
| 2 | Direct plug-in (Sheather & Jones). |
| 3 | Unbiased cross-validation. |

**Return**- is a number that determines the type of return value: 0 (or missing) = PDF, 1 = CDF, 2 = inverse CDF, 3 = bandwidth.

| Value | Description |
|---|---|
| 0 or omitted | Probability density function (PDF). |
| 1 | Cumulative distribution function (CDF). |
| 2 | Inverse cumulative distribution function (inv. CDF). |
| 3 | Bandwidth. |

**Target**- is the desired x-value(s) at which to evaluate the result (a single value or a one-dimensional array of cells (e.g., rows or columns)).
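
For reference, the kernels listed above can be written out in their standard textbook forms. The Python sketch below is an illustration only and is not taken from NumXL; the names `KERNELS` and `integrate` are made up for this example, the dictionary keys simply mirror the Kernel switch values, and the cosine kernel is assumed to be $\frac{\pi}{4}\cos\left(\frac{\pi}{2}u\right)$ on $|u| \le 1$.

```python
import math

def _compact(f):
    """Wrap f so that it is zero outside the support |u| <= 1."""
    return lambda u: f(u) if abs(u) <= 1.0 else 0.0

# Textbook kernel forms, keyed by the Kernel switch value.
# Each takes the scaled distance u = (x - x_i) / h.
KERNELS = {
    0: lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),      # Gaussian
    1: _compact(lambda u: 0.5),                                          # uniform
    2: _compact(lambda u: 1.0 - abs(u)),                                 # triangular
    3: _compact(lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2),             # biweight
    4: _compact(lambda u: 35.0 / 32.0 * (1.0 - u * u) ** 3),             # triweight
    5: _compact(lambda u: 0.75 * (1.0 - u * u)),                         # Epanechnikov
    6: _compact(lambda u: math.pi / 4.0 * math.cos(math.pi * u / 2.0)),  # cosine
}

def integrate(kernel, lo=-10.0, hi=10.0, steps=20_000):
    """Midpoint-rule integral, used only to check that a kernel integrates to one."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * width) for i in range(steps)) * width
```

Calling `integrate(KERNELS[k])` for each switch value is a quick way to confirm that every kernel is normalized.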

## Remarks

- In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
- Let $\{x_i\}$ be an independent and identically distributed (i.i.d.) sample of size $n$ drawn from some distribution with an unknown density $f()$. The kernel density estimator is defined as follows:

$$\hat f(x)=\frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right)$$

Where:

- $K()$ is the kernel function: a symmetric (but not necessarily positive) function that integrates to one.
- $h$ is the smoothing parameter called the bandwidth.
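
The estimator above translates almost directly into code. The following Python sketch assumes a Gaussian kernel and a user-supplied fixed bandwidth; the function names `gaussian_kernel` and `kde_pdf`, the sample values, and the bandwidth are all hypothetical, and the sketch says nothing about how NxKDE is implemented internally.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, a common choice for K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_pdf(x, sample, h, kernel=gaussian_kernel):
    """Evaluate f_hat(x) = 1 / (n * h) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Illustrative data and bandwidth (made up for this example).
sample = [1.2, 1.9, 2.4, 3.1, 3.3, 4.8]
h = 0.5

for x in (1.0, 2.5, 4.0):
    print(x, kde_pdf(x, sample, h))
```

Each observation contributes one kernel "bump" centered at $x_i$ with width governed by $h$, which is why the bandwidth has such a strong effect on the shape of the estimate.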

- The bandwidth of the kernel is a free parameter that exhibits a strong influence on the resulting estimate.
- The domain lower and upper bound arguments are optional, but if they are given, the input data is checked against those bounds. A #NUM! error is returned if any data point violates the bounds.
- The log and power transforms work with a single bound, whereas the remaining transforms (logit, probit, and complementary log-log) work with two bounds.
- If both lower and upper bounds are specified but the transform is either log or power, NxKDE(.) returns #VALUE!
- The none/reflection method does not transform the input data; instead, it applies a reflection correction to x-values near the domain endpoints.
- NxKDE(.) returns a PDF of zero for any x-value outside the specified x-domain.
- NxKDE(.) returns a CDF of zero (0) for any x-value below the x-domain lower bound and one (1) for any x-value above the x-domain upper bound.
- For the inverse CDF return type, NxKDE(.) returns #VALUE! if the target value is not in the interval $(0, 1)$.
- NxKDE(.) uses a single fixed bandwidth for the entire sample.
- The input data series may include missing values (e.g., #N/A, #VALUE!, #NUM!, or empty cells). NxKDE(.) excludes all those values from the calculations.
- The NxKDE(.) supports three bandwidth optimization methods. Except for the direct plug-in (DPI) method, the user can use any supported kernel function.
- The direct plug-in (DPI) method requires a kernel function with at least six (6) non-zero derivatives, continuous and square-integrable. This excludes uniform, triangular, biweight, and Epanechnikov kernels.
- The NxKDE(.) returns #VALUE! if the DPI optimization is turned on and one of the following kernels is selected: uniform, triangular, quartic, or Epanechnikov.
- For performance reasons, we recommend calculating the optimal bandwidth (optimization on) in a separate step. After that, use the computed optimal bandwidth in all subsequent NxKDE(.) calls, but with the optimization off.
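
The two-step workflow recommended above (optimize once, then reuse the bandwidth) can be sketched in Python. This example assumes one common form of Silverman's rule of thumb for a Gaussian kernel, $h = 0.9\,\min(s, \mathrm{IQR}/1.34)\,n^{-1/5}$; the exact variant NxKDE uses is not stated here, so treat the constant and the helper names `silverman_bandwidth` and `gaussian_kde_pdf` as assumptions made for illustration.

```python
import math
import statistics

def silverman_bandwidth(sample):
    """Rule of thumb: h = 0.9 * min(s, IQR / 1.34) * n ** (-1/5)."""
    n = len(sample)
    s = statistics.stdev(sample)                   # sample standard deviation
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartiles
    return 0.9 * min(s, (q3 - q1) / 1.34) * n ** (-0.2)

def gaussian_kde_pdf(x, sample, h):
    """Gaussian-kernel density estimate at x for a fixed bandwidth h."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm

# Illustrative data (made up for this example).
sample = [1.2, 1.9, 2.4, 3.1, 3.3, 4.8, 5.0, 6.7]
targets = [0.0, 2.0, 4.0, 6.0, 8.0]

h = silverman_bandwidth(sample)                                # step 1: optimize once
densities = [gaussian_kde_pdf(x, sample, h) for x in targets]  # step 2: reuse h
```

Separating the two steps means the bandwidth search runs once rather than once per target value, which is the point of the performance recommendation.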

**Status**

The NxKDE(.) function is available starting with version 1.68 CAMEL.

## References

- Park, B.U.; Marron, J.S. (1990). "Comparison of data-driven bandwidth selectors". Journal of the American Statistical Association. 85 (409): 66–72.
- Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall/CRC. p. 45. ISBN 978-0-412-24620-3.
- Jones, M.C.; Marron, J.S.; Sheather, S.J. (1996). "A brief survey of bandwidth selection for density estimation". Journal of the American Statistical Association. 91 (433): 401–407.
- Sheather, S.J.; Jones, M.C. (1991). "A reliable data-based bandwidth selection method for kernel density estimation". Journal of the Royal Statistical Society, Series B. 53: 683–690.
- Zucchini, W. (2003). Applied Smoothing Techniques, Part 1: Kernel Density Estimation.
