Calculates the kernel density estimate (KDE) of the sample data.
Syntax
NxKDE(X, Lo, Hi, Transform, $\lambda$, Kernel, H, Optimization, Return, Target)
The NxKDE function syntax has the following arguments:
- X
- is the input data series (a one- or two-dimensional array of cells, e.g., rows or columns).
- Lo
- is the x-domain lower bound. If missing, no lower bound is assumed ($-\infty$).
- Hi
- is the x-domain upper bound. If missing, no upper bound is assumed ($\infty$).
- Transform
- is a switch to select the data prior-transform method (0 = none (default), 1 = logit, 2 = probit, 3 = complementary log-log, 4 = log, 5 = power).
| Value | Method |
|---|---|
| 0 | None / reflective (Silverman) |
| 1 | Logit transform |
| 2 | Probit (a.k.a. normit) transform |
| 3 | Complementary log-log transform |
| 4 | Log transform |
| 5 | Power (i.e., Box-Cox) transform |

- $\lambda$
- is the power transform smoothing parameter.
- Kernel
- is a switch to select the kernel function (0 = Gaussian (default), 1 = uniform, 2 = triangular, 3 = biweight (quartic), 4 = triweight, 5 = Epanechnikov, 6 = cosine).
| Value | Description |
|---|---|
| 0 | Gaussian kernel function (default) |
| 1 | Uniform kernel |
| 2 | Triangular kernel |
| 3 | Biweight or quartic kernel |
| 4 | Triweight kernel |
| 5 | Epanechnikov kernel |
| 6 | Cosine kernel |

- H
- is the smoothing parameter (bandwidth) of the kernel density estimator. If missing and the optimization method is not "None," NxKDE calculates an optimal value.
- Optimization
- is a switch to select the kernel bandwidth optimization method (0 = none (default), 1 = Silverman, 2 = direct plug-in, 3 = unbiased cross-validation).
| Value | Method |
|---|---|
| 0 | None (default) |
| 1 | Silverman's rule of thumb |
| 2 | Direct plug-in (Sheather & Jones) |
| 3 | Unbiased cross-validation |

- Return
- is a number that determines the type of return value: 0 (or missing) = PDF, 1 = CDF, 2 = inverse CDF, 3 = bandwidth.
| Value | Description |
|---|---|
| 0 or omitted | Probability Density Function (PDF) |
| 1 | Cumulative Distribution Function (CDF) |
| 2 | Inverse Cumulative Distribution Function (inv. CDF) |
| 3 | Bandwidth |

- Target
- is the desired x-value(s) to calculate for (a single value or a one-dimensional array of cells, e.g., rows or columns).
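As a rough illustration of the prior-transform options listed above, the following Python sketch shows textbook forms of the five transforms. It is not NxKDE's implementation: the rescaling helper `to_unit`, the shift by the lower bound, and all function names are assumptions made for illustration, since the page does not state how NxKDE maps the data onto each transform's domain.

```python
import math
from statistics import NormalDist

def to_unit(x, lo, hi):
    """Rescale x from (lo, hi) to (0, 1). Assumed preprocessing step."""
    return (x - lo) / (hi - lo)

def logit(u):
    """Logit transform of u in (0, 1)."""
    return math.log(u / (1.0 - u))

def probit(u):
    """Probit (normit) transform: inverse standard normal CDF of u in (0, 1)."""
    return NormalDist().inv_cdf(u)

def cloglog(u):
    """Complementary log-log transform of u in (0, 1)."""
    return math.log(-math.log(1.0 - u))

def log_shift(x, lo=0.0):
    """Log transform for data bounded below by lo (assumed shift)."""
    return math.log(x - lo)

def box_cox(x, lam, lo=0.0):
    """Box-Cox power transform with parameter lam; reduces to log when lam == 0."""
    y = x - lo
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

# Example: a value of 7.5 on the domain (0, 10) maps to 0.75 before the logit.
u = to_unit(7.5, 0.0, 10.0)
z = logit(u)
```

The logit, probit, and complementary log-log forms take a value in $(0, 1)$, which is why they need both bounds in this sketch, while the log and Box-Cox forms only need a lower bound.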
Remarks
- In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
- Let $\{x_i\}$ be an independent and identically distributed (i.i.d.) sample of size $n$ drawn from some distribution with an unknown density $f$. The kernel density estimator is defined as follows:
$$\hat f(x)=\frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right)$$
Where:
- $K()$ is the kernel function – a symmetric (but not necessarily positive) function that integrates to one.
- $h$ is the smoothing parameter called the bandwidth.
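The estimator above can be sketched in a few lines of Python. This is a minimal illustration of the formula only, not NxKDE's code: the function names, the sample values, and the bandwidth of 0.5 are all made up for the example, and no bounds handling or transform is applied.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, a kernel that integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def epanechnikov_kernel(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde_pdf(x, sample, h, kernel=gaussian_kernel):
    """Evaluate f_hat(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Hypothetical sample and bandwidth, chosen only for illustration.
sample = [1.2, 1.9, 2.4, 2.5, 3.1, 4.0]
density_at_2_5 = kde_pdf(2.5, sample, h=0.5)
density_epan = kde_pdf(2.5, sample, h=0.5, kernel=epanechnikov_kernel)
```

Re-running `kde_pdf` with a larger or smaller `h` shows how strongly the bandwidth shapes the estimate: small values produce a spiky curve around each data point, and large values flatten it.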
- The bandwidth of the kernel is a free parameter that exhibits a strong influence on the resulting estimate.
- The domain lower and upper bound arguments are optional, but if they are given, the input data is checked against those bounds. A #NUM! error is returned if any data point violates the bounds.
- The log and power transforms work with a single bound, while the other transforms (logit, probit, and complementary log-log) work with two bounds.
- If both lower and upper bounds are specified, but the transform function is either log or power, NxKDE(.) returns #VALUE!
- The none/reflective method does not transform the input data; instead, it applies a boundary correction (reflection) for x-values near the domain endpoints.
- NxKDE(.) returns a PDF of zero for any x-value outside the specified x-domain.
- NxKDE(.) returns a CDF of zero (0) for any x-value smaller than the x-domain lower bound and one (1) for any x-value greater than the x-domain upper bound.
- For the inverse CDF return type, NxKDE(.) returns #VALUE! if the target value is not in the interval $(0, 1)$.
- NxKDE supports a fixed bandwidth throughout the sample.
- The input data series may include missing values (e.g., #N/A, #VALUE!, #NUM!, or empty cells). NxKDE(.) excludes all those values from the calculations.
- The NxKDE(.) supports three bandwidth optimization methods. Except for the direct plug-in (DPI) method, the user can use any supported kernel function.
- The direct plug-in (DPI) method requires a kernel function with at least six (6) non-zero derivatives, continuous and square-integrable. This excludes uniform, triangular, biweight, and Epanechnikov kernels.
- NxKDE(.) returns #VALUE! if the DPI optimization is turned on and one of the following kernels is selected: uniform, triangular, biweight (quartic), or Epanechnikov.
- For performance reasons, we recommend calculating the optimal bandwidth (optimization on) in a separate step. After that, use the computed optimal bandwidth in all subsequent NxKDE(.) calls, but with the optimization off.
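The two-step workflow recommended above (compute the bandwidth once, then reuse it) can be sketched in Python as follows. The bandwidth formula shown is one common statement of Silverman's rule of thumb for a Gaussian kernel; whether NxKDE uses this exact variant is an assumption, and the function names and sample values are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth: 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    sigma = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    # Fall back to sigma alone if the interquartile range is degenerate.
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde_pdf(x, sample, h):
    """Gaussian-kernel density estimate at x with a fixed bandwidth h."""
    n = len(sample)
    norm = n * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm

# Hypothetical sample, chosen only for illustration.
sample = [1.2, 1.9, 2.4, 2.5, 3.1, 4.0, 4.4, 5.2]

# Step 1: run the (comparatively expensive) bandwidth selection once.
h = silverman_bandwidth(sample)

# Step 2: reuse the stored bandwidth for every subsequent evaluation.
targets = [1.0, 2.0, 3.0]
densities = [gaussian_kde_pdf(t, sample, h) for t in targets]
```

In spreadsheet terms, this presumably corresponds to one NxKDE call with Return set to 3 (bandwidth) and optimization on, followed by calls that pass the resulting value as H with optimization off.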
Status
The NxKDE(.) function is available starting with version 1.68 CAMEL.
References
- Park, B.U.; Marron, J.S. (1990). "Comparison of data-driven bandwidth selectors". Journal of the American Statistical Association. 85 (409): 66–72.
- Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall/CRC. p. 45. ISBN 978-0-412-24620-3.
- Jones, M.C.; Marron, J.S.; Sheather, S.J. (1996). "A brief survey of bandwidth selection for density estimation". Journal of the American Statistical Association. 91 (433): 401–407.
- Sheather, S.J.; Jones, M.C. (1991). "A reliable data-based bandwidth selection method for kernel density estimation". Journal of the Royal Statistical Society, Series B. 53: 683–690.
- Zucchini, W. (2003). Applied Smoothing Techniques, Part 1: Kernel Density Estimation.