Calculates the k-nearest neighbors (k-NN) regression value.

## Syntax

**NxKNN**(**X**, **Y**, **K**, **Method**, **Kernel**, **Optimize**, **target**, **Return**)

- X
- is the x-component of the input data table (a one-dimensional array of cells (e.g. rows or columns)).
- Y
- is the y-component (i.e. function) of the input data table (a one-dimensional array of cells (e.g. rows or columns)).
- K
- is the number of nearest neighboring data points used in the k-NN algorithm. If missing or omitted, K is assumed equal to one (1).
- Method
- is the variant of the k-NN algorithm: 0 = Original (default), 1 = Weighted (inverse of distance), 2 = Variable-bandwidth kernel weighting. If missing or omitted, the original k-NN method is assumed.

| Value | Method |
|---|---|
| 0 | Original (equal weight for all k nearest points). |
| 1 | Weighted by the inverse of each point's distance from the target (a.k.a. query) value. |
| 2 | Weighted by a variable-bandwidth kernel (e.g. Gaussian). |

- Kernel
- is the weighting kernel function used with the k-NN regression method: 0 (or missing) = Uniform, 1 = Triangular, 2 = Epanechnikov, 3 = Quartic, 4 = Triweight, 5 = Tricube, 6 = Gaussian, 7 = Cosine, 8 = Logistic, 9 = Sigmoid, 10 = Silverman.

| Value | Kernel |
|---|---|
| 0 | Uniform Kernel (default). |
| 1 | Triangular Kernel. |
| 2 | Epanechnikov Kernel. |
| 3 | Quartic Kernel. |
| 4 | Triweight Kernel. |
| 5 | Tricube Kernel. |
| 6 | Gaussian Kernel. |
| 7 | Cosine Kernel. |
| 8 | Logistic Kernel. |
| 9 | Sigmoid Kernel. |
| 10 | Silverman Kernel. |

- Optimize
- is a flag (True/False) for searching and using optimal integer value K (i.e. number of data points). If missing or omitted, optimize is assumed False.
- target
- is the desired x-value(s) to interpolate for (a single value or a one-dimensional array of cells (e.g. rows or columns)).
- Return
- is a number that determines the type of return value: 0=Forecast (default), 1=errors, 2=K parameter, 3=RMSE. If missing or omitted, NxKNN returns forecast/regression value(s).

| Return | Description |
|---|---|
| 0 | Forecast/Regression value(s) (default). |
| 1 | Forecast/Regression error(s). |
| 2 | K Parameter. |
| 3 | RMSE (cross-validation). |

## Remarks

- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
- Observations (i.e. rows) with missing values in X or Y are removed.
- K must be a positive integer less than the data set size; otherwise, an error value (#VALUE!) is returned.
- The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space.
- In k-NN regression, the output is the property value for the object. This value is the average of the values of the k nearest neighbors.
- The best choice of k depends upon the data; generally, larger values of k reduce the effect of the noise on the classification.
- A good k can be selected by various heuristic techniques (e.g. hyper-parameter optimization).
- The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance.
- Because k-NN predictions rest on the intuitive assumption that objects close in distance are potentially similar, it makes good sense to discriminate between the k nearest neighbors when making predictions, i.e., to let the closest points among the k nearest neighbors have more say in the outcome for the query point.
- This is achieved by introducing a set of weights W, one for each nearest neighbor, defined by the relative closeness of each neighbor to the query point.
- For example, using the $e^{-D}$ function, we can define the weight as follows: $$W(x,p_i)=\frac{e^{-D(x,p_i)}}{\sum_{j=1}^K e^{-D(x,p_j)}}$$ Where:
- $x$ is the value of the query data point.
- $\{p_i\}$ is the set of the k nearest neighboring data points.
- $D(x,p_i)$ is the distance measure (e.g. Euclidean) between the query data point and the i-th neighboring data point.
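
The equal-weight average and the $e^{-D}$ weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NumXL's implementation: the function name `knn_regress`, its arguments, and the use of absolute difference as the distance for one-dimensional data are all hypothetical choices made for this example.

```python
import math

def knn_regress(xs, ys, query, k=1, weighted=False):
    """Predict y at `query` from the k nearest (x, y) pairs (illustrative sketch)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of observations")
    # Assumption: this sketch accepts any k from 1 to n; NxKNN's own bound may differ.
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of observations")
    # Sort observations by distance |x - query| and keep the k closest.
    neighbors = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    if not weighted:
        # Original k-NN: plain average of the k neighboring y-values.
        return sum(y for _, y in neighbors) / k
    # Weighted k-NN: W_i = exp(-D_i) / sum_j exp(-D_j).
    raw = [math.exp(-abs(x - query)) for x, _ in neighbors]
    total = sum(raw)
    return sum(w * y for w, (_, y) in zip(raw, neighbors)) / total

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
plain = knn_regress(xs, ys, 2.4, k=2)                    # average of 4 and 6
closer = knn_regress(xs, ys, 2.4, k=2, weighted=True)    # pulled toward y = 4
```

With the weighted variant, the neighbor at x = 2 is nearer to the query than the one at x = 3, so the prediction moves below the plain average.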

- Similarly, we can use kernel weighting functions to discriminate between neighboring data points. The kernel bandwidth is variable: it is calculated for each query data point based on its k nearest neighboring data points.
- As an initial value, the NumXL optimizer uses the input value of K (if available) in the minimization problem.
- The NxKNN() function is available starting with version 1.66 PARSON.
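
To make the variable-bandwidth kernel weighting and the search for K more concrete, here is a rough Python sketch. Several details are assumptions made for illustration and are not taken from NumXL: the bandwidth is set to the distance of the k-th nearest neighbor, a Gaussian kernel is used, K is scored by leave-one-out RMSE, and the names `kernel_knn`, `loo_rmse`, and `best_k` are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the weighted average."""
    return math.exp(-0.5 * u * u)

def kernel_knn(xs, ys, query, k, kernel=gaussian_kernel):
    """Kernel-weighted k-NN prediction with a per-query bandwidth (sketch)."""
    pairs = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    # Assumption: bandwidth h = distance to the farthest of the k neighbors.
    h = abs(pairs[-1][0] - query)
    if h == 0.0:
        # All kept neighbors coincide with the query: fall back to a plain mean.
        return sum(y for _, y in pairs) / len(pairs)
    # The Gaussian kernel is strictly positive, so the weight sum is never zero.
    weights = [kernel(abs(x - query) / h) for x, _ in pairs]
    return sum(w * y for w, (_, y) in zip(weights, pairs)) / sum(weights)

def loo_rmse(xs, ys, k):
    """Leave-one-out RMSE: predict each point from all the other points."""
    xs, ys = list(xs), list(ys)
    sq = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        err = ys[i] - kernel_knn(rest_x, rest_y, xs[i], k)
        sq += err * err
    return math.sqrt(sq / len(xs))

def best_k(xs, ys, k_max):
    """Integer K in 1..k_max with the smallest leave-one-out RMSE."""
    return min(range(1, k_max + 1), key=lambda k: loo_rmse(xs, ys, k))

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
k_star = best_k(xs, ys, k_max=4)
```

An exhaustive scan over K is used here only because it is easy to follow; a real optimizer may start from a user-supplied K and search differently.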

## References

- Altman, N. S. (1992). "An introduction to kernel and nearest-neighbor nonparametric regression". The American Statistician. 46 (3): 175–185.
- Stone, C. J. (1977). "Consistent nonparametric regression". Annals of Statistics. 5 (4): 595–620.
- Samworth, R. J. (2012). "Optimal weighted nearest neighbor classifiers". Annals of Statistics. 40 (5): 2733–2763.
