# NxKNN - K Nearest Neighbors (K-NN) Regression

Calculates the k-nearest neighbors (k-NN) regression value at the target point(s).

## Syntax

NxKNN(X, Y, K, Method, Kernel, Optimize, target, Return)

X
is the x-component of the input data table (a one-dimensional array of cells, e.g. a row or a column).
Y
is the y-component (i.e. function) of the input data table (a one-dimensional array of cells, e.g. a row or a column).
K
is the number of nearest neighboring data points used in the k-NN algorithm. If missing or omitted, K is assumed to be one (1).
Method
is the variant of the k-NN algorithm: 0 = Original (default), 1 = Weighted (inverse to distance), 2 = variable-bandwidth kernel weighting. If missing or omitted, the basic k-NN method is assumed.

| Value | Method |
|---|---|
| 0 | Original (equal weight to all k-NN points). |
| 1 | Weighted by the inverse of their distance from the target (aka query) value. |
| 2 | Weighted by the variable-bandwidth kernel (e.g. Gaussian). |

Kernel
is the weighting kernel function used with the k-NN regression method: 0 (or missing) = Uniform, 1 = Triangular, 2 = Epanechnikov, 3 = Quartic, 4 = Triweight, 5 = Tricube, 6 = Gaussian, 7 = Cosine, 8 = Logistic, 9 = Sigmoid, 10 = Silverman.

| Value | Kernel |
|---|---|
| 0 | Uniform Kernel (default). |
| 1 | Triangular Kernel. |
| 2 | Epanechnikov Kernel. |
| 3 | Quartic Kernel. |
| 4 | Triweight Kernel. |
| 5 | Tricube Kernel. |
| 6 | Gaussian Kernel. |
| 7 | Cosine Kernel. |
| 8 | Logistic Kernel. |
| 9 | Sigmoid Kernel. |
| 10 | Silverman Kernel. |

Optimize
is a flag (True/False) for searching and using optimal integer value K (i.e. number of data points). If missing or omitted, optimize is assumed False.
target
is the desired x-value(s) to interpolate for (a single value or a one-dimensional array of cells, e.g. a row or a column).
Return
is a number that determines the type of return value: 0=Forecast (default), 1=errors, 2=K parameter, 3=RMSE. If missing or omitted, NxKNN returns forecast/regression value(s).

| Return | Description |
|---|---|
| 0 | Forecast/Regression value(s) (default). |
| 1 | Forecast/Regression error(s). |
| 2 | K Parameter. |
| 3 | RMSE (cross-validation). |
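
The Optimize flag and the RMSE (cross-validation) return type suggest that K can be chosen by minimizing a cross-validation error. The page does not say which cross-validation scheme or search strategy NumXL uses, so the following Python sketch is only an illustration under stated assumptions: it uses leave-one-out cross-validation, the equal-weight (Original) method, and an exhaustive search over K = 1..`k_max`. The function names `knn_predict`, `loo_rmse`, and `best_k` are hypothetical and are not part of NumXL.

```python
import math

def knn_predict(xs, ys, target, k):
    """Average the y-values of the k points whose x is closest to target."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - target))[:k]
    return sum(ys[i] for i in nearest) / k

def loo_rmse(xs, ys, k):
    """Leave-one-out cross-validation RMSE for a given k (assumed scheme)."""
    sq_errors = []
    for i in range(len(xs)):
        # Hold out observation i and predict it from the remaining points.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = knn_predict(train_x, train_y, xs[i], k)
        sq_errors.append((ys[i] - pred) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def best_k(xs, ys, k_max):
    """Return (k, rmse) with the lowest leave-one-out RMSE for k in 1..k_max."""
    scores = [(loo_rmse(xs, ys, k), k) for k in range(1, k_max + 1)]
    rmse, k = min(scores)
    return k, rmse

# Illustrative data only: a noise-free straight line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
k, rmse = best_k(xs, ys, k_max=4)
print(k, rmse)
```

In this sketch, `k` plays the role of the K Parameter return type and `rmse` the role of the RMSE return type; the values NumXL reports may differ if its cross-validation or tie-breaking rules differ from the ones assumed here.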

## Remarks

1. The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
2. Observations (i.e. rows) with missing values in X or Y are removed.
3. K must be a positive integer less than the number of observations in the data set; otherwise, an error value (#VALUE!) is returned.
4. The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space.
5. In k-NN regression, the output is the property value for the object. This value is the average of the values of the k nearest neighbors.
6. The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the prediction.
7. A good k can be selected by various heuristic techniques (e.g. hyper-parameter optimization).
8. The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance.
9. k-NN predictions rest on the intuitive assumption that objects close in distance are potentially similar. It therefore makes sense to discriminate between the K nearest neighbors when making predictions, i.e., to let the closest points among the K nearest neighbors have more say in the outcome at the query point.
10. This is achieved by introducing a set of weights W, one for each nearest neighbor, defined by the relative closeness of each neighbor to the query point.
11. For example, using the $e^{-D}$ function, we can define the weights as follows: $$W(x,p_i)=\frac{e^{-D(x,p_i)}}{\sum_{j=1}^K e^{-D(x,p_j)}}$$ Where:
• $x$ is the value of the query data point.
• $\{p_i\}$ is the set of the k nearest neighboring data points.
• $D(x,p_i)$ is the distance measure (e.g. Euclidean) between the query data point and the i-th neighboring data point.
12. Similarly, we can use kernel weighting functions to discriminate between neighboring data points. The kernel bandwidth is variable: it is calculated for each query data point from its k nearest neighboring data points.
13. As an initial value, the NumXL optimizer uses the input value of K (if available) in the minimization problem.
14. The NxKNN() function is available starting with version 1.66 PARSON.
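
The weighting ideas in the remarks above (plain averaging, $e^{-D}$ weights, and kernel weights with a per-query bandwidth) can be sketched in Python as follows. This is a minimal illustration, not NumXL's implementation. The function name `knn_regress` and the `weighting` labels are hypothetical. The bandwidth rule (distance to the farthest of the k neighbors) and the choice of a Gaussian kernel are assumptions, since the page does not state how the bandwidth is computed.

```python
import math

def knn_regress(xs, ys, target, k, weighting="uniform"):
    """Predict y at `target` from the k nearest (x, y) observations.

    weighting="uniform":  equal weights (plain average of the neighbors)
    weighting="exp":      weights proportional to exp(-D), normalized to sum to 1
    weighting="gaussian": Gaussian kernel with a per-query bandwidth, assumed
                          here to be the distance to the farthest neighbor
    """
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of observations")

    # Pair each distance with its y-value and keep the k closest points.
    neighbors = sorted((abs(x - target), y) for x, y in zip(xs, ys))[:k]
    dists = [d for d, _ in neighbors]

    if weighting == "uniform":
        raw = [1.0] * k
    elif weighting == "exp":
        raw = [math.exp(-d) for d in dists]
    elif weighting == "gaussian":
        h = max(dists)  # assumed variable bandwidth for this query point
        if h == 0:
            # Every neighbor coincides with the query: use equal weights.
            raw = [1.0] * k
        else:
            raw = [math.exp(-0.5 * (d / h) ** 2) for d in dists]
    else:
        raise ValueError("unknown weighting")

    total = sum(raw)
    return sum((w / total) * y for w, (_, y) in zip(raw, neighbors))

# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0]
for name in ("uniform", "exp", "gaussian"):
    print(name, knn_regress(xs, ys, target=2.2, k=3, weighting=name))
```

With the weighted variants, the neighbor at x = 2 (the closest to the query at 2.2) contributes more than the neighbors at x = 3 and x = 1, which is the behavior the remarks describe as letting the closest points have more say.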