Calculates the value of the k-nearest neighbors (K-nn) regression.

## Syntax

**NxKNN**(**X**, **Y**, **K**, **Method**, **Kernel**, **Optimize**, **target**, **Return**)

**X** is the x-component of the input data table (a one-dimensional array of cells, e.g., a row or a column).

**Y** is the y-component (i.e., the function values) of the input data table (a one-dimensional array of cells, e.g., a row or a column).

**K** is the number of nearest neighboring data points used in the k-NN algorithm. If missing or omitted, K is assumed to be 1.

**Method** is the variant of the k-NN algorithm: 0 = original (default), 1 = weighted (inverse of distance), 2 = variable-bandwidth kernel weighting. If missing or omitted, the original k-NN method is assumed.

Value | Method |
---|---|
0 | Original (equal weight for all k nearest points) |
1 | Weighted by the inverse of the distance from the target (a.k.a. query) value |
2 | Weighted by a variable-bandwidth kernel (e.g., Gaussian) |
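
To make the difference between methods 0 and 1 concrete, here is a minimal Python sketch of one-dimensional k-NN regression. It is an illustration under stated assumptions, not NumXL's implementation: the function name `knn_predict`, the absolute-difference distance, and the handling of an exact match (zero distance) are all hypothetical choices.

```python
def knn_predict(xs, ys, target, k=1, method=0):
    """Illustrative k-NN regression at one query point (not NumXL's code).

    method 0: plain average of the k nearest responses.
    method 1: average weighted by the inverse of the distance to the query.
    """
    # Rank observations by absolute distance to the query; keep the k nearest.
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - target))[:k]
    if method == 0:
        return sum(y for _, y in nearest) / len(nearest)
    # Assumption: if the query coincides with a neighbor, 1/distance is
    # undefined, so return the mean response of the coinciding neighbors.
    exact = [y for x, y in nearest if x == target]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / abs(x - target) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
print(knn_predict(xs, ys, 2.5, k=2, method=0))  # equal weights
print(knn_predict(xs, ys, 2.2, k=2, method=1))  # closer neighbor dominates
```

With two neighbors bracketing the query, inverse-distance weighting reduces to linear interpolation between them, which is why the weighted variant tracks the query position while the equal-weight variant does not.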

**Kernel** is the weighting kernel function used with the k-NN regression method: 0 (or missing) = Uniform, 1 = Triangular, 2 = Epanechnikov, 3 = Quartic, 4 = Triweight, 5 = Tricube, 6 = Gaussian, 7 = Cosine, 8 = Logistic, 9 = Sigmoid, 10 = Silverman.

Value | Kernel |
---|---|
0 | Uniform Kernel (default) |
1 | Triangular Kernel |
2 | Epanechnikov Kernel |
3 | Quartic Kernel |
4 | Triweight Kernel |
5 | Tricube Kernel |
6 | Gaussian Kernel |
7 | Cosine Kernel |
8 | Logistic Kernel |
9 | Sigmoid Kernel |
10 | Silverman Kernel |
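
As a rough illustration of kernel weighting, the Python sketch below implements four of the kernels listed above using their textbook definitions and applies them to the k nearest neighbors. Several details are assumptions, since this page does not specify them: the bandwidth is taken to be the distance to the (k+1)-th nearest point (so all k neighbors receive nonzero weight under compact kernels), and the names `KERNELS` and `kernel_knn_predict` are hypothetical. The remaining kernel codes are omitted for brevity.

```python
import math

# Kernel codes follow the table above; formulas are the textbook definitions.
KERNELS = {
    0: lambda u: 0.5 if abs(u) <= 1 else 0.0,                        # Uniform
    1: lambda u: max(0.0, 1.0 - abs(u)),                             # Triangular
    2: lambda u: 0.75 * max(0.0, 1.0 - u * u),                       # Epanechnikov
    6: lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),  # Gaussian
}


def kernel_knn_predict(xs, ys, target, k, kernel=0):
    """Kernel-weighted k-NN regression at one query point (illustrative only)."""
    pairs = sorted(zip(xs, ys), key=lambda p: abs(p[0] - target))
    nearest = pairs[:k]
    # Assumed variable bandwidth: distance from the query to the (k+1)-th
    # nearest point, or to the farthest point if fewer than k+1 exist.
    h = abs(pairs[min(k, len(pairs) - 1)][0] - target)
    if h > 0:
        weights = [KERNELS[kernel](abs(x - target) / h) for x, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to the plain average.
        return sum(y for _, y in nearest) / len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
print(kernel_knn_predict(xs, ys, 2.2, k=2, kernel=1))  # triangular weights
print(kernel_knn_predict(xs, ys, 3.0, k=3, kernel=2))  # Epanechnikov weights
```

Because the bandwidth is recomputed per query point, regions with sparse data get a wider kernel and dense regions a narrower one.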

**Optimize** is a flag (True/False) that controls whether to search for and use the optimal integer value of K (i.e., the number of data points). If missing or omitted, Optimize is assumed False.

**target** is the desired x-value(s) to interpolate for (a single value or a one-dimensional array of cells, e.g., a row or a column).

**Return** is a number that determines the type of return value: 0 = forecast (default), 1 = errors, 2 = K parameter, 3 = RMSE. If missing or omitted, NxKNN returns the forecast/regression value(s).

Return | Description |
---|---|
0 | Forecast/Regression value(s) (default) |
1 | Forecast/Regression error(s) |
2 | K Parameter |
3 | RMSE (cross-validation) |
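
The cross-validation RMSE (Return = 3) and the search for an optimal K (Optimize = True) can be pictured with the Python sketch below. The cross-validation scheme NumXL uses is not described on this page, so leave-one-out is an assumption here, as are the hypothetical helper names `knn_mean`, `loo_rmse`, and `best_k` and the exhaustive search over K.

```python
import math


def knn_mean(xs, ys, target, k):
    """Equal-weight k-NN prediction at one query point."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - target))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_rmse(xs, ys, k):
    """Leave-one-out RMSE: predict each point from all the other points."""
    squared_error = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        err = ys[i] - knn_mean(train_x, train_y, xs[i], k)
        squared_error += err * err
    return math.sqrt(squared_error / len(xs))


def best_k(xs, ys):
    """Exhaustive search for the K (1 .. n-1) with the lowest leave-one-out RMSE."""
    return min(range(1, len(xs)), key=lambda k: loo_rmse(xs, ys, k))


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
k = best_k(xs, ys)
print(k, loo_rmse(xs, ys, k))
```

An exhaustive search is affordable because K is a small integer bounded by the number of observations; a real optimizer may use a different search strategy or starting point.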

## Remarks

- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
- Observations (i.e. rows) with missing values in X or Y are removed.
- K must be a positive integer less than the size of the input data set; otherwise, an error value (#VALUE!) is returned.
- The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space.
- In k-NN regression, the output is the property value for the object. This value is the average of the values of the k nearest neighbors.
- The best choice of k depends on the data; generally, larger values of k reduce the effect of noise on the prediction.
- A good k can be selected by various heuristic techniques (e.g., hyper-parameter optimization).
- The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance.
- k-NN predictions rest on the intuitive assumption that objects close in distance are potentially similar. It therefore makes sense to discriminate between the k nearest neighbors when making predictions, i.e., to let the closest of them have more say in the outcome for the query point.
- This is achieved by introducing a set of weights W, one for each nearest neighbor, defined by the relative closeness of each neighbor to the query point.
- For example, using the $e^{-D}$ function, we can define the weights as follows: $$W(x,p_i)=\frac{e^{-D(x,p_i)}}{\sum_{j=1}^K e^{-D(x,p_j)}}$$ Where:
  - $x$ is the value of the query data point.
  - $\{p_i\}$ is the set of the k nearest neighboring data points.
  - $D(x,p_i)$ is the distance measure (e.g., Euclidean) between the query data point and the i-th neighboring data point.

- Similarly, we can use kernel weighting functions to discriminate between neighboring data points. The kernel bandwidth is variable: it is calculated for each query data point from its k nearest neighboring data points.
- When optimizing, the NumXL optimizer uses the input value of K (if available) as the initial value in the minimization problem.
- The NxKNN() function is available starting with version 1.66 PARSON.
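
The exponential weights $W(x,p_i)$ from the remarks above can be computed directly. The Python sketch below is only an illustration: it assumes a one-dimensional absolute-difference distance for $D$, and the names `exp_weights` and `exp_weighted_predict` are hypothetical.

```python
import math


def exp_weights(target, neighbors):
    """Normalized weights e^{-D} / sum_j e^{-D_j} with D = |target - p_i|."""
    raw = [math.exp(-abs(target - p)) for p in neighbors]
    total = sum(raw)
    return [r / total for r in raw]


def exp_weighted_predict(xs, ys, target, k):
    """Weighted average of the k nearest responses using exp_weights."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - target))[:k]
    weights = exp_weights(target, [x for x, _ in nearest])
    return sum(w * y for w, (_, y) in zip(weights, nearest))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
print(exp_weights(2.2, [2.0, 3.0]))           # closer neighbor gets the larger weight
print(exp_weighted_predict(xs, ys, 2.2, k=2))
```

The weights always sum to one, so the prediction stays within the range of the neighbors' responses.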


## References

- Altman, N. S. (1992). "An introduction to kernel and nearest-neighbor nonparametric regression". The American Statistician. 46 (3): 175–185.
- Stone, C. J. (1977). "Consistent nonparametric regression". Annals of Statistics. 5 (4): 595–620.
- Samworth, R. J. (2012). "Optimal weighted nearest neighbor classifiers". Annals of Statistics. 40 (5): 2733–2763.
