NxKNN - K-Nearest Neighbors (K-NN) Regression

Calculates the k-nearest neighbors (k-NN) regression value(s).

 

Syntax

NxKNN(X, Y, K, Method, Kernel, Optimize, target, Return)

X is the x-component of the input data table (a one-dimensional array of cells, e.g. a row or a column).

Y is the y-component (i.e. the function values) of the input data table (a one-dimensional array of cells, e.g. a row or a column).

K is the number of nearest neighboring data-points used in the k-NN algorithm. If missing or omitted, K is assumed to be one (1).

Method is the variant of the k-NN algorithm: 0 = Original (default), 1 = Weighted (inverse to distance), 2 = Variable-bandwidth kernel weighting. If missing or omitted, the original (equal-weight) k-NN method is assumed.

Value Method
0 Original (equal weight for all k nearest points)
1 Weighted by the inverse of the distance from the target (a.k.a. query) value
2 Weighted by a variable-bandwidth kernel (e.g. Gaussian)

Kernel is the weighting kernel function used with the kernel-weighted k-NN regression method (Method = 2): 0 (or missing) = Uniform, 1 = Triangular, 2 = Epanechnikov, 3 = Quartic, 4 = Triweight, 5 = Tricube, 6 = Gaussian, 7 = Cosine, 8 = Logistic, 9 = Sigmoid, 10 = Silverman.

Value Kernel
0 Uniform Kernel (default)
1 Triangular Kernel
2 Epanechnikov Kernel
3 Quartic Kernel
4 Triweight Kernel
5 Tricube Kernel
6 Gaussian Kernel
7 Cosine Kernel
8 Logistic Kernel
9 Sigmoid Kernel
10 Silverman Kernel
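
As a rough illustration of what the kernel codes above refer to, the following Python sketch lists the common textbook definitions of these kernels as functions of a scaled distance u. The dictionary name `KERNELS` and the helper `_bounded` are hypothetical, and the normalizing constants and supports are assumptions taken from standard references; NumXL's internal definitions may differ.

```python
import math

def _bounded(fn):
    """Wrap a kernel so that it is zero outside |u| <= 1 (hypothetical helper)."""
    return lambda u: fn(u) if abs(u) <= 1.0 else 0.0

# Keys follow the Kernel codes in the table above.
# Assumption: textbook forms; NumXL may scale or truncate them differently.
KERNELS = {
    0: _bounded(lambda u: 0.5),                                      # Uniform
    1: _bounded(lambda u: 1.0 - abs(u)),                             # Triangular
    2: _bounded(lambda u: 0.75 * (1.0 - u * u)),                     # Epanechnikov
    3: _bounded(lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2),         # Quartic
    4: _bounded(lambda u: 35.0 / 32.0 * (1.0 - u * u) ** 3),         # Triweight
    5: _bounded(lambda u: 70.0 / 81.0 * (1.0 - abs(u) ** 3) ** 3),   # Tricube
    6: lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),  # Gaussian
    7: _bounded(lambda u: math.pi / 4.0 * math.cos(math.pi * u / 2.0)),  # Cosine
    8: lambda u: 1.0 / (math.exp(u) + 2.0 + math.exp(-u)),           # Logistic
    9: lambda u: 2.0 / (math.pi * (math.exp(u) + math.exp(-u))),     # Sigmoid
    10: lambda u: 0.5 * math.exp(-abs(u) / math.sqrt(2.0))
                  * math.sin(abs(u) / math.sqrt(2.0) + math.pi / 4.0),  # Silverman
}
```

For instance, `KERNELS[2](0.0)` evaluates the Epanechnikov kernel at its center, and any of the bounded kernels returns zero for a scaled distance larger than one.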

Optimize is a flag (True/False) for searching and using optimal integer value K (i.e. number of data-points). If missing or omitted, optimize is assumed False.

target is the desired x-value(s) at which to evaluate the regression (a single value or a one-dimensional array of cells, e.g. a row or a column).

Return is a number that determines the type of return value: 0=Forecast (default), 1=errors, 2=K parameter, 3=RMSE. If missing or omitted, NxKNN returns forecast/regression value(s).

Return Description
0 Forecast/Regression value(s) (default)
1 Forecast/Regression error(s)
2 K Parameter
3 RMSE (cross-validation)
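
To make the Optimize flag and the "K Parameter" and "RMSE (cross-validation)" return types more concrete, here is a minimal Python sketch of choosing K by cross-validation. It assumes leave-one-out cross-validation and equal-weight k-NN; the page does not state which cross-validation scheme or search NumXL uses, and the function names `knn_mean`, `loo_rmse`, and `best_k` are hypothetical.

```python
import math

def knn_mean(xs, ys, target, k):
    """Equal-weight k-NN regression: average y over the k nearest x values."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - target))
    return sum(ys[i] for i in order[:k]) / k

def loo_rmse(xs, ys, k):
    """Leave-one-out RMSE: predict each point from all the other points."""
    sq_err = 0.0
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        sq_err += (knn_mean(x_rest, y_rest, xs[i], k) - ys[i]) ** 2
    return math.sqrt(sq_err / len(xs))

def best_k(xs, ys, k_max=None):
    """Return (k, rmse) for the k in 1..k_max with the lowest leave-one-out RMSE."""
    if k_max is None:
        k_max = len(xs) - 1  # one point is always held out
    scores = {k: loo_rmse(xs, ys, k) for k in range(1, k_max + 1)}
    k_opt = min(scores, key=scores.get)
    return k_opt, scores[k_opt]
```

Calling `best_k(xs, ys)` on two equal-length Python lists returns the selected K together with its cross-validation RMSE, which loosely corresponds to return types 2 and 3 above.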
 

Remarks

  1. The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
  2. Observations (i.e. rows) with missing values in X or Y are removed.
  3. K must be a positive integer less than the number of observations, or else an error value (#VALUE!) is returned.
  4. The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space.
  5. In k-NN regression, the output is the property value for the object. This value is the average of the values of k nearest neighbors.
  6. The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the prediction.
  7. A good k can be selected by various heuristic techniques (e.g., hyper-parameter optimization).
  8. The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance.
  9. The k-NN predictions are based on the intuitive assumption that objects close in distance are potentially similar. It therefore makes sense to discriminate between the K nearest neighbors when making predictions, i.e., to let the closest points among the K nearest neighbors have more say in the outcome at the query point.
  10. This is achieved by introducing a set of weights W, one for each nearest neighbor, defined by the relative closeness of each neighbor to the query point.
  11. For example, using the $e^{-D}$ function, we can define the weights as follows: $$W(x,p_i)=\frac{e^{-D(x,p_i)}}{\sum_{j=1}^K e^{-D(x,p_j)}}$$ where:
    • $x$ is the value of the query data-point.
    • $\{p_i\}$ is the set of the k nearest neighboring data-points.
    • $D(x,p_i)$ is the distance measure (e.g. Euclidean) between the query data-point and the i-th neighboring data-point.
  12. Similarly, we can use kernel weighting functions to discriminate between neighboring data-points. The kernel bandwidth is variable: it is calculated for each query data-point from its k nearest neighboring data-points.
  13. For initial values, the NumXL optimizer will use the input value of (K) (if available) in the minimization problem.
  14. The NxKNN() function is available starting with version 1.66 PARSON.
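
The weighting schemes described in the remarks above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not NumXL's implementation: the function name `knn_regress` and its `method` labels are hypothetical, the distance is the absolute difference in one dimension, and the handling of an exact match in the inverse-distance case is a choice made here for the sketch.

```python
import math

def knn_regress(xs, ys, target, k=1, method="equal"):
    """Predict y at `target` from the k nearest points in one dimension.

    method: "equal" (plain average), "inverse" (1/distance weights), or
    "exp" (normalized exp(-distance) weights, as in the weight formula above).
    """
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same length")
    if not 1 <= k < len(xs):
        raise ValueError("k must be a positive integer below the sample size")

    # (distance, y) pairs for the k observations closest to the query point.
    pairs = sorted((abs(x - target), y) for x, y in zip(xs, ys))[:k]

    if method == "equal":
        weights = [1.0] * k
    elif method == "inverse":
        # Assumption: an exact match would get infinite weight, so return
        # the mean of the exactly matching neighbors instead.
        exact = [y for d, y in pairs if d == 0.0]
        if exact:
            return sum(exact) / len(exact)
        weights = [1.0 / d for d, _ in pairs]
    elif method == "exp":
        weights = [math.exp(-d) for d, _ in pairs]
    else:
        raise ValueError("unknown method")

    # Dividing by the weight total normalizes the weights to sum to one.
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, pairs)) / total
```

With `xs = [1, 2, 3, 4, 5]`, `ys = [10, 20, 30, 40, 50]`, a query at 2.2 and `k=2`, the equal-weight method averages the two neighbors at x = 2 and x = 3, while the two weighted methods pull the prediction toward the closer neighbor at x = 2.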

Files Examples

References

  • Altman, N. S. (1992). "An introduction to kernel and nearest-neighbor nonparametric regression". The American Statistician. 46 (3): 175–185.
  • Stone C. J. (1977). "Consistent nonparametric regression". Annals of Statistics. 5 (4): 595–620.
  • Samworth R. J. (2012). "Optimal weighted nearest neighbor classifiers". Annals of Statistics. 40 (5): 2733–2763.