Calculates the value of the k-nearest neighbors (K-nn) regression.
Syntax
NxKNN(X, Y, K, Method, Kernel, Optimize, target, Return)
- X
- is the x-component of the input data table (a one dimensional array of cells (e.g. rows or columns)).
- Y
- is the y-component (i.e. function) of the input data table (a one dimensional array of cells (e.g. rows or columns)).
- K
- is the number of nearest neighboring data-points used in the k-NN algorithm. If missing or omitted, K is assumed equal to one (1).
- Method
- is the variant of the k-NN algorithm: 0 = Original (default), 1 = Weighted (inverse to distance), 2 = Variable-bandwidth kernel weighting. If missing or omitted, the basic k-NN method is assumed.

| Value | Method |
| 0 | Original (equal weight to all k-NN points) (default) |
| 1 | Weighted by the inverse of their distance from the target (aka query) value |
| 2 | Weighted by a variable-bandwidth kernel (e.g. Gaussian) |

- Kernel
- is the weighting kernel function used with the KNN-Regression method: 0 (or missing) = Uniform, 1 = Triangular, 2 = Epanechnikov, 3 = Quartic, 4 = Triweight, 5 = Tricube, 6 = Gaussian, 7 = Cosine, 8 = Logistic, 9 = Sigmoid, 10 = Silverman.

| Value | Kernel |
| 0 | Uniform Kernel (default) |
| 1 | Triangular Kernel |
| 2 | Epanechnikov Kernel |
| 3 | Quartic Kernel |
| 4 | Triweight Kernel |
| 5 | Tricube Kernel |
| 6 | Gaussian Kernel |
| 7 | Cosine Kernel |
| 8 | Logistic Kernel |
| 9 | Sigmoid Kernel |
| 10 | Silverman Kernel |

- Optimize
- is a flag (True/False) for searching and using optimal integer value K (i.e. number of data-points). If missing or omitted, optimize is assumed False.
- target
- is the desired x-value(s) to interpolate for (a single value or a one dimensional array of cells (e.g. rows or columns)).
- Return
- is a number that determines the type of return value: 0=Forecast (default), 1=errors, 2=K parameter, 3=RMSE. If missing or omitted, NxKNN returns forecast/regression value(s).
| Return | Description |
| 0 | Forecast/Regression value(s) (default) |
| 1 | Forecast/Regression error(s) |
| 2 | K Parameter |
| 3 | RMSE (cross-validation) |
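To make the Method 2 and Kernel options more concrete, here is a minimal Python sketch of kernel-weighted k-NN regression on one-dimensional data. It is not NumXL's implementation: the names `KERNELS` and `kernel_knn` are hypothetical, the kernels use their standard textbook forms without normalizing constants (the weights are renormalized anyway), only kernel codes 0 through 7 are covered for brevity, and the bandwidth rule (distance from the target to the (k+1)-th nearest point) is an assumption chosen so that all k neighbors can receive non-zero weight.

```python
import math

# Unnormalized textbook kernels on the scaled distance u, keyed by the
# Kernel codes 0-7 listed above (codes 8-10 are omitted in this sketch).
KERNELS = {
    0: lambda u: 1.0 if abs(u) <= 1 else 0.0,                        # Uniform
    1: lambda u: max(0.0, 1 - abs(u)),                               # Triangular
    2: lambda u: max(0.0, 1 - u * u),                                # Epanechnikov
    3: lambda u: max(0.0, 1 - u * u) ** 2,                           # Quartic
    4: lambda u: max(0.0, 1 - u * u) ** 3,                           # Triweight
    5: lambda u: max(0.0, 1 - abs(u) ** 3) ** 3,                     # Tricube
    6: lambda u: math.exp(-0.5 * u * u),                             # Gaussian
    7: lambda u: math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0,  # Cosine
}


def kernel_knn(xs, ys, target, k, kernel=6):
    """Kernel-weighted k-NN estimate at `target` (illustrative only)."""
    if not 1 <= k < len(xs):
        raise ValueError("k must be a positive integer less than the sample size")
    # Rank all points by distance to the query value.
    ranked = sorted(zip(xs, ys), key=lambda p: abs(p[0] - target))
    nearest = ranked[:k]
    # Assumed bandwidth rule: distance to the (k+1)-th nearest point,
    # recomputed for every query value (hence "variable bandwidth").
    h = abs(ranked[k][0] - target)
    if h == 0:
        weights = [1.0] * k
    else:
        weights = [KERNELS[kernel](abs(x - target) / h) for x, _ in nearest]
    total = sum(weights)
    if total == 0:
        # Degenerate case (e.g. ties): fall back to equal weights.
        weights, total = [1.0] * k, float(k)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total
```

With the uniform kernel (code 0), this sketch reduces to the plain average of the k nearest neighbors; the other kernels shift weight toward the neighbors closest to the target.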
Remarks
- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
- Observations (i.e. rows) with missing values in X or Y are removed.
- K must be a positive integer less than the number of observations in the input data; otherwise, an error value (#VALUE!) is returned.
- The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space.
- In k-NN regression, the output is the property value for the object. This value is the average of the values of k nearest neighbors.
- The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the prediction.
- A good k can be selected by various heuristic techniques (e.g., hyper-parameter optimization).
- The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance.
- Because KNN predictions rest on the intuitive assumption that objects close in distance are potentially similar, it makes sense to discriminate between the K nearest neighbors when making predictions, i.e., to let the closest points among the K nearest neighbors have more say in the outcome for the query point.
- This is achieved by introducing a set of weights W, one for each nearest neighbor, defined by the relative closeness of each neighbor to the query point.
- For example, using the $e^{-D}$ function, we can define the weights as follows: $$W(x,p_i)=\frac{e^{-D(x,p_i)}}{\sum_{j=1}^K e^{-D(x,p_j)}}$$ Where:
- $x$ is the value of the query data-point.
- $\{p_i\}$ is the set of the k nearest neighboring data-points.
- $D(x,p_i)$ is the distance measure (e.g. Euclidean) between the query data-point and the i-th neighboring data-point.
- Similarly, we can use kernel weighting functions to discriminate between neighboring data-points. The kernel bandwidth is variable: it is calculated for each query data-point based on its k nearest neighboring data-points.
- When Optimize is True, the NumXL optimizer uses the input value of K (if available) as the initial value in the minimization problem.
- The NxKNN() function is available starting with version 1.66 PARSON.
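The remarks above on equal-weight averaging, the $e^{-D}$ weights, and choosing k can be sketched in Python as follows. This is an illustration under stated assumptions, not the NumXL algorithm: the function names (`knn_predict`, `loo_rmse`, `best_k`) are hypothetical, the data are one-dimensional with absolute difference as the distance, and k is chosen by an exhaustive search over leave-one-out cross-validation RMSE, which may differ from the optimizer NumXL uses.

```python
import math


def knn_predict(xs, ys, target, k=1, weighted=False):
    """Predict y at `target` from its k nearest (x, y) pairs."""
    if not 1 <= k < len(xs):
        raise ValueError("k must be a positive integer less than the sample size")
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - target))[:k]
    if weighted:
        # W(x, p_i) = exp(-D(x, p_i)) / sum_j exp(-D(x, p_j))
        raw = [math.exp(-abs(x - target)) for x, _ in nearest]
    else:
        # Original method: every neighbor counts equally.
        raw = [1.0] * k
    total = sum(raw)
    return sum(w * y for w, (_, y) in zip(raw, nearest)) / total


def loo_rmse(xs, ys, k, weighted=False):
    """Leave-one-out cross-validation RMSE for a given k."""
    squared = 0.0
    for i in range(len(xs)):
        # Hold out observation i and predict it from the remaining points.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        err = ys[i] - knn_predict(rest_x, rest_y, xs[i], k, weighted)
        squared += err * err
    return math.sqrt(squared / len(xs))


def best_k(xs, ys, weighted=False):
    """Return the integer k with the lowest leave-one-out RMSE (assumed search rule)."""
    # Each held-out fit has len(xs) - 1 points, so k can be at most len(xs) - 2.
    candidates = range(1, len(xs) - 1)
    return min(candidates, key=lambda k: loo_rmse(xs, ys, k, weighted))
```

For example, with `xs = [1.0, 2.0, 3.0, 4.0, 5.0]` and `ys = [2.0, 4.0, 6.0, 8.0, 10.0]`, calling `knn_predict(xs, ys, 2.9, k=2)` averages the y-values at x = 3 and x = 2, while `weighted=True` pulls the estimate toward the closer point at x = 3.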
References
- Altman, N. S. (1992). "An introduction to kernel and nearest-neighbor nonparametric regression". The American Statistician. 46 (3): 175–185.
- Stone, C. J. (1977). "Consistent nonparametric regression". Annals of Statistics. 5 (4): 595–620.
- Samworth, R. J. (2012). "Optimal weighted nearest neighbor classifiers". Annals of Statistics. 40 (5): 2733–2763.