Calculates the value of the k-nearest neighbors (K-NN) regression.
Syntax
NxKNN([x], [y], k, method, kernel, optimize, target, return)
- X
- is the x-component of the input data table (a one-dimensional array of cells, e.g., a row or a column).
- Y
- is the y-component (i.e., the function) of the input data table (a one-dimensional array of cells, e.g., a row or a column).
- k
- is the number of nearest neighboring data points used in the k-NN algorithm. If missing or omitted, k is assumed to be one (1).
- method
- is the variant of the K-NN algorithm: 0 = Original (default), 1 = Weighted (inverse to distance), 2 = Variable-bandwidth kernel weighting. If missing or omitted, the basic K-NN method is assumed.
| Value | Method |
|---|---|
| 0 | Original (equal weight to all K-NN points). |
| 1 | Weighted by the inverse of their distance from the target (aka query) value. |
| 2 | Weighted by a variable-bandwidth kernel (e.g., Gaussian). |

- kernel
- is the weighting kernel function used with the K-NN regression method: 0 (or missing) = Uniform, 1 = Triangular, 2 = Epanechnikov, 3 = Quartic, 4 = Triweight, 5 = Tricube, 6 = Gaussian, 7 = Cosine, 8 = Logistic, 9 = Sigmoid, 10 = Silverman. Textbook definitions of a few of these kernels are sketched after the argument list.
| Value | Kernel |
|---|---|
| 0 | Uniform kernel (default). |
| 1 | Triangular kernel. |
| 2 | Epanechnikov kernel. |
| 3 | Quartic kernel. |
| 4 | Triweight kernel. |
| 5 | Tricube kernel. |
| 6 | Gaussian kernel. |
| 7 | Cosine kernel. |
| 8 | Logistic kernel. |
| 9 | Sigmoid kernel. |
| 10 | Silverman kernel. |

- optimize
- is a flag (True/False) for searching for, and using, the optimal integer value of K (i.e., the number of neighboring data points). If missing or omitted, optimize is assumed to be False.
- target
- is the desired x-value(s) to interpolate for (a single value or a one-dimensional array of cells, e.g., a row or a column).
- return
- is a number that determines the type of return value: 0 = Forecast (default), 1 = Errors, 2 = K parameter, 3 = RMSE. If missing or omitted, NxKNN returns the forecast/regression value(s).
| Value | Return type |
|---|---|
| 0 | Forecast/regression value(s) (default). |
| 1 | Forecast/regression error(s). |
| 2 | K parameter. |
| 3 | RMSE (cross-validation). |
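The tables above only name the weighting kernels. As a reference, here is a minimal Python sketch of a few of them, using their textbook definitions on the normalized distance u = D(x, p_i)/h, where h is the bandwidth. This is an illustration only, not NumXL's internal implementation.

```python
import math

# Textbook kernel definitions on the normalized distance u = D(x, p_i) / h,
# where h is the bandwidth. Each returns 0 outside its support; the Gaussian
# kernel has unbounded support.

def uniform(u):        # kernel = 0
    return 0.5 if abs(u) <= 1 else 0.0

def triangular(u):     # kernel = 1
    return 1.0 - abs(u) if abs(u) <= 1 else 0.0

def epanechnikov(u):   # kernel = 2
    return 0.75 * (1.0 - u * u) if abs(u) <= 1 else 0.0

def tricube(u):        # kernel = 5
    return (70.0 / 81.0) * (1.0 - abs(u) ** 3) ** 3 if abs(u) <= 1 else 0.0

def gaussian(u):       # kernel = 6
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
```

The remaining kernels (quartic, triweight, cosine, logistic, sigmoid, Silverman) follow the same pattern: a symmetric function of u that assigns larger weights to closer neighbors.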
Remarks
- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
- Observations (i.e., rows) with missing values in X or Y are removed.
- The value of K must be a positive integer that does not exceed the size of the time series; otherwise, an error value (#VALUE!) is returned.
- The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression [1]. In both cases, the input consists of the k closest training examples in the feature space.
- In k-NN regression, the output is the property value for the object: the average of the values of its k nearest neighbors.
- The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the estimate.
- A good k can be selected by various heuristic techniques (e.g., hyper-parameter optimization); see the cross-validation sketch at the end of these remarks.
- The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance.
- K-NN predictions are based on the intuitive assumption that objects close in distance are potentially similar. It therefore makes sense to discriminate between the K nearest neighbors when making predictions, i.e., to let the closest points among the K nearest neighbors have a greater say in the outcome of the query point.
- This is done by introducing a set of weights W, one for each nearest neighbor, defined by the relative closeness of each neighbor with respect to the query point.
- For example, using the $e^{-D}$ function, we can define the weight as follows: $$W(x,p_i)=\frac{e^{-D(x,p_i)}}{\sum_{j=1}^K e^{-D(x,p_j)}}$$ Where:
- $x$ is the value of the query data point.
- $\{p_i\}$ are set of k-nearest neighbors data-points.
- $D(x,p_i)$ is the distance measure (e.g., Euclidean) between the query data point and the i-th neighboring data point.
- Similarly, we can use kernel weighting functions to discriminate between neighboring data points. The kernel bandwidth is variable; it is calculated for each query data point from its k nearest neighboring data points. A sketch combining these weighting schemes appears at the end of these remarks.
- For initial values, the NumXL optimizer will use the input value of K (if available) in the minimization problem.
- The NxKNN() function is available starting with version 1.66 PARSON.
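To make the weighting schemes described above concrete, here is a minimal Python sketch of K-NN regression over a one-dimensional x with three weighting variants: equal weights, inverse-distance weights, and a variable-bandwidth Gaussian kernel whose bandwidth is the distance to the k-th neighbor. The function and variable names are hypothetical, and this is not NumXL's internal implementation.

```python
import math

def knn_regression(xs, ys, x0, k, method=0):
    """Predict y at query point x0 from training data (xs, ys).

    method 0: equal weights (plain K-NN average)
    method 1: inverse-distance weights (the exp(-D) weighting in the remarks
              above works the same way, with a different decay function)
    method 2: variable-bandwidth Gaussian kernel; the bandwidth is the
              distance from x0 to its k-th nearest neighbor
    """
    # The k nearest neighbors of x0, by absolute distance in one dimension.
    neighbors = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    dists = [abs(x - x0) for x, _ in neighbors]

    if method == 0:
        weights = [1.0] * len(neighbors)
    elif method == 1:
        weights = [1.0 / (d + 1e-12) for d in dists]
    else:
        h = max(dists) or 1.0  # variable bandwidth: distance to the k-th neighbor
        weights = [math.exp(-0.5 * (d / h) ** 2) for d in dists]

    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)


# Example: a noiseless quadratic, interpolated at x = 2.5 with k = 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x * x for x in xs]
print(knn_regression(xs, ys, 2.5, k=3, method=1))
```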
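Likewise, a good k can be chosen by minimizing a cross-validation error such as the RMSE exposed by the return argument. The sketch below uses leave-one-out cross-validation and reuses the knn_regression helper (and the xs, ys data) from the previous sketch; it is a plausible illustration of the technique, not a description of NumXL's optimizer.

```python
def loo_rmse(xs, ys, k, method=0):
    """Leave-one-out cross-validation RMSE of K-NN regression for a given k."""
    sq_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = knn_regression(train_x, train_y, xs[i], k, method)
        sq_errors.append((ys[i] - pred) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def best_k(xs, ys, k_max, method=0):
    """The k in 1..k_max with the smallest leave-one-out RMSE."""
    return min(range(1, k_max + 1), key=lambda k: loo_rmse(xs, ys, k, method))

# Example: pick k for the quadratic data above, searching k = 1..4.
print(best_k(xs, ys, k_max=4, method=0))
```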
References
- Altman, N. S. (1992). "An introduction to kernel and nearest-neighbor nonparametric regression". The American Statistician. 46 (3): 175–185.
- Stone, C. J. (1977). "Consistent nonparametric regression". Annals of Statistics. 5 (4): 595–620.
- Samworth, R. J. (2012). "Optimal weighted nearest neighbor classifiers". Annals of Statistics. 40 (5): 2733–2763.