NxKNN - K-Nearest Neighbors (K-NN) Regression

Calculates the value of the k-nearest neighbors (K-NN) regression.

Syntax

NxKNN([x], [y], k, method, kernel, optimize, target, return)

X
is the x-component of the input data table (a one-dimensional array of cells (e.g., rows or columns)).
Y
is the y-component (i.e., function) of the input data table (a one-dimensional array of cells (e.g., rows or columns)).
k
is the number of nearest neighboring data points used in the K-NN algorithm. If missing or omitted, k is assumed to be one (1).
method
is the variant of the K-NN algorithm: 0 = original (default), 1 = weighted (by inverse distance), 2 = variable-bandwidth kernel weighting. If missing or omitted, the basic K-NN method is assumed.
Value Method
0 Original (equal-weight to all K-NN points).
1 Weighted by the inverse of their distance from the target (a.k.a. query) value.
2 Weighted by variable-bandwidth kernel (e.g., Gaussian).
kernel
is the weighting kernel function used with the K-NN regression method: 0 (or missing) = Uniform, 1 = Triangular, 2 = Epanechnikov, 3 = Quartic, 4 = Triweight, 5 = Tricube, 6 = Gaussian, 7 = Cosine, 8 = Logistic, 9 = Sigmoid, 10 = Silverman.
Value Kernel
0 Uniform Kernel (default).
1 Triangular Kernel.
2 Epanechnikov Kernel.
3 Quartic Kernel.
4 Triweight Kernel.
5 Tricube Kernel.
6 Gaussian Kernel.
7 Cosine Kernel.
8 Logistic Kernel.
9 Sigmoid Kernel.
10 Silverman Kernel.
optimize
is a flag (True/False) for searching for, and using, the optimal integer value of K (i.e., the number of neighboring data points). If missing or omitted, optimize is assumed to be False.
target
is the desired x-value(s) to interpolate for (a single value or a one-dimensional array of cells (e.g., rows or columns)).
return
is a number that determines the return value type: 0 = forecast (default), 1 = errors, 2 = K parameter, 3 = RMSE. If missing or omitted, NxKNN returns the forecast/regression value(s).
Value Return type
0 Forecast/Regression value(s) (default).
1 Forecast/Regression error(s).
2 K Parameter.
3 RMSE (cross-validation).
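
For example, assuming the x-values occupy cells A2:A100, the y-values occupy cells B2:B100, and a query x-value sits in cell D2 (hypothetical cell ranges), the formula =NxKNN(A2:A100, B2:B100, 5, 2, 6, FALSE, D2) would return the K-NN regression forecast at D2 using the five nearest neighbors, the variable-bandwidth kernel weighting variant, and a Gaussian kernel.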

Remarks

  1. The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
  2. Observations (i.e., rows) with missing values in X or Y are removed.
  3. The K must be a positive integer that does not exceed the time series size; otherwise, an error value (#VALUE!) is returned.
  4. The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space.
  5. In k-NN regression, the output is the property value for the object, i.e., the average of the values of its k nearest neighbors.
  6. The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the prediction.
  7. A good k can be selected by various heuristic techniques (e.g., hyper-parameter optimization).
  8. The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance.
  9. The K-NN predictions rest on the intuitive assumption that objects that are close in distance are potentially similar. It therefore makes sense to discriminate between the K nearest neighbors when making a prediction, i.e., to let the closest points among the K nearest neighbors have more say in the outcome at the query point.
  10. This discrimination is achieved by introducing a set of weights W, one for each nearest neighbor, defined by the relative closeness of each neighbor to the query point.
  11. For example, using the $e^{-D}$ function, we can define the weights as follows: $$W(x,p_i)=\frac{e^{-D(x,p_i)}}{\sum_{j=1}^{K} e^{-D(x,p_j)}}$$ Where:
    • $x$ is the value of the query data point.
    • $\{p_i\}$ are set of k-nearest neighbors data-points.
    • $D(x,p_i)$ is the distance measure (e.g. Euclidean) between the query data point and the i-th neighboring data point.
  12. Similarly, we can use kernel weighting functions to discriminate between neighboring data points. The kernel bandwidth is variable; it is computed for each query data point from its k-nearest neighboring data points (see the illustrative sketch following these remarks).
  13. For initial values, the NumXL optimizer will use the input value of K (if available) in the minimization problem.
  14. The NxKNN() function is available starting with version 1.66 PARSON.
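
For illustration only, the following is a minimal Python sketch of the three weighting schemes described in the remarks above: equal weights, the $e^{-D}$ weights of remark 11, and a variable-bandwidth Gaussian kernel. The method numbers loosely follow the method argument, although the exact weighting NumXL applies (e.g., its inverse-distance formula for method 1) may differ; the names (knn_regress, gaussian_kernel) and the bandwidth choice (distance to the k-th neighbor) are illustrative assumptions, not NumXL's internal implementation.

    import numpy as np

    def gaussian_kernel(u):
        # Gaussian kernel (kernel value 6 in the table above).
        return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

    def knn_regress(x, y, query, k=1, method=0):
        # Sketch of K-NN regression at a single query point.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)

        # Drop observations with missing values (cf. remark 2).
        keep = ~(np.isnan(x) | np.isnan(y))
        x, y = x[keep], y[keep]

        # In one dimension, the Euclidean distance is simply |x - query|.
        d = np.abs(x - query)
        nn = np.argsort(d)[:k]            # indices of the k nearest neighbors
        d_nn, y_nn = d[nn], y[nn]

        if method == 0:                   # original: equal weight to all K neighbors
            w = np.ones_like(d_nn)
        elif method == 1:                 # weights proportional to e^{-D}, per remark 11
            w = np.exp(-d_nn)
        else:                             # variable-bandwidth Gaussian kernel weighting
            h = max(d_nn.max(), 1e-12)    # bandwidth = distance to the k-th neighbor
            w = gaussian_kernel(d_nn / h)

        # Weighted average of the neighbors' y-values (normalized weights).
        return np.sum(w * y_nn) / np.sum(w)

    # Usage: noisy sine curve, forecast at x = 2.0 with k = 5 and kernel weighting.
    rng = np.random.default_rng(0)
    xs = np.linspace(0.0, 6.0, 60)
    ys = np.sin(xs) + rng.normal(0.0, 0.1, xs.size)
    print(knn_regress(xs, ys, 2.0, k=5, method=2))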

References

  • Altman, N. S. (1992). "An introduction to kernel and nearest-neighbor nonparametric regression". The American Statistician. 46 (3): 175–185.
  • Stone C. J. (1977). "Consistent nonparametric regression". Annals of Statistics. 5 (4): 595–620.
  • Samworth R. J. (2012). "Optimal weighted nearest neighbor classifiers". Annals of Statistics. 40 (5): 2733–2763.
