Kernel trick
The kernel trick was first published in the 1964 paper "Theoretical foundations of the potential function method in pattern recognition learning".
The kernel trick relies on Mercer's condition, which states that any symmetric, positive semi-definite kernel K(x, y) can be expressed as a dot product in a high-dimensional space.
More specifically, suppose the arguments to the kernel lie in a measurable space X and the kernel is positive semi-definite, i.e.,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} K(x_i, x_j)\, c_i c_j \ge 0$$

for any finite sequence x_1, ..., x_n of elements of X and any sequence c_1, ..., c_n of real numbers. Then there exists a function φ(x) whose range is in an inner product space of possibly high dimension, such that

$$K(x, y) = \varphi(x) \cdot \varphi(y).$$
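As a concrete check of this statement, the sketch below uses two kernels chosen purely for illustration (they are assumptions, not part of the original text): the degree-2 polynomial kernel (x · y)², whose feature map φ can be written out explicitly for 2-D inputs, and the Gaussian kernel, whose feature space is infinite-dimensional. It verifies K(x, y) = φ(x) · φ(y) for the first kernel and checks the positive semi-definite condition numerically for both on a small random sample. All function names are hypothetical.

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return float(np.dot(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map for 2-D inputs, chosen so that
    phi(x) . phi(y) == (x . y)^2 (expand the square to check)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel; its feature space is infinite-dimensional,
    so phi cannot be computed as a finite vector."""
    diff = np.asarray(x) - np.asarray(y)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def gram_matrix(kernel, xs):
    """Matrix of kernel values K[i, j] = K(x_i, x_j)."""
    n = len(xs)
    return np.array([[kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 2))   # six random points x_1, ..., x_6 in R^2
c = rng.normal(size=6)         # arbitrary real coefficients c_1, ..., c_6

# The kernel value equals a dot product of explicit features.
x, y = xs[0], xs[1]
print(np.isclose(poly2_kernel(x, y), np.dot(poly2_features(x), poly2_features(y))))

# Positive semi-definiteness: sum_ij K(x_i, x_j) c_i c_j >= 0,
# equivalently no negative eigenvalues (up to rounding error).
for kernel in (poly2_kernel, gaussian_kernel):
    K = gram_matrix(kernel, xs)
    print(kernel.__name__, c @ K @ c >= -1e-12, np.linalg.eigvalsh(K).min() >= -1e-10)
```

The small negative tolerances only absorb floating-point rounding; in exact arithmetic both quantities are non-negative for any valid kernel.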
The kernel trick applies to any algorithm that depends on its input vectors only through dot products between them. Wherever a dot product appears, it is replaced with the kernel function. In this way a linear algorithm is turned into a non-linear one: the resulting algorithm is the original linear algorithm operating in the range space of φ. Because only kernel values are needed, φ itself is never computed explicitly. This is desirable, because the high-dimensional space may be infinite-dimensional, as is the case when the kernel is a Gaussian.
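To illustrate the substitution described above, here is a minimal sketch that uses the perceptron as the linear algorithm (an illustrative choice; the perceptron, the XOR data, and all names here are assumptions, not taken from the original). In its dual form the perceptron touches the data only through dot products, so replacing the dot product with a Gaussian kernel turns it into a non-linear classifier, while φ is never formed.

```python
import numpy as np

def linear_kernel(a, b):
    """Plain dot product: the original linear algorithm."""
    return float(np.dot(a, b))

def gaussian_kernel(a, b, sigma=0.5):
    """Gaussian kernel substituted wherever a dot product was used."""
    diff = a - b
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Dual-form perceptron: the weight vector w = sum_i alpha_i y_i phi(x_i)
    is never built; each dot product w . phi(x_i) becomes a kernel sum."""
    n = len(X)
    alpha = np.zeros(n)  # mistake counts per training point
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = np.sum(alpha * y * K[:, i])
            if y[i] * score <= 0:  # misclassified (or on the boundary)
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alpha

def predict(X_train, y_train, alpha, kernel, x):
    """Sign of sum_i alpha_i y_i K(x_i, x)."""
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y_train, X_train))
    return 1 if score > 0 else -1

# XOR: not linearly separable in the input space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])

for kernel in (linear_kernel, gaussian_kernel):
    alpha = train_kernel_perceptron(X, y, kernel)
    print(kernel.__name__, [predict(X, y, alpha, kernel, x) for x in X])
```

With the plain dot product the same code cannot fit XOR, since no weight vector through the origin separates those four points; with the Gaussian kernel the training loop stops after its second pass with every training point classified correctly.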
The kernel trick has been applied to several algorithms in machine learning and statistics.
The coiner of the term "kernel trick" is unknown.
References
- M. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.