Given a training sample $D = (X, y) = \{x_i, y_i = y(x_i)\}_{i=1}^N$ and a covariance function $k(x_1, x_2)$, for a new point $x$ the posterior mean is
$$m(x) = k K^{-1} y$$
and the variance is
$$V(x) = k(x, x) - k K^{-1} k^T.$$
Here $k = \{k(x, x_1), \ldots, k(x, x_N)\}$ is the vector of covariances between the new point and the training points, and $K = \{k(x_i, x_j)\}_{i,j=1}^N$ is the matrix of sample covariances. If we predict with the posterior mean at the training points themselves, the interpolation property holds. Indeed,
$$m(X) = K K^{-1} y = y.$$
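To make the formulas concrete, here is a minimal sketch of noise-free Gaussian process prediction in NumPy; the RBF covariance function, its length scale, and the toy data are illustrative assumptions, not part of the original text.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    # Assumed covariance function: k(x1, x2) = exp(-|x1 - x2|^2 / (2 * length_scale^2))
    d = x1[:, None, :] - x2[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2.0 * length_scale ** 2))

def gp_posterior(X_train, y_train, X_new):
    K = rbf_kernel(X_train, X_train)          # matrix of sample covariances K
    k = rbf_kernel(X_new, X_train)            # covariances k between new and training points
    mean = k @ np.linalg.solve(K, y_train)    # m(x) = k K^{-1} y
    tmp = np.linalg.solve(K, k.T)             # K^{-1} k^T
    var = np.diag(rbf_kernel(X_new, X_new)) - np.sum(k.T * tmp, axis=0)  # V(x) = k(x,x) - k K^{-1} k^T
    return mean, var

# Interpolation check: at the training points the posterior mean reproduces y.
X = np.linspace(0, 5, 6).reshape(-1, 1)
y = np.sin(X).ravel()
m, _ = gp_posterior(X, y, X)
print(np.max(np.abs(m - y)))                  # close to zero
```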
But this is no longer the case if we use regularization, i.e. incorporate a white noise term. The covariance matrix of the training sample then has the form $K + \sigma I$, while the covariances with the true function values are still given by $K$, so the posterior mean is
$$m(X) = K (K + \sigma I)^{-1} y \neq y.$$
In addition, regularization makes the problem more computationally stable.
By choosing the noise variance $\sigma$ we can decide whether we want interpolation ($\sigma = 0$) or want to handle noisy observations (large $\sigma$).
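A short sketch of this trade-off, under the same illustrative assumptions as above (RBF kernel, NumPy, toy data): with $\sigma = 0$ the posterior mean interpolates the observations, and as $\sigma$ grows the fit at the training points relaxes.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    d = x1[:, None, :] - x2[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2.0 * length_scale ** 2))

X = np.linspace(0, 5, 6).reshape(-1, 1)
y = np.sin(X).ravel()
K = rbf_kernel(X, X)

for sigma in [0.0, 1e-2, 1.0]:
    # Noise enters only the sample covariance K + sigma*I, not the cross-covariances K,
    # so m(X) = K (K + sigma*I)^{-1} y differs from y once sigma > 0.
    m = K @ np.linalg.solve(K + sigma * np.eye(len(X)), y)
    print(f"sigma = {sigma}: max |m(X) - y| = {np.max(np.abs(m - y)):.2e}")
```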
Gaussian process regression is also a local method, because the variance of the predictions grows with the distance to the training sample; however, by selecting an appropriate covariance function $k$ we can handle more complex problems than with the RBF kernel. Another nice property is the small number of parameters: usually it is $O(n)$, where $n$ is the data dimension.
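To illustrate the locality, the sketch below (same assumed RBF/NumPy setup) evaluates the posterior variance at points increasingly far from the training sample; far away it approaches the prior variance $k(x, x) = 1$.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    d = x1[:, None, :] - x2[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2.0 * length_scale ** 2))

X = np.linspace(0, 5, 6).reshape(-1, 1)
K = rbf_kernel(X, X)

for x_new in [2.5, 6.0, 10.0]:                # inside, near the edge, far from the data
    x = np.array([[x_new]])
    k = rbf_kernel(x, X)                      # covariances with the training points
    var = rbf_kernel(x, x)[0, 0] - (k @ np.linalg.solve(K, k.T))[0, 0]
    print(f"x = {x_new}: V(x) = {var:.3f}")   # grows towards the prior variance 1
```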