# Gaussian Processes are Bayesian Linear Regression

When I was taking classes on machine learning, I learned about SVMs and the kernel trick to get non-linear SVMs. The last slide usually said something along the lines of "you can also kernelize other methods" without giving any more hints as to which other methods could be rewritten in terms of inner products and thus kernelized. So it came as a surprise to me when I began reading *Gaussian Processes for Machine Learning* and learned that not only is Bayesian linear regression (BLR) one of those methods, but kernelizing it gets you Gaussian Processes (GPs).1

Even after Rasmussen introduced the kernelization of BLR as the weight-space view of GPs, I was still sceptical and expected to read that this is merely kernelized BLR, which one can then generalize further to get GPs. But no, kernelized BLR and GPs are actually identical. To really convince myself of this fact, I filled in the details of the kernelization that Rasmussen omits.

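To see why, here is a sketch of the weight-space setup, roughly in the notation of GPML, Chapter 2 (the feature map $\phi$, prior covariance $\Sigma_p$, and noise variance $\sigma_n^2$ below are the standard symbols from that chapter, not anything specific to this post). BLR models $f(x) = \phi(x)^\top w$ with a Gaussian prior $w \sim \mathcal{N}(0, \Sigma_p)$ on the weights and observations $y = f(x) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$. Collecting the training features as columns of $\Phi$ and writing $\phi_* = \phi(x_*)$, the posterior predictive at a test input $x_*$ is

$$
f_* \mid x_*, X, y \sim \mathcal{N}\!\left(\frac{1}{\sigma_n^2}\,\phi_*^\top A^{-1} \Phi\, y,\;\; \phi_*^\top A^{-1} \phi_*\right),
\qquad A = \sigma_n^{-2}\,\Phi \Phi^\top + \Sigma_p^{-1}.
$$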

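The trick is to rewrite this so that the feature vectors only ever appear inside inner products of the form $\phi(x)^\top \Sigma_p \phi(x')$. For the mean, the identity $\sigma_n^{-2} A^{-1} \Phi = \Sigma_p \Phi\, (K + \sigma_n^2 I)^{-1}$ with $K = \Phi^\top \Sigma_p \Phi$ gives

$$
\mathbb{E}[f_*] = \phi_*^\top \Sigma_p \Phi\, (K + \sigma_n^2 I)^{-1} y = k(x_*, X)\, (K + \sigma_n^2 I)^{-1} y,
$$

where $k(x, x') = \phi(x)^\top \Sigma_p \phi(x')$ is the kernel induced by the feature map and the weight prior. For the variance, the matrix inversion lemma applied to $A^{-1}$ yields

$$
A^{-1} = \Sigma_p - \Sigma_p \Phi\, (K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p,
$$

so that

$$
\operatorname{Var}[f_*] = \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi\, (K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p \phi_*.
$$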

At this point, the expression admits the same kernel formulation as the mean did before, and we get

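$$
\operatorname{Var}[f_*] = k(x_*, x_*) - k(x_*, X)\, (K + \sigma_n^2 I)^{-1} k(X, x_*).
$$

Together with the mean above, this is exactly the posterior predictive of a GP with covariance function $k$, so kernelized BLR and GPs really do coincide.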


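If you prefer a numerical sanity check over algebra, here is a minimal NumPy sketch (the quadratic feature map, prior covariance, and toy data are made up for illustration). It evaluates the posterior predictive once in the weight-space form and once through the induced kernel $k(x, x') = \phi(x)^\top \Sigma_p \phi(x')$, and the two agree to numerical precision.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Quadratic feature map phi(x) = (1, x, x^2), chosen arbitrarily
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

sigma_n = 0.1                          # observation noise std
Sigma_p = np.diag([1.0, 0.5, 0.25])    # prior covariance of the weights

# Toy training data and a single test input
X = rng.uniform(-1.0, 1.0, size=5)
y = np.sin(2.0 * X) + sigma_n * rng.normal(size=5)
x_star = np.array([0.3])

Phi = phi(X)            # rows are feature vectors, shape (5, 3)
phi_star = phi(x_star)  # shape (1, 3)

# Weight-space (BLR) posterior predictive
A = Phi.T @ Phi / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
mean_blr = phi_star @ A_inv @ Phi.T @ y / sigma_n**2
var_blr = phi_star @ A_inv @ phi_star.T

# Function-space (GP) posterior predictive with the induced kernel
def k(a, b):
    return phi(a) @ Sigma_p @ phi(b).T

K = k(X, X)
Ky_inv = np.linalg.inv(K + sigma_n**2 * np.eye(len(X)))
mean_gp = k(x_star, X) @ Ky_inv @ y
var_gp = k(x_star, x_star) - k(x_star, X) @ Ky_inv @ k(X, x_star)

print(np.allclose(mean_blr, mean_gp))  # True
print(np.allclose(var_blr, var_gp))    # True
```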
It is known that GPs are equivalent to deep neural networks in the infinite-width limit.2 Speaking informally, this shows that "Bayesianization" and kernelization can generalize linear regression into something as powerful as deep learning. Goes to show that I underestimated the increase in "power" that kernelizing a method brings.

1 The Distill journal has a nice visual introduction to GPs.

2 Apparently, GPs are also equivalent to spline smoothing, but that looks to be more of a theoretical result.
