Explain
Intuition
In classical (Frequentist) machine learning, making a prediction is simple: you find one "best" set of weights $\hat{\mathbf{w}}$, plug in your new data point $\mathbf{x}_*$, and spit out a single number $\hat{y} = \hat{\mathbf{w}}^\top \mathbf{x}_*$.
However, the Bayesian framework acknowledges that we are never 100% sure what the true weights are. We have a whole distribution of possible weights (the posterior $p(\mathbf{w} \mid \mathcal{D})$).
So, to make a mathematically rigorous prediction, we must ask every single possible model what it thinks the prediction should be, and then take a vote, weighted by how likely each model is. This is exactly what the posterior predictive integral does:

$$p(y_* \mid \mathbf{x}_*, \mathcal{D}) = \int p(y_* \mid \mathbf{x}_*, \mathbf{w}) \, p(\mathbf{w} \mid \mathcal{D}) \, d\mathbf{w}$$
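In practice this "weighted vote" can be approximated by Monte Carlo: draw many plausible weights from the posterior, let each one predict, and look at the spread of the answers. Here is a minimal sketch for a 1-D model; the posterior parameters and noise level below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D setup: posterior over a single weight w,
# p(w | D) = N(mean=2.0, var=0.1), and noise variance sigma2 = 0.25.
post_mean, post_var, sigma2 = 2.0, 0.1, 0.25

def predictive_samples(x_new, n_samples=10_000):
    """Monte Carlo posterior predictive: sample many plausible weights
    from p(w | D) and let each one 'vote' on the prediction."""
    w = rng.normal(post_mean, np.sqrt(post_var), size=n_samples)   # w ~ p(w|D)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=n_samples)       # eps ~ N(0, sigma2)
    return w * x_new + noise                                       # y = w*x + eps

y = predictive_samples(x_new=3.0)
print(y.mean())  # ≈ post_mean * 3.0 = 6.0
print(y.var())   # ≈ sigma2 + x^2 * post_var = 0.25 + 9 * 0.1 = 1.15
```

Note how the spread of the samples already contains both sources of uncertainty: the disagreement between sampled weights and the injected observation noise.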
Two Types of Uncertainty
The beauty of the final formula, $\sigma_*^2(\mathbf{x}_*) = \sigma^2 + \mathbf{x}_*^\top \mathbf{S}_N \mathbf{x}_*$, is that it explicitly breaks down our uncertainty about the future into two separate chunks:
- Epistemic Uncertainty ($\mathbf{x}_*^\top \mathbf{S}_N \mathbf{x}_*$): This is the uncertainty we have because we lack knowledge or data.
  - Notice that this term depends on $\mathbf{S}_N$ (our posterior covariance over the weights) and on the query point $\mathbf{x}_*$.
  - If you ask the model to predict a point that is very similar to the training data, this variance will be small.
  - If you ask the model to predict a point wildly far away from any training data, the plausible models will disagree wildly, and $\mathbf{x}_*^\top \mathbf{S}_N \mathbf{x}_*$ will skyrocket. This is the model saying, "I don't know, I haven't seen anything like this before!" As we gather more data, this uncertainty shrinks.
- Aleatoric Uncertainty ($\sigma^2$): This is the inherent noise in the universe. Even if we had an infinite amount of training data and knew the "true" line perfectly ($\mathbf{S}_N \to 0$), the actual observed value $y$ would still bounce around that line due to the random noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$. We can never get rid of this variance, no matter how much data we collect.
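For linear regression with a Gaussian prior, this split can be computed in closed form: the epistemic part is $\mathbf{x}_*^\top \mathbf{S}_N \mathbf{x}_*$ and the aleatoric part is the constant $\sigma^2$. The sketch below (training data, prior precision, and noise level are all invented for illustration) shows the epistemic term blowing up far from the training range while the noise floor stays fixed:

```python
import numpy as np

rng = np.random.default_rng(1)

sigma2, alpha = 0.25, 1.0                       # noise variance, prior precision
X = np.c_[np.ones(20), rng.uniform(0, 2, 20)]   # design matrix [1, x], x in [0, 2]
y = X @ np.array([0.5, 2.0]) + rng.normal(0, np.sqrt(sigma2), 20)

# Posterior over weights: p(w | D) = N(m_N, S_N)
S_N = np.linalg.inv(alpha * np.eye(2) + X.T @ X / sigma2)
m_N = S_N @ X.T @ y / sigma2

def predictive_variance(x):
    phi = np.array([1.0, x])
    epistemic = phi @ S_N @ phi   # shrinks with more data, grows off-distribution
    aleatoric = sigma2            # irreducible noise floor, same everywhere
    return epistemic, aleatoric

near = predictive_variance(1.0)   # inside the training range
far = predictive_variance(10.0)   # far outside it
print(near, far)                  # epistemic term is far larger at x = 10
```

The aleatoric component is identical at both query points; only the epistemic component responds to where you ask the question.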