-
Define the Conditional Risk:
The conditional risk R(x) for a given input x and a decision function g(x) is the expected loss over the conditional distribution p(y∣x):
R(x)=E_{y∣x}[L(g(x),y)]=∫L(g(x),y)p(y∣x)dy
-
Substitute the Squared-Loss Function:
Given L(g(x),y)=(g(x)−y)², we substitute this into the risk equation:
R(x)=∫(g(x)−y)²p(y∣x)dy
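As a quick numerical sanity check, the conditional risk can be evaluated over a grid of candidate decisions g(x). This is a sketch using a made-up discrete p(y∣x) (the values `y_vals` and `p_y` are assumptions for illustration, not part of the derivation); the minimum lands at the conditional mean:

```python
import numpy as np

# Hypothetical discrete conditional distribution p(y|x) at a fixed x
# (illustrative values only; the integral becomes a sum).
y_vals = np.array([0.0, 1.0, 2.0, 3.0])
p_y = np.array([0.1, 0.4, 0.3, 0.2])  # probabilities sum to 1

def conditional_risk(g):
    # R(x) = sum_y (g - y)^2 * p(y|x), the discrete analogue of the integral
    return np.sum((g - y_vals) ** 2 * p_y)

grid = np.linspace(0.0, 3.0, 301)  # candidate values of g(x)
risks = np.array([conditional_risk(g) for g in grid])
g_best = grid[np.argmin(risks)]

print("grid minimizer:", g_best)
print("conditional mean E[y|x]:", y_vals @ p_y)
```

The risk curve is a parabola in g(x), so the grid minimizer coincides with the conditional mean E[y∣x]=1.6 for this toy distribution.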
-
Minimize the Conditional Risk:
To find the optimal decision function g∗(x) that minimizes R(x), we take the derivative of R(x) with respect to g(x) and set it to zero.
∂R(x)/∂g(x)=∂/∂g(x) ∫(g(x)−y)²p(y∣x)dy
Assuming we can interchange differentiation and integration (Leibniz integral rule):
∂R(x)/∂g(x)=∫∂/∂g(x)[(g(x)−y)²]p(y∣x)dy
∂R(x)/∂g(x)=∫2(g(x)−y)p(y∣x)dy
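The interchange step can be spot-checked numerically: for a discrete stand-in for p(y∣x) (the values below are assumptions for illustration), a central finite difference of R(x) in g agrees with the closed-form derivative ∫2(g(x)−y)p(y∣x)dy:

```python
import numpy as np

# Hypothetical discrete p(y|x) at a fixed x (illustrative values only).
y_vals = np.array([0.0, 1.0, 2.0, 3.0])
p_y = np.array([0.1, 0.4, 0.3, 0.2])

def risk(g):
    # R(x) = sum_y (g - y)^2 * p(y|x)
    return np.sum((g - y_vals) ** 2 * p_y)

def grad_closed_form(g):
    # The derivative from the derivation: 2 * sum_y (g - y) * p(y|x)
    return np.sum(2.0 * (g - y_vals) * p_y)

g, h = 0.7, 1e-6
finite_diff = (risk(g + h) - risk(g - h)) / (2 * h)
print(finite_diff, grad_closed_form(g))
```

Because R is quadratic in g, the central difference matches the closed form essentially to machine precision.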
-
Set the Derivative to Zero:
∫2(g(x)−y)p(y∣x)dy=0
2∫g(x)p(y∣x)dy−2∫yp(y∣x)dy=0
Since g(x) does not depend on y, we can pull it out of the first integral:
g(x)∫p(y∣x)dy=∫yp(y∣x)dy
-
Solve for g(x):
We know that the integral of a probability density function over its entire domain is 1, so ∫p(y∣x)dy=1.
g(x)⋅1=∫yp(y∣x)dy
The right side is the definition of the conditional expected value of y given x.
g∗(x)=E[y∣x]
Thus, the Bayes Decision Rule for the squared-loss function is to choose the conditional mean. (Since ∂²R(x)/∂g(x)²=2∫p(y∣x)dy=2>0, this stationary point is indeed a minimum.)
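The same conclusion can be illustrated with samples: over draws y_i from some conditional distribution (a Gaussian is assumed here purely for illustration), the empirical squared-loss risk is minimized at the sample mean, the plug-in analogue of g*(x)=E[y∣x]:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in samples from some p(y|x) at a fixed x; the Gaussian is arbitrary.
y = rng.normal(loc=2.0, scale=0.5, size=10_000)

def empirical_risk(g):
    # (1/n) * sum_i (g - y_i)^2, the Monte Carlo estimate of R(x)
    return np.mean((g - y) ** 2)

grid = np.linspace(0.0, 4.0, 4001)  # candidate constant decisions g
g_hat = grid[np.argmin([empirical_risk(g) for g in grid])]
print("empirical minimizer:", g_hat)
print("sample mean:", y.mean())
```

The grid minimizer agrees with the sample mean up to the grid resolution, mirroring the population result derived above.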