Problem 1.8 Product of Multivariate Gaussian Distributions - Answer
Prerequisite Knowledge
To solve this problem, you need to be familiar with the following concepts:
Multivariate Gaussian Definition :
N ( x ∣ μ , Σ ) = 1 ( 2 π ) d / 2 ∣ Σ ∣ 1 / 2 exp ( − 1 2 ( x − μ ) T Σ − 1 ( x − μ ) ) \mathcal{N}(x|\mu, \Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right) N ( x ∣ μ , Σ ) = ( 2 π ) d /2 ∣Σ ∣ 1/2 1 exp ( − 2 1 ( x − μ ) T Σ − 1 ( x − μ ) )
Completing the Square : As derived in Problem 1.10 , a quadratic form x T A x − 2 x T b x^T A x - 2x^T b x T A x − 2 x T b can be completed as ( x − A − 1 b ) T A ( x − A − 1 b ) − b T A − 1 b (x - A^{-1}b)^T A (x - A^{-1}b) - b^T A^{-1} b ( x − A − 1 b ) T A ( x − A − 1 b ) − b T A − 1 b .
Matrix Identities :
( A − 1 + B − 1 ) − 1 = A ( A + B ) − 1 B = B ( A + B ) − 1 A (A^{-1} + B^{-1})^{-1} = A(A+B)^{-1}B = B(A+B)^{-1}A ( A − 1 + B − 1 ) − 1 = A ( A + B ) − 1 B = B ( A + B ) − 1 A (useful for determinant manipulation).
Identity for quadratic forms:
a T A − 1 a + b T B − 1 b − ( A − 1 a + B − 1 b ) T ( A − 1 + B − 1 ) − 1 ( A − 1 a + B − 1 b ) = ( a − b ) T ( A + B ) − 1 ( a − b ) a^T A^{-1} a + b^T B^{-1} b - (A^{-1}a + B^{-1}b)^T (A^{-1} + B^{-1})^{-1} (A^{-1}a + B^{-1}b) = (a-b)^T (A+B)^{-1} (a-b) a T A − 1 a + b T B − 1 b − ( A − 1 a + B − 1 b ) T ( A − 1 + B − 1 ) − 1 ( A − 1 a + B − 1 b ) = ( a − b ) T ( A + B ) − 1 ( a − b ) .
Step-by-Step Derivation
We evaluate the product:
P ( x ) = N ( x ∣ a , A ) N ( x ∣ b , B ) P(x) = \mathcal{N}(x|a, A)\mathcal{N}(x|b, B) P ( x ) = N ( x ∣ a , A ) N ( x ∣ b , B )
Step 1: Expand the Exponents
Ignoring the normalization constants for a moment, let's look at the exponent term E E E (where the total exponent is − 1 2 E -\frac{1}{2}E − 2 1 E ):
E = ( x − a ) T A − 1 ( x − a ) + ( x − b ) T B − 1 ( x − b ) E = (x-a)^T A^{-1} (x-a) + (x-b)^T B^{-1} (x-b) E = ( x − a ) T A − 1 ( x − a ) + ( x − b ) T B − 1 ( x − b )
Expanding these quadratics:
E = ( x T A − 1 x − 2 x T A − 1 a + a T A − 1 a ) + ( x T B − 1 x − 2 x T B − 1 b + b T B − 1 b ) E = (x^T A^{-1} x - 2x^T A^{-1} a + a^T A^{-1} a) + (x^T B^{-1} x - 2x^T B^{-1} b + b^T B^{-1} b) E = ( x T A − 1 x − 2 x T A − 1 a + a T A − 1 a ) + ( x T B − 1 x − 2 x T B − 1 b + b T B − 1 b )
Group terms by powers of x x x :
E = x T ( A − 1 + B − 1 ) x − 2 x T ( A − 1 a + B − 1 b ) + ( a T A − 1 a + b T B − 1 b ) E = x^T (A^{-1} + B^{-1}) x - 2x^T (A^{-1} a + B^{-1} b) + (a^T A^{-1} a + b^T B^{-1} b) E = x T ( A − 1 + B − 1 ) x − 2 x T ( A − 1 a + B − 1 b ) + ( a T A − 1 a + b T B − 1 b )
Step 2: Define New Parameters c c c and C C C
We want to match the form of a Gaussian exponent: ( x − c ) T C − 1 ( x − c ) (x-c)^T C^{-1} (x-c) ( x − c ) T C − 1 ( x − c ) .
Looking at the quadratic term in x x x , we identify the precision matrix equal to the sum of the precisions:
C − 1 = A − 1 + B − 1 ⟹ C = ( A − 1 + B − 1 ) − 1 C^{-1} = A^{-1} + B^{-1} \implies C = (A^{-1} + B^{-1})^{-1} C − 1 = A − 1 + B − 1 ⟹ C = ( A − 1 + B − 1 ) − 1
This matches equation (1.25).
Looking at the linear term − 2 x T ( A − 1 a + B − 1 b ) -2x^T (A^{-1} a + B^{-1} b) − 2 x T ( A − 1 a + B − 1 b ) , we equate it to − 2 x T C − 1 c -2x^T C^{-1} c − 2 x T C − 1 c from the expansion of ( x − c ) T C − 1 ( x − c ) (x-c)^T C^{-1} (x-c) ( x − c ) T C − 1 ( x − c ) .
C − 1 c = A − 1 a + B − 1 b C^{-1} c = A^{-1} a + B^{-1} b C − 1 c = A − 1 a + B − 1 b
Multiplying by C C C from the left:
c = C ( A − 1 a + B − 1 b ) c = C(A^{-1} a + B^{-1} b) c = C ( A − 1 a + B − 1 b )
This matches equation (1.24).
Step 3: Complete the Square
Using the result from Problem 1.10, we can rewrite the terms involving x x x in E E E :
x T C − 1 x − 2 x T ( C − 1 c ) = ( x − c ) T C − 1 ( x − c ) − c T C − 1 c x^T C^{-1} x - 2x^T (C^{-1}c) = (x-c)^T C^{-1} (x-c) - c^T C^{-1} c x T C − 1 x − 2 x T ( C − 1 c ) = ( x − c ) T C − 1 ( x − c ) − c T C − 1 c
Substitute this back into the expression for E E E :
E = ( x − c ) T C − 1 ( x − c ) − c T C − 1 c + ( a T A − 1 a + b T B − 1 b ) E = (x-c)^T C^{-1} (x-c) - c^T C^{-1} c + (a^T A^{-1} a + b^T B^{-1} b) E = ( x − c ) T C − 1 ( x − c ) − c T C − 1 c + ( a T A − 1 a + b T B − 1 b )
Let R R R be the residual scalar term:
R = a T A − 1 a + b T B − 1 b − c T C − 1 c R = a^T A^{-1} a + b^T B^{-1} b - c^T C^{-1} c R = a T A − 1 a + b T B − 1 b − c T C − 1 c
So the product P ( x ) P(x) P ( x ) can be written as:
P ( x ) = 1 Z ∗ n o r m exp ( − 1 2 ( x − c ) T C − 1 ( x − c ) ) exp ( − 1 2 R ) P(x) = \frac{1}{Z*{norm}} \exp\left( -\frac{1}{2} (x-c)^T C^{-1} (x-c) \right) \exp\left( -\frac{1}{2} R \right) P ( x ) = Z ∗ n or m 1 exp ( − 2 1 ( x − c ) T C − 1 ( x − c ) ) exp ( − 2 1 R )
where Z ∗ n o r m = ( 2 π ) d / 2 ∣ A ∣ 1 / 2 ( 2 π ) d / 2 ∣ B ∣ 1 / 2 = ( 2 π ) d ∣ A ∣ 1 / 2 ∣ B ∣ 1 / 2 Z*{norm} = (2\pi)^{d/2}|A|^{1/2} (2\pi)^{d/2}|B|^{1/2} = (2\pi)^{d} |A|^{1/2} |B|^{1/2} Z ∗ n or m = ( 2 π ) d /2 ∣ A ∣ 1/2 ( 2 π ) d /2 ∣ B ∣ 1/2 = ( 2 π ) d ∣ A ∣ 1/2 ∣ B ∣ 1/2 .
Notice that exp ( − 1 2 ( x − c ) T C − 1 ( x − c ) ) \exp\left( -\frac{1}{2} (x-c)^T C^{-1} (x-c) \right) exp ( − 2 1 ( x − c ) T C − 1 ( x − c ) ) is the unnormalized kernel of N ( x ∣ c , C ) \mathcal{N}(x|c, C) N ( x ∣ c , C ) .
Step 4: Simplify the Residual Term R R R
We need to show that R = ( a − b ) T ( A + B ) − 1 ( a − b ) R = (a-b)^T (A+B)^{-1} (a-b) R = ( a − b ) T ( A + B ) − 1 ( a − b ) .
Substitute c = C ( A − 1 a + B − 1 b ) c = C(A^{-1} a + B^{-1} b) c = C ( A − 1 a + B − 1 b ) and C − 1 = A − 1 + B − 1 C^{-1} = A^{-1} + B^{-1} C − 1 = A − 1 + B − 1 back into c T C − 1 c c^T C^{-1} c c T C − 1 c :
c T C − 1 c = ( A − 1 a + B − 1 b ) T C T C − 1 C ( A − 1 a + B − 1 b ) c^T C^{-1} c = (A^{-1} a + B^{-1} b)^T C^T C^{-1} C (A^{-1} a + B^{-1} b) c T C − 1 c = ( A − 1 a + B − 1 b ) T C T C − 1 C ( A − 1 a + B − 1 b )
= ( A − 1 a + B − 1 b ) T C ( A − 1 a + B − 1 b ) = (A^{-1} a + B^{-1} b)^T C (A^{-1} a + B^{-1} b) = ( A − 1 a + B − 1 b ) T C ( A − 1 a + B − 1 b )
Thus,
R = a T A − 1 a + b T B − 1 b − ( A − 1 a + B − 1 b ) T ( A − 1 + B − 1 ) − 1 ( A − 1 a + B − 1 b ) R = a^T A^{-1} a + b^T B^{-1} b - (A^{-1} a + B^{-1} b)^T (A^{-1} + B^{-1})^{-1} (A^{-1} a + B^{-1} b) R = a T A − 1 a + b T B − 1 b − ( A − 1 a + B − 1 b ) T ( A − 1 + B − 1 ) − 1 ( A − 1 a + B − 1 b )
Using the matrix identity (proven via Woodbury or elementary algebra) for completing the square in the exponent:
x T A − 1 x + y T B − 1 y − ( A − 1 x + B − 1 y ) T ( A − 1 + B − 1 ) − 1 ( A − 1 x + B − 1 y ) = ( x − y ) T ( A + B ) − 1 ( x − y ) x^T A^{-1} x + y^T B^{-1} y - (A^{-1}x + B^{-1}y)^T (A^{-1} + B^{-1})^{-1} (A^{-1}x + B^{-1}y) = (x-y)^T (A+B)^{-1} (x-y) x T A − 1 x + y T B − 1 y − ( A − 1 x + B − 1 y ) T ( A − 1 + B − 1 ) − 1 ( A − 1 x + B − 1 y ) = ( x − y ) T ( A + B ) − 1 ( x − y )
Substituting x = a , y = b x=a, y=b x = a , y = b :
R = ( a − b ) T ( A + B ) − 1 ( a − b ) R = (a-b)^T (A+B)^{-1} (a-b) R = ( a − b ) T ( A + B ) − 1 ( a − b )
This term exactly matches the exponent of N ( a ∣ b , A + B ) \mathcal{N}(a|b, A+B) N ( a ∣ b , A + B ) .
Step 5: Determine the Scaling Factor Z Z Z
We have:
N ( x ∣ a , A ) N ( x ∣ b , B ) = N ( x ∣ c , C ) ⋅ ( 2 π ) d / 2 ∣ C ∣ 1 / 2 ( 2 π ) d ∣ A ∣ 1 / 2 ∣ B ∣ 1 / 2 exp ( − 1 2 R ) ⏟ _ Z \mathcal{N}(x|a, A)\mathcal{N}(x|b, B) = \mathcal{N}(x|c, C) \cdot \underbrace{\frac{(2\pi)^{d/2}|C|^{1/2}}{(2\pi)^{d}|A|^{1/2}|B|^{1/2}} \exp\left(-\frac{1}{2}R\right)}\_{Z} N ( x ∣ a , A ) N ( x ∣ b , B ) = N ( x ∣ c , C ) ⋅ ( 2 π ) d ∣ A ∣ 1/2 ∣ B ∣ 1/2 ( 2 π ) d /2 ∣ C ∣ 1/2 exp ( − 2 1 R ) _ Z
We identified exp ( − 1 2 R ) \exp(-\frac{1}{2}R) exp ( − 2 1 R ) matches the exponential part of N ( a ∣ b , A + B ) \mathcal{N}(a|b, A+B) N ( a ∣ b , A + B ) .
Now check the determinant pre-factor.
Prefactor = ∣ C ∣ 1 / 2 ( 2 π ) d / 2 ∣ A ∣ 1 / 2 ∣ B ∣ 1 / 2 \text{Prefactor} = \frac{|C|^{1/2}}{(2\pi)^{d/2} |A|^{1/2} |B|^{1/2}} Prefactor = ( 2 π ) d /2 ∣ A ∣ 1/2 ∣ B ∣ 1/2 ∣ C ∣ 1/2
We want this to match the constant of N ( a ∣ b , A + B ) \mathcal{N}(a|b, A+B) N ( a ∣ b , A + B ) , which is 1 ( 2 π ) d / 2 ∣ A + B ∣ 1 / 2 \frac{1}{(2\pi)^{d/2} |A+B|^{1/2}} ( 2 π ) d /2 ∣ A + B ∣ 1/2 1 .
We need to check if:
∣ C ∣ 1 / 2 ∣ A ∣ 1 / 2 ∣ B ∣ 1 / 2 = 1 ∣ A + B ∣ 1 / 2 \frac{|C|^{1/2}}{|A|^{1/2} |B|^{1/2}} = \frac{1}{|A+B|^{1/2}} ∣ A ∣ 1/2 ∣ B ∣ 1/2 ∣ C ∣ 1/2 = ∣ A + B ∣ 1/2 1
Squaring both sides:
∣ C ∣ ∣ A ∣ ∣ B ∣ = 1 ∣ A + B ∣ ⟺ ∣ C ∣ ∣ A + B ∣ = ∣ A ∣ ∣ B ∣ \frac{|C|}{|A||B|} = \frac{1}{|A+B|} \iff |C||A+B| = |A||B| ∣ A ∣∣ B ∣ ∣ C ∣ = ∣ A + B ∣ 1 ⟺ ∣ C ∣∣ A + B ∣ = ∣ A ∣∣ B ∣
Recalling C = ( A − 1 + B − 1 ) − 1 C = (A^{-1} + B^{-1})^{-1} C = ( A − 1 + B − 1 ) − 1 .
Using ∣ X Y ∣ = ∣ X ∣ ∣ Y ∣ |X Y| = |X| |Y| ∣ X Y ∣ = ∣ X ∣∣ Y ∣ and ∣ X − 1 ∣ = 1 / ∣ X ∣ |X^{-1}| = 1/|X| ∣ X − 1 ∣ = 1/∣ X ∣ :
∣ C ∣ = ∣ ( A − 1 + B − 1 ) − 1 ∣ = 1 ∣ A − 1 + B − 1 ∣ = 1 ∣ A − 1 ( A + B ) B − 1 ∣ |C| = |(A^{-1} + B^{-1})^{-1}| = \frac{1}{|A^{-1} + B^{-1}|} = \frac{1}{|A^{-1}(A+B)B^{-1}|} ∣ C ∣ = ∣ ( A − 1 + B − 1 ) − 1 ∣ = ∣ A − 1 + B − 1 ∣ 1 = ∣ A − 1 ( A + B ) B − 1 ∣ 1
= 1 ∣ A − 1 ∣ ∣ A + B ∣ ∣ B − 1 ∣ = ∣ A ∣ ∣ B ∣ ∣ A + B ∣ = \frac{1}{|A^{-1}| |A+B| |B^{-1}|} = \frac{|A||B|}{|A+B|} = ∣ A − 1 ∣∣ A + B ∣∣ B − 1 ∣ 1 = ∣ A + B ∣ ∣ A ∣∣ B ∣
Thus, the determinants match.
Conclusion
Combining the residual exponent and the determinant factors, we get:
Z = 1 ( 2 π ) d / 2 ∣ A + B ∣ 1 / 2 exp ( − 1 2 ( a − b ) T ( A + B ) − 1 ( a − b ) ) = N ( a ∣ b , A + B ) Z = \frac{1}{(2\pi)^{d/2}|A+B|^{1/2}} \exp\left( -\frac{1}{2} (a-b)^T (A+B)^{-1} (a-b) \right) = \mathcal{N}(a|b, A+B) Z = ( 2 π ) d /2 ∣ A + B ∣ 1/2 1 exp ( − 2 1 ( a − b ) T ( A + B ) − 1 ( a − b ) ) = N ( a ∣ b , A + B )
So,
N ( x ∣ a , A ) N ( x ∣ b , B ) = Z N ( x ∣ c , C ) \mathcal{N}(x|a, A)\mathcal{N}(x|b, B) = Z \mathcal{N}(x|c, C) N ( x ∣ a , A ) N ( x ∣ b , B ) = Z N ( x ∣ c , C )
Q.E.D.