Problem 1.6 Multivariate Gaussian - Answer

(a) Diagonal Covariance Matrix

Prerequisite Knowledge

  • Determinant of a diagonal matrix: The determinant of a diagonal matrix is the product of its diagonal elements.
  • Inverse of a diagonal matrix: The inverse of a diagonal matrix is a diagonal matrix with the reciprocals of the original diagonal elements.
  • Exponential rules: $e^{a+b} = e^a e^b$ and $\exp\left(\sum_i x_i\right) = \prod_i \exp(x_i)$.
  • Univariate Gaussian PDF: $\mathcal{N}(x_i \mid \mu_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi\sigma_i^2}} e^{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}}$.

Step-by-step Answer

  1. Analyze the Determinant term: Given the diagonal covariance matrix $\Sigma = \text{diag}(\sigma_1^2, \dots, \sigma_d^2)$, the determinant is:

    $$|\Sigma| = \prod_{i=1}^d \sigma_i^2$$

    Taking the square root:

    $$|\Sigma|^{1/2} = \left(\prod_{i=1}^d \sigma_i^2\right)^{1/2} = \prod_{i=1}^d \sigma_i$$
  2. Analyze the Mahalanobis distance term: The inverse of the diagonal matrix $\Sigma$ is:

    $$\Sigma^{-1} = \text{diag}\left(\frac{1}{\sigma_1^2}, \dots, \frac{1}{\sigma_d^2}\right)$$

    Expanding the inner term $\|x - \mu\|^2_{\Sigma} = (x - \mu)^T \Sigma^{-1} (x - \mu)$:

    $$\begin{aligned} (x - \mu)^T \Sigma^{-1} (x - \mu) &= \begin{bmatrix} x_1 - \mu_1 & \cdots & x_d - \mu_d \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1^2} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sigma_d^2} \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 \\ \vdots \\ x_d - \mu_d \end{bmatrix} \\ &= \sum_{i=1}^d \frac{(x_i - \mu_i)^2}{\sigma_i^2} \end{aligned}$$
  3. Substitute back into the full PDF:

    $$p(x) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^d \sigma_i} \exp\left( -\frac{1}{2} \sum_{i=1}^d \frac{(x_i - \mu_i)^2}{\sigma_i^2} \right)$$
  4. Factorize the expression: Distribute $(2\pi)^{d/2}$ as $\prod_{i=1}^d (2\pi)^{1/2}$ and separate the exponential of the sum into a product:

    $$\begin{aligned} p(x) &= \left( \prod_{i=1}^d \frac{1}{(2\pi)^{1/2}\sigma_i} \right) \prod_{i=1}^d \exp\left( -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \right) \\ &= \prod_{i=1}^d \left( \frac{1}{\sqrt{2\pi}\,\sigma_i} e^{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}} \right) \\ &= \prod_{i=1}^d \mathcal{N}(x_i \mid \mu_i, \sigma_i^2) \end{aligned}$$

    This shows that the joint density is the product of independent univariate marginal densities.
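
The factorization can also be checked numerically. Below is a minimal sketch with NumPy and SciPy; the values of $\mu$, $\sigma$, and the test point are arbitrary illustrative choices, not part of the problem.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Arbitrary illustrative parameters (d = 3)
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([1.0, 0.5, 2.0])      # per-dimension standard deviations
Sigma = np.diag(sigma**2)              # diagonal covariance matrix

x = np.array([0.3, -0.8, 1.5])         # arbitrary test point

# Joint density from the multivariate Gaussian formula
joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

# Product of the univariate densities N(x_i | mu_i, sigma_i^2)
product = np.prod(norm(loc=mu, scale=sigma).pdf(x))

assert np.isclose(joint, product)      # the two agree
```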


(b) 2D Gaussian with Diagonal Covariance

Prerequisite Knowledge

  • Contour plots: Visualizing a function of two variables in the plane using isolines (curves of constant value).
  • Equation of an ellipse: $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$.

Step-by-step Answer

Given $\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 0.25 \end{bmatrix}$.

  1. Variances: $\sigma_1^2 = 1 \implies \sigma_1 = 1$. $\sigma_2^2 = 0.25 \implies \sigma_2 = 0.5$.

  2. Mahalanobis Distance:

    $$\|x - \mu\|^2_{\Sigma} = \frac{x_1^2}{1} + \frac{x_2^2}{0.25} = x_1^2 + 4x_2^2$$

    The level sets (contours) where the density is constant satisfy $x_1^2 + 4x_2^2 = C$. This is the equation of an ellipse centered at $(0, 0)$.

  3. Shape Description:

    • The ellipse has its semi-major axis along $x_1$ (length proportional to $\sigma_1 = 1$).
    • The semi-minor axis is along $x_2$ (length proportional to $\sigma_2 = 0.5$).
    • Effect: The density is "axis-aligned". Because $\sigma_1 > \sigma_2$, the distribution is stretched along the $x_1$ axis and compressed along the $x_2$ axis. It looks like a flattened oval.
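
To see this shape, one can evaluate the Mahalanobis distance on a grid and draw its level sets. A minimal Matplotlib sketch follows; the grid range and contour levels are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.0],
                  [0.0, 0.25]])
Sigma_inv = np.linalg.inv(Sigma)

# Evaluate (x - mu)^T Sigma^{-1} (x - mu) on a grid
x1, x2 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
diff = np.stack([x1 - mu[0], x2 - mu[1]], axis=-1)
d2 = np.einsum('...i,ij,...j->...', diff, Sigma_inv, diff)

# Level sets x1^2 + 4*x2^2 = C: axis-aligned ellipses, wider along x1
plt.contour(x1, x2, d2, levels=[0.5, 1.0, 2.0, 4.0])
plt.gca().set_aspect('equal')
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
plt.show()
```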

(c) Isotropic Covariance Matrix

Prerequisite Knowledge

  • Isotropic: Having the same value when measured in any direction; for a covariance matrix, equal variance along every axis with no correlation.
  • Identity Matrix: A matrix with 1s on the diagonal and 0s elsewhere.

Step-by-step Answer

Given $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

  1. Variances: $\sigma_1^2 = 1$, $\sigma_2^2 = 1$.

  2. Mahalanobis Distance:

    $$\|x - \mu\|^2_{\Sigma} = x_1^2 + x_2^2$$

    The contours satisfy $x_1^2 + x_2^2 = C$, which is the equation of a circle.

  3. Shape Description: The contours are perfect circles. The probability density falls off at the same rate in every direction from the center. The spread is symmetric (spherical in higher dimensions).
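
Because $\Sigma = I$, the Mahalanobis distance reduces to the plain squared Euclidean norm, so any two points at the same radius have the same density. A quick check with arbitrarily chosen test points:

```python
import numpy as np

Sigma = np.eye(2)
Sigma_inv = np.linalg.inv(Sigma)

x = np.array([0.6, -0.8])   # arbitrary point at radius 1
y = np.array([1.0, 0.0])    # another point at radius 1

d2_x = x @ Sigma_inv @ x
d2_y = y @ Sigma_inv @ y

assert np.isclose(d2_x, x @ x)   # Mahalanobis = squared Euclidean norm
assert np.isclose(d2_x, d2_y)    # equal radius => equal density (circular contours)
```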


(d) Eigendecomposition of Covariance Matrix

Prerequisite Knowledge

  • Spectral Theorem: Any symmetric real matrix can be diagonalized by an orthogonal matrix of its eigenvectors.
  • Matrix Multiplication: Associative properties.

Step-by-step Answer

  1. Eigenvalue Equation: We are given $\Sigma v_i = \lambda_i v_i$ for $i = 1, \dots, d$.

  2. Matrix Form: We can stack the vectors $v_i$ into a matrix $V = [v_1, \dots, v_d]$ and the scalars into a diagonal matrix $\Lambda = \text{diag}(\lambda_1, \dots, \lambda_d)$. The set of equations $\Sigma v_i = \lambda_i v_i$ becomes:

    $$\Sigma V = V \Lambda$$
  3. Orthogonality: Since $\Sigma$ is a symmetric matrix (covariance matrices are symmetric), its eigenvectors $v_i$ can be chosen to be orthonormal (mutually orthogonal and unit length). Therefore, $V$ is an orthogonal matrix, meaning $V^T V = I$, i.e. $V^{-1} = V^T$.

  4. Diagonalization: Right-multiply the equation $\Sigma V = V \Lambda$ by $V^T$:

    $$\Sigma V V^T = V \Lambda V^T$$

    Since $V V^T = I$:

    $$\Sigma = V \Lambda V^T$$
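
The decomposition can be reproduced numerically: for a symmetric matrix, `np.linalg.eigh` returns real eigenvalues and an orthonormal matrix of eigenvectors. A minimal sketch, using a randomly generated covariance matrix purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T                      # random symmetric PSD matrix (illustrative)

eigvals, V = np.linalg.eigh(Sigma)   # eigenvalues (ascending) and eigenvectors as columns
Lambda = np.diag(eigvals)

assert np.allclose(V.T @ V, np.eye(3))       # V is orthogonal: V^T V = I
assert np.allclose(V @ Lambda @ V.T, Sigma)  # Sigma = V Lambda V^T
```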

(e) Transformation to Diagonal Space

Prerequisite Knowledge

  • Inverse of a matrix product: $(ABC)^{-1} = C^{-1} B^{-1} A^{-1}$.
  • Transpose of a matrix-vector product: $(Ax)^T = x^T A^T$.

Step-by-step Answer

  1. Inverse of $\Sigma$: Using the decomposition from (d):

    $$\Sigma^{-1} = (V \Lambda V^T)^{-1} = (V^T)^{-1} \Lambda^{-1} V^{-1}$$

    Since $V$ is orthogonal ($V^{-1} = V^T$ and $(V^T)^{-1} = V$):

    $$\Sigma^{-1} = V \Lambda^{-1} V^T$$
  2. Substitute into Mahalanobis Distance:

    $$\|x - \mu\|^2_{\Sigma} = (x - \mu)^T V \Lambda^{-1} V^T (x - \mu)$$
  3. Define $y$: Let $y = V^T (x - \mu)$. Then its transpose is $y^T = (x - \mu)^T V$. Substitute $y$ into the distance equation:

    $$\|x - \mu\|^2_{\Sigma} = y^T \Lambda^{-1} y$$
  4. Result: Since $\Lambda$ is diagonal, $y^T \Lambda^{-1} y = \|y\|^2_{\Lambda}$. This shows that in the coordinate system defined by $y$, the variables are uncorrelated (diagonal covariance $\Lambda$).
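
The identity can be confirmed numerically; a minimal sketch using the $\Sigma$ from part (g), with $\mu$ and the test point $x$ chosen arbitrarily:

```python
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[0.625, 0.375],
                  [0.375, 0.625]])
x = np.array([0.2, 0.7])                     # arbitrary test point

eigvals, V = np.linalg.eigh(Sigma)           # Sigma = V diag(eigvals) V^T

# Mahalanobis distance in the original coordinates
d2_x = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# Same quantity in the rotated coordinates y = V^T (x - mu)
y = V.T @ (x - mu)
d2_y = y @ np.diag(1.0 / eigvals) @ y

assert np.isclose(d2_x, d2_y)
```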


(f) Geometric Effect of $V$ and $\mu$

Prerequisite Knowledge

  • Change of Basis: Projecting a vector onto new basis vectors.
  • Affine Transformation: Linear transformation followed by translation.

Step-by-step Answer

The relationship is $x = Vy + \mu$ (derived from $y = V^T(x - \mu)$ by multiplying by $V$ and adding $\mu$).

  1. Effect of transformation $V$ (Rotation): $V$ contains the eigenvectors of $\Sigma$. Multiplying a vector $y$ by an orthogonal matrix $V$ performs a rotation (or reflection) of the coordinate system. Specifically, the standard axes in $y$-space (where the Gaussian is axis-aligned) are rotated to align with the eigenvectors $v_i$ in $x$-space.

  2. Effect of transformation $\mu$ (Translation): Adding $\mu$ shifts the origin. The center of the distribution moves from $0$ (in $y$-space, if we consider centered $y$) to $\mu$ in $x$-space.

  3. Summary: To generate a sample $x$, you take a sample $y$ from an axis-aligned Gaussian, rotate it by $V$, and translate it by $\mu$.
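
This recipe translates directly into a sampling procedure; a minimal sketch (the sample count and $\mu$ are arbitrary examples, and $\Sigma$ is the matrix from part (g)):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[0.625, 0.375],
                  [0.375, 0.625]])

eigvals, V = np.linalg.eigh(Sigma)

# 1. Draw y from an axis-aligned Gaussian with variances lambda_i
y = rng.normal(scale=np.sqrt(eigvals), size=(100_000, 2))
# 2. Rotate by V and translate by mu: x = V y + mu (applied row-wise)
x = y @ V.T + mu

print(x.mean(axis=0))            # approximately mu
print(np.cov(x, rowvar=False))   # approximately Sigma
```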


(g) General Covariance Matrix Plot

Prerequisite Knowledge

  • Characteristic Equation: $\det(\Sigma - \lambda I) = 0$ to find eigenvalues.
  • Eigenvectors: Solving $(\Sigma - \lambda I)v = 0$.

Step-by-step Answer

Given $\Sigma = \begin{bmatrix} 0.625 & 0.375 \\ 0.375 & 0.625 \end{bmatrix}$.

  1. Find Eigenvalues:

    $$\det \begin{bmatrix} 0.625 - \lambda & 0.375 \\ 0.375 & 0.625 - \lambda \end{bmatrix} = (0.625 - \lambda)^2 - 0.375^2 = 0$$
    $$0.625 - \lambda = \pm 0.375 \implies \lambda_1 = 0.625 + 0.375 = 1.0, \quad \lambda_2 = 0.625 - 0.375 = 0.25$$
  2. Find Eigenvectors:

    • For $\lambda_1 = 1$: $\begin{bmatrix} -0.375 & 0.375 \\ 0.375 & -0.375 \end{bmatrix} \begin{bmatrix} v_{11} \\ v_{12} \end{bmatrix} = 0 \implies v_{11} = v_{12}$. Normalized eigenvector: $v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \approx \begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix}$ (direction of the $45^\circ$ line).

    • For $\lambda_2 = 0.25$: $\begin{bmatrix} 0.375 & 0.375 \\ 0.375 & 0.375 \end{bmatrix} \begin{bmatrix} v_{21} \\ v_{22} \end{bmatrix} = 0 \implies v_{21} = -v_{22}$. Normalized eigenvector: $v_2 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \approx \begin{bmatrix} -0.707 \\ 0.707 \end{bmatrix}$ (direction of the $135^\circ$ line).

  3. Shape Description:

    • The eigenvalues are $1$ and $0.25$, which are identical to the variances in part (b).
    • Effect of Eigenvalues: They determine the lengths of the major and minor axes of the uncertainty ellipse (major axis length proportional to $\sqrt{1} = 1$, minor to $\sqrt{0.25} = 0.5$).
    • Effect of Eigenvectors: They determine the orientation. The ellipse from part (b) is rotated by $45^\circ$ counter-clockwise. The distribution is elongated along the line $x_1 = x_2$.
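
As a cross-check, the hand computation can be verified with NumPy (eigenvector signs may differ, since eigenvectors are determined only up to a factor of $\pm 1$):

```python
import numpy as np

Sigma = np.array([[0.625, 0.375],
                  [0.375, 0.625]])

eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)    # [0.25 1.  ]  -> lambda_2 = 0.25, lambda_1 = 1.0
print(eigvecs)    # columns are unit eigenvectors: +/-[1, -1]/sqrt(2) and +/-[1, 1]/sqrt(2)
```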