
Problem 1.6 Multivariate Gaussian

The multivariate Gaussian is a probability density over real vectors $x = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix} \in \mathbb{R}^d$, which is parameterized by a mean vector $\mu \in \mathbb{R}^d$ and a covariance matrix $\Sigma \in \mathbb{S}^d_+$ (i.e., a $d$-dimensional positive-definite symmetric matrix). The density function is

$$p(x) = \mathcal{N}(x|\mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} e^{-\frac{1}{2} \|x-\mu\|^2_{\Sigma}}, \tag{1.12}$$

where $|\Sigma|$ is the determinant of $\Sigma$, and

$$\|x - \mu\|^2_{\Sigma} = (x - \mu)^T \Sigma^{-1} (x - \mu) \tag{1.13}$$

is the Mahalanobis distance. In this problem, we will look at how different covariance matrices affect the shape of the density.

First, consider the case where $\Sigma$ is a diagonal matrix, i.e., the off-diagonal entries are 0,

$$\Sigma = \begin{bmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_d^2 \end{bmatrix}. \tag{1.14}$$

(a) Show that with a diagonal covariance matrix, the multivariate Gaussian is equivalent to assuming that the elements of the vector are independent, and each is distributed as a univariate Gaussian, i.e.,

$$\mathcal{N}(x|\mu, \Sigma) = \prod_{i=1}^d \mathcal{N}(x_i|\mu_i, \sigma_i^2). \tag{1.15}$$

Hint: the following properties of diagonal matrices will be useful:

$$|\Sigma| = \prod_{i=1}^d \sigma_i^2, \quad \Sigma^{-1} = \begin{bmatrix} \frac{1}{\sigma_1^2} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sigma_d^2} \end{bmatrix}. \tag{1.16}$$
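One way to sanity-check the factorization in (1.15) numerically is to evaluate the joint density and the product of univariate densities at the same point and compare. A minimal NumPy sketch (the function names and test values here are illustrative, not from the text):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate Gaussian density, Eq. (1.12)."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff  # Eq. (1.13)
    norm = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * maha) / norm

def univariate_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# With a diagonal covariance, the joint density factorizes, Eq. (1.15).
mu = np.array([1.0, -2.0, 0.5])
variances = np.array([1.0, 0.25, 4.0])
Sigma = np.diag(variances)
x = np.array([0.3, -1.5, 2.0])

joint = mvn_pdf(x, mu, Sigma)
product = np.prod([univariate_pdf(x[i], mu[i], variances[i]) for i in range(3)])
print(np.isclose(joint, product))  # True
```

This is only a numerical check at one point, of course; the problem asks for the algebraic proof via (1.16).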

(b) Plot the Mahalanobis distance term and probability density function for a 2-dimensional Gaussian with $\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 0.25 \end{bmatrix}$. How is the shape of the density affected by the diagonal terms?
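A minimal NumPy/Matplotlib sketch for producing the two contour plots (grid range, resolution, and the output filename are arbitrary choices; swapping in a different $\Sigma$ reuses the same code for the later parts):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, save to file
import matplotlib.pyplot as plt

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.0],
                  [0.0, 0.25]])
Sigma_inv = np.linalg.inv(Sigma)

# Evaluate both terms on a grid (201 points so the grid contains the mean).
xs = np.linspace(-3, 3, 201)
X1, X2 = np.meshgrid(xs, xs)
diff = np.stack([X1 - mu[0], X2 - mu[1]], axis=-1)            # shape (201, 201, 2)
maha = np.einsum("...i,ij,...j->...", diff, Sigma_inv, diff)  # Eq. (1.13)
pdf = np.exp(-0.5 * maha) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].contour(X1, X2, maha)
axes[0].set_title("Mahalanobis distance term")
axes[1].contour(X1, X2, pdf)
axes[1].set_title("pdf")
for ax in axes:
    ax.set_aspect("equal")
fig.savefig("gaussian_diag.png")
```

With this diagonal $\Sigma$, both sets of contours are axis-aligned ellipses, stretched along $x_1$ (standard deviation 1) and compressed along $x_2$ (standard deviation 0.5).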

(c) Plot the Mahalanobis distance term and pdf when the variances of each dimension are the same, e.g., $\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. This is sometimes called an i.i.d. (independently and identically distributed) covariance matrix, isotropic covariance matrix, or circular covariance matrix.

Next, we will consider the general case for the covariance matrix.

(d) Let $\{\lambda_i, v_i\}$ be the eigenvalue/eigenvector pairs of $\Sigma$, i.e.,

$$\Sigma v_i = \lambda_i v_i, \quad i \in \{1, \cdots, d\}. \tag{1.17}$$

Show that $\Sigma$ can be written as

$$\Sigma = V \Lambda V^T, \tag{1.18}$$

where $V = [v_1, \cdots, v_d]$ is the matrix of eigenvectors, and $\Lambda = \text{diag}(\lambda_1, \cdots, \lambda_d)$ is a diagonal matrix of the eigenvalues.
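The decomposition (1.18) can be verified numerically with NumPy's `np.linalg.eigh`, which is designed for symmetric matrices and returns orthonormal eigenvectors as the columns of $V$. A short sketch, using the covariance matrix that appears in part (g):

```python
import numpy as np

Sigma = np.array([[0.625, 0.375],
                  [0.375, 0.625]])

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
# in the columns of V, so Sigma = V @ Lambda @ V.T, Eq. (1.18).
lam, V = np.linalg.eigh(Sigma)
Lambda = np.diag(lam)

print(lam)                                    # eigenvalues (0.25 and 1.0 here)
print(np.allclose(Sigma, V @ Lambda @ V.T))   # True
print(np.allclose(V.T @ V, np.eye(2)))        # True: V is orthogonal
```

Because $\Sigma$ is symmetric, the eigenvectors can always be chosen orthonormal, so $V^T V = I$ and $V^{-1} = V^T$; this is what makes the change of variables in the next part work.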

(e) Let $y = V^T(x - \mu)$. Show that the Mahalanobis distance $\|x - \mu\|^2_{\Sigma}$ can be rewritten as $\|y\|^2_{\Lambda}$, i.e., a Mahalanobis distance with a diagonal covariance matrix. (Hint: use Problem 1.12). Hence, in the space of $y$, the multivariate Gaussian has a diagonal covariance matrix.
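The identity $\|x-\mu\|^2_{\Sigma} = \|y\|^2_{\Lambda}$ can be spot-checked numerically before proving it. A sketch (the random point and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[0.625, 0.375],
                  [0.375, 0.625]])
mu = np.zeros(2)
lam, V = np.linalg.eigh(Sigma)

x = rng.normal(size=2)
y = V.T @ (x - mu)                       # change of variables from part (e)

maha_x = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)  # ||x - mu||^2_Sigma
maha_y = y @ np.diag(1.0 / lam) @ y                  # ||y||^2_Lambda
print(np.isclose(maha_x, maha_y))  # True
```

The key step in the proof is that $\Sigma^{-1} = V \Lambda^{-1} V^T$ follows from (1.18) and the orthogonality of $V$.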

(f) Consider the transformation from $y$ to $x$: $x = Vy + \mu$. What is the effect of $V$ and $\mu$?

(g) Plot the Mahalanobis distance term and probability density function for a 2-dimensional Gaussian with $\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 0.625 & 0.375 \\ 0.375 & 0.625 \end{bmatrix}$. How is the shape of the density affected by the eigenvectors and eigenvalues of $\Sigma$?
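A sketch for this plot, overlaying the eigenvector directions (scaled by $\sqrt{\lambda_i}$) on the density contours so the connection to the eigendecomposition is visible; the grid range and filename are arbitrary choices:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, save to file
import matplotlib.pyplot as plt

mu = np.zeros(2)
Sigma = np.array([[0.625, 0.375],
                  [0.375, 0.625]])
lam, V = np.linalg.eigh(Sigma)  # eigenvalues 0.25 and 1.0 for this Sigma

xs = np.linspace(-3, 3, 201)
X1, X2 = np.meshgrid(xs, xs)
diff = np.stack([X1 - mu[0], X2 - mu[1]], axis=-1)
maha = np.einsum("...i,ij,...j->...", diff, np.linalg.inv(Sigma), diff)
pdf = np.exp(-0.5 * maha) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

fig, ax = plt.subplots(figsize=(5, 5))
ax.contour(X1, X2, pdf)
# Arrows along the eigenvectors, with length sqrt(eigenvalue):
for i in range(2):
    ax.annotate("", xy=mu + np.sqrt(lam[i]) * V[:, i], xytext=mu,
                arrowprops=dict(arrowstyle="->"))
ax.set_aspect("equal")
fig.savefig("gaussian_full.png")
```

The contours come out as ellipses whose axes point along the eigenvectors (here the $\pm 45^\circ$ directions) with lengths proportional to $\sqrt{\lambda_i}$, i.e., the rotated version of the diagonal case from part (b).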