Example of covariance rule application

Before moving on, let's simulate this behavior with a simple Python example. We first generate 1,000 samples drawn from a bivariate Gaussian distribution (the variances are deliberately asymmetric) and then we apply the covariance rule to find the first principal component (w(0) has been chosen so as not to be orthogonal to v1):

import numpy as np

# Fix the random seed for reproducibility
rs = np.random.RandomState(1000)

# Generate 1,000 bidimensional samples with asymmetric standard deviations
X = rs.normal(loc=1.0, scale=(20.0, 1.0), size=(1000, 2))

# Initial vector (not orthogonal to the first principal component)
w = np.array([30.0, 3.0])

# Covariance matrix (np.cov expects the variables on the rows)
S = np.cov(X.T)

# Apply the covariance rule, normalizing w at each iteration
for i in range(10):
    w += np.dot(S, w)
    w /= np.linalg.norm(w)

# Rescale w (by a positive scalar) to make it visible in the plot
w *= 50.0

print(np.round(w, 1))
[ 50. -0.]

The algorithm is straightforward, but there are a couple of elements that we need to comment on. The first one is the normalization of the vector w at the end of each iteration, which is one of the techniques needed to avoid the uncontrolled growth of w. The second element is the final multiplication, w *= 50. As we are multiplying by a positive scalar, the direction of w is not affected, but the rescaling makes the vector easier to see in the final plot.
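As a quick sanity check (this snippet is an illustrative addition, not part of the original listing), we can compare the converged direction with the principal eigenvector computed directly from the covariance matrix S. np.linalg.eigh returns the eigenvalues in ascending order, so the last column of the eigenvector matrix is associated with the largest eigenvalue:

# Eigendecomposition of the (symmetric) covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(S)

# Principal eigenvector (associated with the largest eigenvalue)
v1 = eigenvectors[:, -1]

# Absolute cosine between w and v1 (close to 1.0 when they are parallel)
cos_angle = np.abs(np.dot(w, v1)) / (np.linalg.norm(w) * np.linalg.norm(v1))
print(np.round(cos_angle, 3))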

The result is shown in the following diagram:

Application of the covariance rule. w becomes proportional to the first principal component

After a limited number of iterations, w has the same orientation as the principal eigenvector which, in this case, is parallel to the x-axis. The sense (that is, the sign) depends on the initial value w(0); however, in PCA this isn't an important element.
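To see why the sense is irrelevant, consider the following sketch (an illustrative addition reusing X and w from the previous snippets): projecting the centered samples onto w or onto -w yields exactly the same variance, so the result of PCA is unaffected by the sign of the component:

# Unit vector along the converged direction and the centered dataset
u = w / np.linalg.norm(w)
Xc = X - np.mean(X, axis=0)

# Projections onto u and onto the opposite direction -u
p_pos = np.dot(Xc, u)
p_neg = np.dot(Xc, -u)

# The variances of the two projections are identical
print(np.round(np.var(p_pos), 2), np.round(np.var(p_neg), 2))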