Example of covariance rule application

Before moving on, let's simulate this behavior with a simple Python example. We first generate 1,000 samples drawn from a bivariate Gaussian distribution (the variances are deliberately asymmetric) and then we apply the covariance rule to find the first principal component (w(0) has been chosen so as not to be orthogonal to v1):

import numpy as np

# Fix the random seed for reproducibility
rs = np.random.RandomState(1000)

# Generate 1,000 bidimensional samples with asymmetric standard deviations
X = rs.normal(loc=1.0, scale=(20.0, 1.0), size=(1000, 2))

# Initial vector (not orthogonal to the first principal component)
w = np.array([30.0, 3.0])

# Covariance matrix (np.cov expects the variables on the rows)
S = np.cov(X.T)

# Apply the covariance rule, normalizing w at each iteration
for i in range(10):
    w += np.dot(S, w)
    w /= np.linalg.norm(w)

# Rescale w (by a positive scalar) to make it visible in the plot
w *= 50.0

print(np.round(w, 1))
[ 50. -0.]

The algorithm is straightforward, but there are a couple of elements that we need to comment on. The first one is the normalization of the vector w at the end of each iteration, which is one of the techniques needed to avoid the uncontrolled growth of w. The second element is the final multiplication, w *= 50. As we are multiplying by a positive scalar, the direction of w is not affected, but the rescaling makes the vector easier to see in the final plot.
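As a quick sanity check (this snippet is an illustrative addition, not part of the original listing), we can compare the converged direction with the principal eigenvector computed directly from the covariance matrix S. np.linalg.eigh returns the eigenvalues in ascending order, so the last column of the eigenvector matrix is associated with the largest eigenvalue:

# Eigendecomposition of the (symmetric) covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(S)

# Principal eigenvector (associated with the largest eigenvalue)
v1 = eigenvectors[:, -1]

# Absolute cosine between w and v1 (close to 1.0 when they are parallel)
cos_angle = np.abs(np.dot(w, v1)) / (np.linalg.norm(w) * np.linalg.norm(v1))
print(np.round(cos_angle, 3))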

The result is shown in the following diagram:

Application of the covariance rule. w becomes proportional to the first principal component

After a limited number of iterations, w has the same orientation as the principal eigenvector which, in this case, is parallel to the x-axis. The sense (that is, the sign) depends on the initial value w(0); however, in PCA this isn't an important element.
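To see why the sense is irrelevant, consider the following sketch (an illustrative addition reusing X and w from the previous snippets): projecting the centered samples onto w or onto -w yields exactly the same variance, so the result of PCA is unaffected by the sign of the component:

# Unit vector along the converged direction and the centered dataset
u = w / np.linalg.norm(w)
Xc = X - np.mean(X, axis=0)

# Projections onto u and onto the opposite direction -u
p_pos = np.dot(Xc, u)
p_neg = np.dot(Xc, -u)

# The variances of the two projections are identical
print(np.round(np.var(p_pos), 2), np.round(np.var(p_neg), 2))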