Conjugate gradient method
Eigenvalue computation
All the methods introduced so far can be categorized as direct methods to solve linear systems
Meaning that the exact solution to Ax=b can be computed within a finite number of operations (in exact arithmetic)
Cons: too expensive to handle very large linear systems
Iterative methods, in contrast, compute a sequence of approximate solutions x(k) that converges to the exact solution
Stops when the precision is sufficient
Do not request more precision than is needed
In each iteration, the computation is typically cheap, e.g., matrix-vector multiplication v→Av
Especially efficient for sparse matrices
Conjugate gradient method (CG) is a very special linear system solver
It is a direct method used as an iterative one
Aims to solve the linear system Ax=b when A is positive definite (p.d.)
Only uses the matrix-vector multiplication v→Av
Obtains the exact solution after at most n steps (in exact arithmetic)
Converges fast under some conditions
In exact arithmetic, the algorithm converges to the solution of the linear system Ax=b in at most n iterations.
Let A be a p.d. matrix and denote by x∗ the solution to Ax=b. Let x(k) be the sequence of approximate solutions produced by the conjugate gradient method. Then
$$\|x^{(k)} - x^*\|_2 \le 2\sqrt{\kappa}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k \|x^{(0)} - x^*\|_2,$$ where $\kappa$ is the condition number of A (defined later).
For any matrix A, the condition number κ(A) is defined as κ(A)=μmax(A)/μmin(A), where μmax and μmin are the largest and smallest singular values, respectively
For a p.d. matrix A, κ(A)=λmax(A)/λmin(A), where λmax and λmin are the largest and smallest eigenvalues, respectively
κ(A)≥1
For any orthogonal matrix Q, κ(Q)=1 and κ(QA)=κ(AQ)=κ(A)
If a p.d. matrix A has almost equal eigenvalues, then $\kappa(A) \approx 1$ and $\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \approx 0$. CG will have extremely fast convergence
If A almost loses positive definiteness, λmin(A)≈0, then κ(A)≫1, and CG makes almost no progress
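To see how the condition number enters the bound, here is a quick calculation of the convergence factor for a few values of κ (a small illustration; the helper name conv_factor is ours, not part of any package):

# Convergence factor in the CG error bound for several condition numbers
conv_factor = function(kappa) (sqrt(kappa) - 1) / (sqrt(kappa) + 1)
conv_factor(c(1.1, 10, 100, 1e6))
# roughly 0.024, 0.52, 0.82, 0.998: close to 0 when kappa is near 1
# (fast convergence), close to 1 when kappa is very large (slow convergence)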
cg = function(A, b, x0 = rep(0, length(b)), eps = 1e-6)
{
    m = length(b)
    x = x0
    p = r = b - A %*% x          # initial residual and search direction
    r2 = sum(r^2)
    errs = c()
    for(i in 1:m)
    {
        Ap = A %*% p
        alpha = r2 / sum(p * Ap) # step size
        x = x + alpha * p
        r = r - alpha * Ap       # update residual
        r2_new = sum(r^2)
        err = sqrt(r2_new)
        errs = c(errs, err)
        if(err < eps)
            break
        beta = r2_new / r2
        p = r + beta * p         # new conjugate search direction
        r2 = r2_new
    }
    list(x = x, errs = errs, niter = i)
}
We test the implementation on a simulated matrix
However, the algorithm does not seem to converge, and the error is quite large
set.seed(123)
n = 100
M = matrix(0.1 * rnorm(n^2), n)
A = crossprod(M)
b = rnorm(n)
sol = cg(A, b, eps = 1e-12)
sol$niter
## [1] 100
max(abs(A %*% sol$x - b))
## [1] 0.5763024
kappa(A)
## [1] 1164091
kappa(A1 <- A + diag(rep(0.1, n)))
## [1] 134.8415
kappa(A2 <- A + diag(rep(1, n)))
## [1] 15.01074
sol1 = cg(A1, b, eps = 1e-12)
sol1$niter
## [1] 78
max(abs(A1 %*% sol1$x - b))
## [1] 1.588243e-13
sol2 = cg(A2, b, eps = 1e-12)
sol2$niter
## [1] 29
max(abs(A2 %*% sol2$x - b))
## [1] 1.858513e-13
A is p.d. and well-conditioned
The matrix-vector operation v→Av is cheap
One example is a sparse matrix, or a product of sparse matrices
We will elaborate on this point when we study sparse matrices
Eigenvalues are important characteristics of matrices
Wide applications in statistics, e.g., principal component analysis (PCA), spectral clustering, matrix norm, random matrix theory, etc.
Eigenvalue computation is one of the core topics of numerical linear algebra
The definition of eigenvalue is actually quite simple
For a square matrix A, any scalar λ such that Ax=λx for some nonzero vector x is called an eigenvalue of A
The corresponding vector x is an eigenvector
Here we focus on real symmetric matrices
A matrix $A_{n\times n}$ is real symmetric if and only if there exists a real orthogonal matrix Γ and real eigenvalues λ1,…,λn such that A=ΓDΓ′, where D=diag(λ1,…,λn).
This is typically called the eigenvalue decomposition, or spectral decomposition, of A.
Suppose that A is a real symmetric matrix
A is p.d. if all its eigenvalues are strictly positive
$A^k = \Gamma D^k \Gamma'$ for $k = \ldots, -2, -1, 0, 1, 2, \ldots$ (negative powers require A to be invertible)
det(A)=λ1⋯λn
tr(A)=λ1+⋯+λn
Min-Max principle:
$$\lambda_{\min}(A) = \min_{x \ne 0} \frac{x'Ax}{x'x} = \min_{\|x\|=1} x'Ax, \qquad \lambda_{\max}(A) = \max_{x \ne 0} \frac{x'Ax}{x'x} = \max_{\|x\|=1} x'Ax$$
This gives PCA a nice interpretation: the first principal component maximizes the explained variance
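As a quick numerical illustration of these properties (a minimal sketch using base R; the matrix and helper names are arbitrary), we can verify the decomposition, the determinant/trace identities, and the Rayleigh quotient bounds on a small simulated symmetric matrix:

# Numerical check of the spectral decomposition properties (illustration only)
set.seed(123)
n = 5
M = matrix(rnorm(n^2), n)
A = M + t(M)                     # a real symmetric matrix
e = eigen(A, symmetric = TRUE)
Gamma = e$vectors                # orthogonal eigenvector matrix
lam = e$values                   # eigenvalues in decreasing order

max(abs(A - Gamma %*% diag(lam) %*% t(Gamma)))          # A = Gamma D Gamma'
max(abs(A %*% A - Gamma %*% diag(lam^2) %*% t(Gamma)))  # A^2 = Gamma D^2 Gamma'
det(A) - prod(lam)                                      # det(A) = product of eigenvalues
sum(diag(A)) - sum(lam)                                 # tr(A) = sum of eigenvalues

# Min-Max principle: the Rayleigh quotient x'Ax / x'x is maximized (minimized)
# at the eigenvector of the largest (smallest) eigenvalue
rayleigh = function(A, x) c(crossprod(x, A %*% x) / crossprod(x))
rayleigh(A, Gamma[, 1]) - lam[1]   # ~ 0
rayleigh(A, Gamma[, n]) - lam[n]   # ~ 0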
There are different cases of computing eigenvalues
Computing all eigenvalues (eigenvalue decomposition)
Computing the largest eigenvalue
Computing the eigenvalue that is closest to a given number
...
Eigenvalue computation is a large field in numerical linear algebra
The actual algorithms used in practice are usually quite involved
For the purpose of statistical computing, we mainly introduce the basic ideas behind these algorithms and some recommended implementations
In what follows, we focus on finding eigenvalues of real symmetric matrices
The most commonly-used method to compute all eigenvalues consists of two steps: first reduce A to a tridiagonal matrix T, and then iteratively diagonalize T
$$Q'AQ = T = \begin{bmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \ddots & \ddots & \ddots & \\ & & \beta_{n-1} & \alpha_{n-1} & \beta_n \\ & & & \beta_n & \alpha_n \end{bmatrix}$$
The Householder algorithm finds a sequence of (simple) orthogonal matrices Qk such that T=(Q1⋯Qn−2)′A(Q1⋯Qn−2), where T is tridiagonal
Q=Q1⋯Qn−2 is also orthogonal (why?)
A and T have the same eigenvalues (why?)
The motivation is that T has cheaper matrix operations
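Below is a minimal sketch of Householder tridiagonalization in R (illustration only: it forms the full Householder matrices for clarity, which a real implementation would avoid; the function name house_tridiag is ours):

# Minimal sketch of Householder tridiagonalization of a symmetric matrix
house_tridiag = function(A) {
    n = nrow(A)
    Q = diag(n)
    for(k in 1:(n - 2)) {
        x = A[(k + 1):n, k]
        if(sqrt(sum(x^2)) < 1e-12) next      # column already reduced
        v = x
        v[1] = v[1] + ifelse(x[1] >= 0, 1, -1) * sqrt(sum(x^2))
        v = v / sqrt(sum(v^2))               # Householder vector
        H = diag(n)
        H[(k + 1):n, (k + 1):n] = diag(n - k) - 2 * outer(v, v)
        A = H %*% A %*% H                    # H is symmetric and orthogonal
        Q = Q %*% H
    }
    list(T = A, Q = Q)                       # Q' A_original Q = T, tridiagonal
}

set.seed(123)
M = matrix(rnorm(36), 6)
A = M + t(M)
res = house_tridiag(A)
round(res$T, 6)                              # tridiagonal
max(abs(t(res$Q) %*% A %*% res$Q - res$T))   # ~ 0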
The QR algorithm iteratively reduces the tridiagonal T into a diagonal matrix by orthogonal transformations:
Initialize T(0)=T, and begin the loop k=0,1,…
Compute the QR decomposition of the matrix T(k): T(k)=QkRk
Update T(k+1)=RkQk=Q′kT(k)Qk
Assume that |λ1|>⋯>|λn|>0; then T(k) converges to a diagonal matrix, whose diagonal elements are the eigenvalues of T and A.
The Householder algorithm costs O(n3)
The QR decomposition of tridiagonal matrices has a specialized algorithm, which costs O(n2) instead of O(n3) (the latter for general dense matrices)
It can be proved that all T(k) matrices are tridiagonal
The number of iterations required in the QR algorithm depends on the distribution of eigenvalues
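A minimal sketch of the (unshifted) QR iteration using base R's qr(); practical implementations work on the tridiagonal T with shifts and deflation, but the basic loop looks like this (the function name qr_eigvals is ours, and convergence assumes eigenvalues with distinct absolute values):

# Minimal sketch of the unshifted QR iteration (illustration only)
qr_eigvals = function(A, maxit = 5000, eps = 1e-8) {
    Tk = A
    for(k in 1:maxit) {
        dec = qr(Tk)
        Tk = qr.R(dec) %*% qr.Q(dec)     # T(k+1) = R_k Q_k = Q_k' T(k) Q_k
        if(max(abs(Tk[lower.tri(Tk)])) < eps) break
    }
    list(values = sort(diag(Tk), decreasing = TRUE), niter = k)
}

set.seed(123)
M = matrix(rnorm(25), 5)
A = M + t(M)
qr_eigvals(A)$values
eigen(A, symmetric = TRUE, only.values = TRUE)$values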
In many applications, we only need the largest eigenvalue (in absolute value)
The methods below can be extended to computing the largest k eigenvalues
If λ1,…,λn are eigenvalues of A with |λ1|>⋯>|λn|, and γ1,…,γn are the associated eigenvectors, then for $x_0 = \sum_{i=1}^n \beta_i \gamma_i$ with $\beta_1 \ne 0$, $$A^k x_0 = \sum_{i=1}^n \beta_i \lambda_i^k \gamma_i \approx \beta_1 \lambda_1^k \gamma_1 \quad \text{for large } k,$$ so repeatedly multiplying by A aligns the vector with γ1 (after normalization)
One widely-used and possibly the simplest method to compute the largest eigenvalue (in absolute value) is the power method
Assume A has eigenvalues $\lambda_1 > |\lambda_2| \ge \cdots \ge |\lambda_n|$ with associated eigenvectors $\gamma_1,\ldots,\gamma_n$, and that the initial vector satisfies $x_0 = \sum_{i=1}^n \beta_i \gamma_i$ for some $\beta_1 \ne 0$. The power method iterates $y_k = A x_{k-1}$, $x_k = y_k / \|y_k\|$. Then
$$\big|\,\|y_k\| - \lambda_1\,\big| \le C\left(\frac{|\lambda_2|}{\lambda_1}\right)^k, \qquad \|x_k - \tilde{\gamma}_1\| \le C\left(\frac{|\lambda_2|}{\lambda_1}\right)^k,$$ where $\tilde{\gamma}_1 = \pm\gamma_1$.
The power method only requires the matrix-vector multiplication v→Av to compute eigenvalues of A (recall the CG method)
This is important and useful for sparse matrices
However, it does not effectively use the information from past iterations
Also, it can only compute one eigenvalue at a time
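Here is a minimal power method sketch in R (the function name power_method is ours); it only touches A through the product A %*% x, and it can be slow when |λ2|/|λ1| is close to 1:

# Minimal power method sketch: only uses the matrix-vector product v -> Av
power_method = function(A, x0 = rnorm(nrow(A)), maxit = 10000, eps = 1e-10) {
    x = x0 / sqrt(sum(x0^2))
    lambda = 0
    for(k in 1:maxit) {
        y = A %*% x
        lambda_new = sqrt(sum(y^2))      # ||y_k|| -> |lambda_1|
        x = c(y / lambda_new)
        if(abs(lambda_new - lambda) < eps) break
        lambda = lambda_new
    }
    # The Rayleigh quotient x'Ax recovers the sign of the dominant eigenvalue
    list(value = c(crossprod(x, A %*% x)), vector = x, niter = k)
}

set.seed(123)
M = matrix(rnorm(200^2), 200)
A = M + t(M)
power_method(A)$value
e = eigen(A, symmetric = TRUE, only.values = TRUE)$values
e[which.max(abs(e))]                     # eigenvalue largest in absolute value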
Mainstream software packages typically use other advanced algorithms to compute the largest eigenvalues, e.g., implicitly restarted Arnoldi/Lanczos methods [4] and the Jacobi–Davidson method [5]
They also only need the v→Av operation
Can compute multiple eigenvalues together
These algorithms themselves are quite sophisticated, however
Another type of problem is to find the eigenvalue that is closest to a given number μ
For example, if μ=0, then this is equivalent to finding the smallest eigenvalue (in absolute value)
Fortunately, we can make use of existing methods that compute largest eigenvalues for this task, via a technique called spectral transformation
Spectral transformation is based on the following finding.
Let λi, i=1,…,n be the eigenvalues of A with eigenvectors xi, so that Axi=λixi. Choose a shift μ≠λi for all i. Then the eigenvalues of B=(A−μIn)−1 are θi=(λi−μ)−1, with the same eigenvectors: Bxi=θixi.
Therefore, the eigenvalue of B that is largest in absolute value corresponds to the eigenvalue of A that is closest to μ.
If we want to compute the eigenvalue of A closest to μ
We can apply the power method (or other methods based on matrix-vector multiplication) to B=(A−μIn)−1
The iteration becomes $$y_k = (A - \mu I_n)^{-1} x_{k-1}, \qquad x_k = y_k / \|y_k\|$$
By the convergence theorem, we have $\|y_k\| \to |\theta_1|$, where $\theta_1$ is the eigenvalue of B largest in absolute value
Then recover the eigenvalue of A closest to μ by $\lambda_\mu = \mu + 1/\theta_1$
In actual implementation, we do not directly compute the matrix inverse (A−μIn)−1
Instead, we factorize A−μIn once using the LU decomposition or the $LDL^T$ decomposition
And then solve the linear systems (A−μIn)yk=xk−1
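A minimal sketch of this shift-invert power iteration follows (the function name shift_invert_eig is ours; base R's qr() factorization stands in for the LU/LDLᵀ factorization mentioned above, since its solve() method lets us reuse the factorization):

# Sketch of the spectral transformation: power method on B = (A - mu*I)^{-1},
# where each (A - mu*I) y = x solve reuses a factorization computed once
shift_invert_eig = function(A, mu, maxit = 1000, eps = 1e-10) {
    n = nrow(A)
    fac = qr(A - mu * diag(n))      # factorize A - mu*I once
    x = rnorm(n)
    x = x / sqrt(sum(x^2))
    theta = 0
    for(k in 1:maxit) {
        y = solve(fac, x)           # y_k solves (A - mu*I) y = x_{k-1}
        theta_new = sqrt(sum(y^2))  # ||y_k|| -> |theta_1|
        x = y / theta_new
        if(abs(theta_new - theta) < eps) break
        theta = theta_new
    }
    theta1 = sum(x * solve(fac, x)) # Rayleigh quotient gives the signed theta_1
    mu + 1 / theta1                 # eigenvalue of A closest to mu
}

set.seed(123)
M = matrix(rnorm(200^2), 200)
A = M + t(M)
shift_invert_eig(A, mu = 0)
e = eigen(A, symmetric = TRUE, only.values = TRUE)$values
e[which.min(abs(e))]                # eigenvalue of A closest to 0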
The eigen() function in base R is a very stable and efficient function to compute all eigenvalues
Specify symmetric = TRUE if you know your matrix is symmetric (by default it will detect symmetry)
Specify only.values = TRUE if you do not need eigenvectors
library(dplyr)
set.seed(123)
n = 1000
M = matrix(rnorm(n^2), n)
A = M + t(M)
bench::mark(
    eigen(A),
    eigen(A, symmetric = FALSE),
    eigen(A, symmetric = TRUE),
    eigen(A, symmetric = TRUE, only.values = TRUE),
    min_iterations = 3, max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
## # A tibble: 4 × 3
##   expression                                          min   median
##   <bch:expr>                                     <bch:tm> <bch:tm>
## 1 eigen(A)                                          1.15s    1.17s
## 2 eigen(A, symmetric = FALSE)                        4.13s    4.14s
## 3 eigen(A, symmetric = TRUE)                         1.11s    1.13s
## 4 eigen(A, symmetric = TRUE, only.values = TRUE)  314.12ms 314.19ms
The eigs_sym() function in the RSpectra package computes a specified number of eigenvalues of a symmetric matrix; which = "LM" requests the ones largest in magnitude
library(RSpectra)
e = eigs_sym(A, k = 3, which = "LM")
e$values
## [1] 88.83776 88.31500 -88.98945
head(e$vectors)
##             [,1]         [,2]         [,3]
## [1,]  0.01506291  0.029602478 -0.015855700
## [2,] -0.05124211  0.016404205 -0.014749483
## [3,] -0.01542738 -0.002350839  0.005520141
## [4,]  0.02921002 -0.006653817 -0.003387408
## [5,] -0.02406133  0.039802584  0.028103112
## [6,] -0.01309102 -0.026105929  0.073813795
which = "LA"
e = eigs_sym(A, k = 3, which = "LA")e$values
## [1] 88.83776 88.31500 87.18646
e = eigs_sym(A, k = 3, which = "LA", opts = list(retvec = FALSE))e$vectors
## NULL
eigs_sym() is much faster than eigen()
bench::mark(
    eigen(A, symmetric = TRUE),
    eigen(A, symmetric = TRUE, only.values = TRUE),
    eigs_sym(A, k = 3, which = "LM"),
    eigs_sym(A, k = 3, which = "LM", opts = list(retvec = FALSE)),
    min_iterations = 3, max_iterations = 10, check = FALSE
) %>% select(min, median)
## # A tibble: 4 × 2
##        min   median
##   <bch:tm> <bch:tm>
## 1    1.12s    1.12s
## 2 310.66ms 310.66ms
## 3 149.93ms 150.31ms
## 4 149.43ms 150.85ms
e = eigs_sym(A, k = 3, which = "LM", sigma = 0)e$values
## [1] 0.01589015 -0.14248784 -0.26307129
In general, if sigma is not NULL, then the selection rule is applied to $1/(\lambda_i - \sigma)$
This only affects the selection rule
The returned eigenvalues are still for A
[1] Folkmar Bornemann (2018). Numerical Linear Algebra. Springer.
[2] Grégoire Allaire and Sidi Mahmoud Kaber (2008). Numerical linear algebra. Springer.
[3] Åke Björck (2015). Numerical methods in matrix computations. Springer.
[4] Danny C. Sorensen (1997). Implicitly restarted Arnoldi/Lanczos methods for large scale eigenvalue calculations. In Parallel Numerical Algorithms (pp. 119-165). Springer.
[5] Gerard L.G. Sleijpen and Henk A. Van der Vorst (1996). A Jacobi–Davidson iteration method for linear eigenvalue problems. SIAM Journal on Matrix Analysis and Applications, 17(2), 401-425.