Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
+ - 0:00:00
Notes for current slide
Notes for next slide

Computational Statistics

Lecture 2

Yixuan Qiu

2023-09-13

1 / 63

Numerical Linear Algebra

2 / 63

Important Problems

  • Linear systems

  • Eigenvalues

  • Sparse matrices

3 / 63

Today's Topics

  • Preliminaries

  • Triangular and tridiagonal systems

  • LU decomposition

  • Cholesky decomposition

4 / 63

Preliminaries

5 / 63

Useful Formulas

6 / 63

Trace

  • tr(AB)=tr(BA)

  • tr(ABC)=tr(BCA)=tr(CAB)

  • tr(ABC)tr(ACB)

  • tr(AB)=vec(A)vec(B)=vec(B)vec(A)

7 / 63

Determinant

  • det(AB)=det(A)det(B)

  • det(A1)=[det(A)]1

  • Weinstein-Aronszajn identity: det(Im+Am×nBn×m)=det(In+BA)

8 / 63

Woodbury Identity

(A+UCV)1=A1A1U(C1+VA1U)1VA1

Special cases:

  • (I+UV)1=IU(I+VU)1V
  • (I+UU)1=IU(I+UU)1U
  • (A+uv)1=A1A1uvA11+vA1u
9 / 63

Block Matrix Inversion

If DCA1B is invertible:

[ABCD]1=[A1+A1B(DCA1B)1CA1A1B(DCA1B)1(DCA1B)1CA1(DCA1B)1]

If ABD1C is invertible:

[ABCD]1=[(ABD1C)1(ABD1C)1BD1D1C(ABD1C)1D1+D1C(ABD1C)1BD1]

10 / 63

Block Matrix Determinant

If A is invertible:

det(ABCD)=det(A)det(DCA1B)

For any A,BRn×n:

det(ABBA)=det(A+B)det(AB)

11 / 63

Linear Systems

12 / 63

Triangular Systems

13 / 63

Triangular Matrices

L=[××××××],U=[××××××]

  • Triangular matrices are important building blocks of numerical linear algebra

  • Cheap multiplication, determinant, and linear system solving

  • Many matrix decompositions are in the form of triangular matrices

14 / 63

Basic Properties [1]

  • A triangular matrix is invertible if and only if all its diagonal entries are non-zero

  • Invertible lower (upper) triangular matrices are closed under multiplication and inversion

  • The determinant of a triangular matrix is the product of its diagonal entries

det[λ1×λ2××λm]=λ1det[λ2×λm]==λ1λm

15 / 63

Basic Properties

  • Question: What is the computational complexity of multiplying a triangular matrix with a vector?
16 / 63

Basic Properties

  • Question: What is the computational complexity of multiplying a triangular matrix with a vector?

  • O(n2)

16 / 63

Triangular Systems

  • We are interested in solving the linear systems of equations Lx=b L=[l11l21l22ln1ln2lnn]

  • The algorithm is called forward substitution

17 / 63

Forward Substitution

  • Let lk,1:k=(lk1,,lkk), x1:k=(x1,,xk), k=1,,n

  • The k-th equation gives lk,1:kx1:k=k1i=1lkixi+lkkxk=bk

  • We can iteratively compute x1,,xn
18 / 63

Algorithm

  • Start with x1=b1/l11

  • For k=2,,n, iterate xk=(bkk1i=1lkixi)/lkk

19 / 63

Algorithm

  • Start with x1=b1/l11

  • For k=2,,n, iterate xk=(bkk1i=1lkixi)/lkk

  • Computational complexity is O(n2)

19 / 63

Backward Substitution

  • Similarly, we can solve the linear systems of equations Ux=b with an upper triangular matrix U=[u11u12u1nu22u2nunn]

  • The algorithm is called backward substitution

20 / 63

Backward Substitution

  • Similarly, we can solve the linear systems of equations Ux=b with an upper triangular matrix U=[u11u12u1nu22u2nunn]

  • The algorithm is called backward substitution

  • Please derive this algorithm by yourself

20 / 63

Implementation

  • Simulate matrices
set.seed(123)
n = 2000
b = rnorm(n)
M = 0.1 * matrix(rnorm(n^2), n)
diag(M) = abs(diag(M)) + 1
# Get lower and upper triangular parts
L = M * lower.tri(M, diag = TRUE)
U = M * upper.tri(M, diag = TRUE)
21 / 63

Implementation

  • In R, we have the forwardsolve() and backsolve() functions to implement the two algorithms
x1 = forwardsolve(L, b)
max(abs(L %*% x1 - b)) # Verify L*x1 = b
## [1] 4.193368e-12
x2 = backsolve(U, b)
max(abs(U %*% x2 - b)) # Verify U*x2 = b
## [1] 5.541345e-12
22 / 63

Implementation

  • In fact, we do not need to create a new matrix
  • The upper/lower triangular part would not be used in forwardsolve()/backsolve()
  • Reduces memory use
x1 = forwardsolve(M, b)
max(abs(L %*% x1 - b)) # Verify L*x1 = b
## [1] 4.193368e-12
x2 = backsolve(M, b)
max(abs(U %*% x2 - b)) # Verify U*x2 = b
## [1] 5.541345e-12
23 / 63

Benchmark

  • Whenever possible, use forwardsolve()/backsolve() instead of solve()
library(dplyr)
bench::mark(
solve(L, b), forwardsolve(L, b), backsolve(U, b),
max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
## # A tibble: 3 × 3
## expression min median
## <bch:expr> <bch:tm> <bch:tm>
## 1 solve(L, b) 1.42s 1.42s
## 2 forwardsolve(L, b) 1.33ms 1.42ms
## 3 backsolve(U, b) 1.31ms 1.43ms
24 / 63

Excercise

  • Implement the backward substitution algorithm

  • Experiment on simulated matrices

  • Use backsolve() to verify the result

25 / 63

Tridiagonal Systems

26 / 63

Tridiagonal Matrices

  • Tridiagonal matrices are band matrices that have nonzero elements only on the diagonal and two subdiagonals

T=[b1c1a2b2c2a3b3cn1anbn]

  • Useful in eigenvalue computation and spline interpolation
27 / 63

Basic Properties

  • Can be represented by three vectors

  • Storage requirement is linear in n, i.e., O(n)

  • Cheap multiplication, determinant, and linear system solving

  • Question: What is the computational complexity of multiplying a tridiagonal matrix with a vector?

28 / 63

Basic Properties

  • Can be represented by three vectors

  • Storage requirement is linear in n, i.e., O(n)

  • Cheap multiplication, determinant, and linear system solving

  • Question: What is the computational complexity of multiplying a tridiagonal matrix with a vector?

  • O(n)

28 / 63

Tridiagonal Systems

  • We are interested in solving the linear systems of equations Tx=d

[b1c1a2b2c2a3b3cn1anbn][x1x2x3xn]=[d1d2d3dn]

29 / 63

Algorithm

  • Forward sweep

ci={cibi,i=1cibiaici1,i=2,,n1,di={dibi,i=1diaidi1biaici1,i=2,,n

  • Backward substitution

xn=dn,xi=dicixi+1,i=n1,,1

  • Computational complexity is O(n)
30 / 63

Implementation

  • Simulate matrices
set.seed(123)
n = 2000
diagonal = rnorm(n)
lsubdiag = rnorm(n - 1)
usubdiag = rnorm(n - 1)
b = rnorm(n)
# Explicitly form the matrix
M = diag(diagonal)
diag(M[2:n, ]) = lsubdiag
diag(M[, 2:n]) = usubdiag
M[1:5, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.5604756 1.1210219 0.0000000 0.00000000 0.0000000
## [2,] -0.5116037 -0.2301775 0.1965498 0.00000000 0.0000000
## [3,] 0.0000000 0.2369379 1.5587083 0.65011319 0.0000000
## [4,] 0.0000000 0.0000000 -0.5415892 0.07050839 0.6710042
## [5,] 0.0000000 0.0000000 0.0000000 1.21922765 0.1292877
31 / 63

Implementation

  • The tridiagmatrix() function in the cmna R package implements the described algorithm
library(cmna)
x = tridiagmatrix(lsubdiag, diagonal, usubdiag, b)
max(abs(M %*% x - b)) # Verify M*x = b
## [1] 2.298162e-13
32 / 63

Benchmark

  • The specialized algorithm is much faster than the general solve()
bench::mark(
solve(M, b),
tridiagmatrix(lsubdiag, diagonal, usubdiag, b),
max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
## # A tibble: 2 × 3
## expression min median
## <bch:expr> <bch:tm> <bch:tm>
## 1 solve(M, b) 1.37s 1.37s
## 2 tridiagmatrix(lsubdiag, diagonal, usubdiag, b) 614.1µs 650.25µs
33 / 63

Benchmark

  • Warning: the described algorithm may be unstable in some cases. If stability is required, use LU decomposition instead
34 / 63

LU Decomposition

35 / 63

LU Decomposition

  • LU decomposition is an important method to solve general linear systems

  • Factorizes an n×n matrix A into the product of two triangular matrices, A=LU

  • L lower triangular, with ones on the diagonal

  • U upper triangular

36 / 63

Factorization Theorem [2]

Let A be an n×n matrix such that all of its diagonal submatrices are nonsingular. Then there exist a lower triangular matrix L with unit diagonal elements (i.e., lii=1), and an upper triangular matrix U, such that A=LU.

37 / 63

Solving Linear Systems

  • Suppose the decomposition A=LU is given

  • If we wish to solve Ax=b, then:

  • First, solve Ly=b to get y (forward substitution)

  • Second, solve Ux=y to get x (backward substitution)

38 / 63

Computing Determinant

  • Suppose the decomposition A=LU is given

  • Question: How to compute det(A)?

39 / 63

Computing Determinant

  • Suppose the decomposition A=LU is given

  • Question: How to compute det(A)?

  • det(A)=det(L)det(U)=det(U)=u11unn

39 / 63

Algorithm [3]

Based on the relation

40 / 63

Pivoting

  • In practice, LU decomposition is performed with a permutation matrix Q, which is typically called pivoting: QA=LU

  • This is used to enhance the numerical stability

  • Q has the effect of interchanging rows of A

  • Q satisfies QQ=QQ=I

41 / 63

Pivoting

  • When using this factorization to solve linear systems Ax=b

  • First, compute ˜b=Qb

  • Second, solve Ly=˜b to get y

  • Third, solve Ux=y to get x

42 / 63

Computational Complexity

  • Decomposition part is O(n3)

  • Linear system solving part is O(n2)

43 / 63

Implementation

  • The lu() function in the Matrix package can be used to compute the LU decomposition of a square matrix
  • Note that this function factorizes A as A=PLU, or equivalently, PA=LU (i.e., Q=P)
library(Matrix)
## Warning: package 'Matrix' was built under R version 4.2.3
set.seed(123)
n = 5
M = matrix(rnorm(n^2), n)
lu_obj = lu(M)
decomp = expand(lu_obj)
names(decomp) # A named list
## [1] "L" "U" "P"
44 / 63

Implementation

decomp$L
## 5 x 5 Matrix of class "dtrMatrix" (unitriangular)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.00000000 . . . .
## [2,] -0.35957699 1.00000000 . . .
## [3,] 0.04523514 -0.49963385 1.00000000 . .
## [4,] 0.08294543 -0.27038315 -0.28235315 1.00000000 .
## [5,] -0.14767195 0.21751065 0.15641432 -0.65806034 1.00000000
decomp$U
## 5 x 5 Matrix of class "dtrMatrix"
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.5587083 -1.2650612 0.4007715 -1.9666172 -1.0260044
## [2,] . 1.2601781 1.3681900 1.0797629 -1.4367513
## [3,] . . 0.7761478 1.3298022 -1.4003294
## [4,] . . . 0.3577540 -1.3237976
## [5,] . . . . -0.7090854
45 / 63

Implementation

decomp$P
## 5 x 5 sparse Matrix of class "pMatrix"
##
## [1,] . | . . .
## [2,] . . . . |
## [3,] | . . . .
## [4,] . . | . .
## [5,] . . . | .
# Verify P*L*U = A
max(abs(decomp$P %*% decomp$L %*% decomp$U - M))
## [1] 2.220446e-16
46 / 63

Implementation

  • We can then write a simple function to solve linear systems given the decomposition result
lu_solve = function(decomp, b)
{
bt = t(decomp$P) %*% b
y = forwardsolve(as.matrix(decomp$L), bt) # L*y = b
x = backsolve(as.matrix(decomp$U), y) # U*x = y
x
}
47 / 63

Implementation

  • Verify the result
set.seed(123)
n = 2000
M = matrix(rnorm(n^2), n)
b = rnorm(n)
decomp = expand(lu(M))
x = lu_solve(decomp, b)
max(abs(M %*% x - b)) # Verify M*x = b
## [1] 1.933176e-11
48 / 63

Benchmark

  • The LU decomposition itself dominates the time of solving linear systems
bench::mark(
solve(M, b), expand(lu(M)), lu_solve(decomp, b),
max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
## # A tibble: 3 × 3
## expression min median
## <bch:expr> <bch:tm> <bch:tm>
## 1 solve(M, b) 1.4s 1.4s
## 2 expand(lu(M)) 1.37s 1.37s
## 3 lu_solve(decomp, b) 17.63ms 18.85ms
49 / 63

When to Use?

  • If Ax=b only needs to be solved once:

    • Directly calling solve() is more convenient
    • solve() internally uses algorithms similar to LU decomposition
  • If a sequence of linear systems Ax(k)=b(k) need to be solved iteratively

    • First factorize A using LU decomposition
    • Then use lu_solve() to get each solution x(k)
  • We will encounter the second case in many optimization problems

50 / 63

Cholesky Decomposition

51 / 63

Cholesky Decomposition

  • LU decomposition applies to general square matrices

  • If we can guarantee the matrix A to be positive definite

  • Example: A=XX+λI, λ>0

  • Then the Cholesky decomposition A=LL can be computed

  • L is a lower triangular matrix

52 / 63

Factorization Theorem [2]

Let A be an n×n real symmetric positive definite matrix. Then there exists a unique lower triangular matrix L with positive diagonal elements, such that A=LL.

53 / 63

Algorithm [3]

Based on the relation ljj=ajjj1k=1l2jk lij=(aijj1k=1likljk)/ljj,i=j+1,,n

54 / 63

Solving Linear Systems

  • Suppose the decomposition A=LL is given

  • If we wish to solve Ax=b, then:

  • First, solve Ly=b to get y (forward substitution)

  • Second, solve Lx=y to get x (backward substitution)

55 / 63

Computing Determinant

  • Suppose the decomposition A=LL is given

  • Question: How to compute det(A)?

56 / 63

Computing Determinant

  • Suppose the decomposition A=LL is given

  • Question: How to compute det(A)?

  • det(A)=[det(L)]2=l211l2nn

56 / 63

Computational Complexity

  • Decomposition part is O(n3)

  • Linear system solving part is O(n2)

  • Same orders as LU decomposition, but in general faster

57 / 63

Implementation

  • In R, the chol() function can be used to compute Cholesky decomposition
  • Note that this function factorizes A as A=RR, so you actually get R=L
set.seed(123)
n = 5
M = matrix(rnorm(n^2), n)
A = crossprod(M) # positive definite
R = chol(A)
R
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.678801 -1.873512 -0.1240544 -2.4977185 -0.6449736
## [2,] 0.000000 1.383693 1.2267898 0.6009298 -0.7682812
## [3,] 0.000000 0.000000 0.7676374 1.1355032 -0.8695156
## [4,] 0.000000 0.000000 0.0000000 0.3672150 -1.0253122
## [5,] 0.000000 0.000000 0.0000000 0.0000000 0.5906208
58 / 63

Implementation

  • Then we can write a simple function to solve linear systems given the decomposition result
chol_solve = function(R, b)
{
y = backsolve(R, b, transpose = TRUE) # R'*y = b
x = backsolve(R, y) # R*x = y
x
}
59 / 63

Implementation

  • Verify the result
set.seed(123)
n = 2000
M = matrix(rnorm(n^2), n)
A = crossprod(M)
b = rnorm(n)
R = chol(A)
max(abs(crossprod(R) - A)) # Verify R'R = A
## [1] 1.364242e-12
x = chol_solve(R, b)
max(abs(A %*% x - b)) # Verify A*x = b
## [1] 4.437666e-09
60 / 63

Benchmark

  • The decomposition part dominates the time of solving linear systems
bench::mark(
solve(A, b), chol(A), chol_solve(R, b),
max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
## # A tibble: 3 × 3
## expression min median
## <bch:expr> <bch:tm> <bch:tm>
## 1 solve(A, b) 1.38s 1.38s
## 2 chol(A) 1.31s 1.31s
## 3 chol_solve(R, b) 3.62ms 5.04ms
61 / 63

When to Use?

  • As long as you can guarantee A is positive definite, prefer to use Cholesky decomposition

  • solve() does not know the special property of A, so it will use the slightly slower LU decomposition

  • Cholesky decomposition utilizes this "prior information"

  • Cholesky decomposition can also be used when a sequence of linear systems Ax(k)=b(k) need to be solved iteratively

62 / 63

References

[1] Folkmar Bornemann (2018). Numerical Linear Algebra. Springer.

[2] Grégoire Allaire and Sidi Mahmoud Kaber (2008). Numerical linear algebra. Springer.

[3] Åke Björck (2015). Numerical methods in matrix computations. Springer.

63 / 63

Numerical Linear Algebra

2 / 63
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow