class: center, middle, inverse, title-slide

.title[
# Computational Statistics
]
.subtitle[
## Lecture 2
]
.author[
### Yixuan Qiu
]
.date[
### 2023-09-13
]

---
class: inverse, center, middle

# Numerical Linear Algebra

---

# Important Problems

- Linear systems
- Eigenvalues
- Sparse matrices

---

# Today's Topics

- Preliminaries
- Triangular and tridiagonal systems
- LU decomposition
- Cholesky decomposition

---
class: inverse, center, middle

# Preliminaries

---
class: center, middle

# Useful Formulas

---

# Trace

- `\(\mathrm{tr}(AB)=\mathrm{tr}(BA)\)`
- `\(\mathrm{tr}(ABC)=\mathrm{tr}(BCA)=\mathrm{tr}(CAB)\)`
- `\(\require{color}\color{red} \mathrm{tr}(ABC)\neq\mathrm{tr}(ACB)\)`
- `\(\mathrm{tr}(A'B)=\mathbf{vec}(A)'\mathbf{vec}(B)=\mathbf{vec}(B)'\mathbf{vec}(A)\)`

---

# Determinant

- `\(\det(AB)=\det(A)\det(B)\)`
- `\(\det(A^{-1})=[\det(A)]^{-1}\)`
- Weinstein-Aronszajn identity:

`$$\det(I_m+A_{m\times n}B_{n\times m})=\det(I_n+BA)$$`

---

# Woodbury Identity

`$$(A+UCV)^{-1}=A^{-1}-A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}$$`

Special cases:

- `\((I+UV)^{-1}=I-U(I+VU)^{-1}V\)`
- `\((I+UU')^{-1}=I-U(I+U'U)^{-1}U'\)`
- `$$(A+uv')^{-1}=A^{-1}-\frac{A^{-1}uv'A^{-1}}{1+v'A^{-1}u}$$`

---

# Block Matrix Inversion

If `\(D - {CA}^{-1}B\)` is invertible:

`$$\scriptsize \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B\left(D - {CA}^{-1}B\right)^{-1}{CA}^{-1} & -A^{-1}B\left(D - {CA}^{-1}B\right)^{-1} \\ -\left(D-{CA}^{-1}B\right)^{-1}{CA}^{-1} & \left(D - {CA}^{-1}B\right)^{-1} \end{bmatrix}$$`

If `\(A - {BD}^{-1}C\)` is invertible:

`$$\scriptsize \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} \left(A - {BD}^{-1}C\right)^{-1} & -\left(A-{BD}^{-1}C\right)^{-1}{BD}^{-1} \\ -D^{-1}C\left(A - {BD}^{-1}C\right)^{-1} & \quad D^{-1} + D^{-1}C\left(A - {BD}^{-1}C\right)^{-1}{BD}^{-1} \end{bmatrix}$$`

---

# Block Matrix Determinant

If `\(A\)` is invertible:

`$$\det\begin{pmatrix}A& B\\ C& D\end{pmatrix} = \det(A) \det\left(D - C A^{-1} B\right)$$`

For any `\(A,B\in \mathbb{R}^{n\times n}\)`:

`$$\det\begin{pmatrix}A& B\\ B& A\end{pmatrix} = \det(A+B) \det\left(A-B\right)$$`
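---

# Numerical Check

- These identities are easy to verify numerically. Below is a minimal sketch in R for the Woodbury identity (the sizes `n` and `k` are arbitrary choices for illustration)

```r
set.seed(123)
n = 300; k = 5
A = crossprod(matrix(rnorm(n^2), n)) + diag(n) # invertible n x n
U = matrix(rnorm(n * k), n, k)
C = diag(k)
V = matrix(rnorm(k * n), k, n)
lhs = solve(A + U %*% C %*% V)
Ai = solve(A)
rhs = Ai - Ai %*% U %*% solve(solve(C) + V %*% Ai %*% U) %*% V %*% Ai
max(abs(lhs - rhs)) # should be tiny (round-off error)
```

- When `\(k\ll n\)` and `\(A\)` is cheap to invert by design (e.g., diagonal), the right-hand side only requires inverting a `\(k\times k\)` matrix, which is the main reason the identity is useful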
---
class: inverse, center, middle

# Linear Systems

---
class: center, middle

# Triangular Systems

---

# Triangular Matrices

`$$L=\begin{bmatrix}\times\\ \times & \times\\ \vdots & \vdots & \ddots\\ \times & \times & \cdots & \times \end{bmatrix},\quad U=\begin{bmatrix}\times & \cdots & \times & \times\\ & \ddots & \vdots & \vdots\\ & & \times & \times\\ & & & \times \end{bmatrix}$$`

- Triangular matrices are important building blocks of numerical linear algebra
- Cheap multiplication, determinant, and linear system solving
- Many matrix decompositions are in the form of triangular matrices

---

# Basic Properties <sup><span class="small">[1]</span></sup>

- A triangular matrix is invertible if and only if all its diagonal entries are non-zero
- Invertible lower (upper) triangular matrices are closed under multiplication and inversion
- The determinant of a triangular matrix is the product of its diagonal entries

`$$\small \det\begin{bmatrix}\lambda_{1}\\ \times & \lambda_{2}\\ \vdots & \vdots & \ddots\\ \times & \times & \cdots & \lambda_{m} \end{bmatrix}=\lambda_{1}\det\begin{bmatrix}\lambda_{2}\\ \vdots & \ddots\\ \times & \cdots & \lambda_{m} \end{bmatrix}=\cdots=\lambda_{1}\cdots\lambda_{m}$$`

---

# Basic Properties

- .highlight[Question: What is the computational complexity of multiplying a triangular matrix with a vector?]

--

- `\(O(n^2)\)`

---

# Triangular Systems

- We are interested in solving the linear system of equations `\(Lx=b\)`

`$$L=\begin{bmatrix}\begin{array}{cccc} l_{11}\\ l_{21} & l_{22}\\ \vdots & \vdots & \ddots\\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{array}\end{bmatrix}$$`

- The algorithm is called .highlight[forward substitution]

---

# Forward Substitution

- Let `\(l_{k,1:k}=(l_{k1},\ldots,l_{kk})'\)`, `\(x_{1:k}=(x_1,\ldots,x_k)'\)`, `\(k=1,\ldots,n\)`
- The `\(k\)`-th equation gives

`$$l_{k,1:k}'x_{1:k}=\sum_{i=1}^{k-1}l_{ki}x_i+l_{kk}x_k=b_k$$`

- We can iteratively compute `\(x_1,\ldots,x_n\)`

---

# Algorithm

- Start with `\(x_1=b_1/l_{11}\)`
- For `\(k=2,\ldots,n\)`, iterate

`$$x_k=\left(b_k-\sum_{i=1}^{k-1}l_{ki}x_i \right)/l_{kk}$$`

--

- .highlight[Computational complexity is] `\(O(n^2)\)`

---

# Backward Substitution

- Similarly, we can solve the linear system of equations `\(Ux=b\)` with an upper triangular matrix

`$$U=\begin{bmatrix}u_{11} & u_{12} & \cdots & u_{1n}\\ & u_{22} & \cdots & u_{2n}\\ & & \ddots & \vdots\\ & & & u_{nn} \end{bmatrix}$$`

- The algorithm is called .highlight[backward substitution]

--

- Please derive this algorithm by yourself

---

# Implementation

- Simulate matrices

```r
set.seed(123)
n = 2000
b = rnorm(n)
M = 0.1 * matrix(rnorm(n^2), n)
diag(M) = abs(diag(M)) + 1

# Get lower and upper triangular parts
L = M * lower.tri(M, diag = TRUE)
U = M * upper.tri(M, diag = TRUE)
```

---

# Implementation

- In R, the `forwardsolve()` and `backsolve()` functions implement the two algorithms

```r
x1 = forwardsolve(L, b)
max(abs(L %*% x1 - b)) # Verify L*x1 = b
```

```
## [1] 4.193368e-12
```

```r
x2 = backsolve(U, b)
max(abs(U %*% x2 - b)) # Verify U*x2 = b
```

```
## [1] 5.541345e-12
```

---

# Implementation

- In fact, we do not need to create a new matrix
- The part above (below) the diagonal is simply ignored by `forwardsolve()` (`backsolve()`)
- This reduces memory use

```r
x1 = forwardsolve(M, b)
max(abs(L %*% x1 - b)) # Verify L*x1 = b
```

```
## [1] 4.193368e-12
```

```r
x2 = backsolve(M, b)
max(abs(U %*% x2 - b)) # Verify U*x2 = b
```

```
## [1] 5.541345e-12
```

---

# Benchmark

- Whenever possible, use `forwardsolve()`/`backsolve()` instead of `solve()`

```r
library(dplyr)
bench::mark(
    solve(L, b),
    forwardsolve(L, b),
    backsolve(U, b),
    max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
```

```
## # A tibble: 3 × 3
##   expression         min      median
##   <bch:expr>         <bch:tm> <bch:tm>
## 1 solve(L, b)        1.42s    1.42s
## 2 forwardsolve(L, b) 1.33ms   1.42ms
## 3 backsolve(U, b)    1.31ms   1.43ms
```

---

# Exercise

- Implement the backward substitution algorithm
- Experiment on simulated matrices
- Use `backsolve()` to verify the result
- A reference implementation of the forward version is given on the next slide
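---

# Reference Implementation

- As a model for the exercise, here is a direct R translation of the forward substitution recursion (a sketch, not optimized; `L` and `b` are the objects simulated earlier)

```r
fwd_sub = function(L, b) {
    n = length(b)
    x = numeric(n)
    x[1] = b[1] / L[1, 1]
    for (k in 2:n) {
        x[k] = (b[k] - sum(L[k, 1:(k - 1)] * x[1:(k - 1)])) / L[k, k]
    }
    x
}
x = fwd_sub(L, b)
max(abs(x - forwardsolve(L, b))) # should be tiny
```

- The backward version follows the same pattern, starting from `\(x_n=b_n/u_{nn}\)` and looping from `\(k=n-1\)` down to `\(1\)`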
---
class: center, middle

# Tridiagonal Systems

---

# Tridiagonal Matrices

- Tridiagonal matrices are band matrices that have nonzero elements only on the main diagonal and the diagonals directly above and below it

`$$T=\begin{bmatrix}b_{1} & c_{1}\\ a_{2} & b_{2} & c_{2}\\ & a_{3} & b_{3} & \ddots\\ & & \ddots & \ddots & c_{n-1}\\ & & & a_{n} & b_{n} \end{bmatrix}$$`

- Useful in eigenvalue computation and spline interpolation

---

# Basic Properties

- Can be represented by three vectors
- Storage requirement is linear in `\(n\)`, i.e., `\(O(n)\)`
- Cheap multiplication, determinant, and linear system solving
- .highlight[Question: What is the computational complexity of multiplying a tridiagonal matrix with a vector?]

--

- `\(O(n)\)`

---

# Tridiagonal Systems

- We are interested in solving the linear system of equations `\(Tx=d\)`

`$$\begin{bmatrix}b_{1} & c_{1}\\ a_{2} & b_{2} & c_{2}\\ & a_{3} & b_{3} & \ddots\\ & & \ddots & \ddots & c_{n-1}\\ & & & a_{n} & b_{n} \end{bmatrix}\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ \vdots\\ x_{n} \end{bmatrix}=\begin{bmatrix}d_{1}\\ d_{2}\\ d_{3}\\ \vdots\\ d_{n} \end{bmatrix}$$`

---

# Algorithm

- Forward sweep

`$$\small c_{i}'=\begin{cases} \frac{c_{i}}{b_{i}}, & i=1\\ \frac{c_{i}}{b_{i}-a_{i}c_{i-1}'}, & i=2,\ldots,n-1 \end{cases},\qquad d_{i}'=\begin{cases} \frac{d_{i}}{b_{i}}, & i=1\\ \frac{d_{i}-a_{i}d_{i-1}'}{b_{i}-a_{i}c_{i-1}'}, & i=2,\ldots,n \end{cases}$$`

- Backward substitution

`$$\begin{align*} x_{n} & =d_{n}',\\ x_{i} & =d_{i}'-c_{i}'x_{i+1},\quad i=n-1,\ldots,1 \end{align*}$$`

- .highlight[Computational complexity is] `\(O(n)\)`

---

# Implementation

- Simulate matrices

```r
set.seed(123)
n = 2000
diagonal = rnorm(n)
lsubdiag = rnorm(n - 1)
usubdiag = rnorm(n - 1)
b = rnorm(n)

# Explicitly form the matrix
M = diag(diagonal)
diag(M[2:n, ]) = lsubdiag
diag(M[, 2:n]) = usubdiag
M[1:5, 1:5]
```

```
##            [,1]       [,2]       [,3]       [,4]      [,5]
## [1,] -0.5604756  1.1210219  0.0000000 0.00000000 0.0000000
## [2,] -0.5116037 -0.2301775  0.1965498 0.00000000 0.0000000
## [3,]  0.0000000  0.2369379  1.5587083 0.65011319 0.0000000
## [4,]  0.0000000  0.0000000 -0.5415892 0.07050839 0.6710042
## [5,]  0.0000000  0.0000000  0.0000000 1.21922765 0.1292877
```

---

# Implementation

- The `tridiagmatrix()` function in the **cmna** R package implements the described algorithm

```r
library(cmna)
x = tridiagmatrix(lsubdiag, diagonal, usubdiag, b)
max(abs(M %*% x - b)) # Verify M*x = b
```

```
## [1] 2.298162e-13
```

---

# Benchmark

- The specialized algorithm is much faster than the general `solve()`

```r
bench::mark(
    solve(M, b),
    tridiagmatrix(lsubdiag, diagonal, usubdiag, b),
    max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
```

```
## # A tibble: 2 × 3
##   expression                                     min      median
##   <bch:expr>                                     <bch:tm> <bch:tm>
## 1 solve(M, b)                                    1.37s    1.37s
## 2 tridiagmatrix(lsubdiag, diagonal, usubdiag, b) 614.1µs  650.25µs
```

---

# Benchmark

- .highlight[Warning: the described algorithm may be unstable in some cases. If stability is required, use LU decomposition instead]
- A hand-rolled implementation of the sweep is sketched on the next slide
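---

# Reference Implementation

- The forward sweep and backward substitution above fit in a few lines of R. A sketch (not optimized, and with no stability safeguards), using the vectors simulated earlier

```r
thomas = function(a, b, c, d) {
    # a: subdiagonal (n-1), b: diagonal (n), c: superdiagonal (n-1), d: rhs (n)
    n = length(d)
    cp = numeric(n - 1)
    dp = numeric(n)
    cp[1] = c[1] / b[1]
    dp[1] = d[1] / b[1]
    for (i in 2:(n - 1)) {
        denom = b[i] - a[i - 1] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / denom
    }
    dp[n] = (d[n] - a[n - 1] * dp[n - 1]) / (b[n] - a[n - 1] * cp[n - 1])
    x = numeric(n)
    x[n] = dp[n]
    for (i in (n - 1):1) x[i] = dp[i] - cp[i] * x[i + 1]
    x
}
x = thomas(lsubdiag, diagonal, usubdiag, b)
max(abs(M %*% x - b)) # should be tiny
```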
---
class: center, middle

# LU Decomposition

---

# LU Decomposition

- LU decomposition is an important method to solve general linear systems
- Factorizes an `\(n\times n\)` matrix `\(A\)` into the product of two .highlight[triangular] matrices, `\(A=LU\)`
- `\(L\)` lower triangular, with ones on the diagonal
- `\(U\)` upper triangular

---

# Factorization Theorem <sup><span class="small">[2]</span></sup>

Let `\(A\)` be an `\(n\times n\)` matrix such that all of its diagonal (i.e., leading principal) submatrices are nonsingular. Then there exist a lower triangular matrix `\(L\)` with unit diagonal elements (i.e., `\(l_{ii}=1\)`), and an upper triangular matrix `\(U\)`, such that `\(A=LU\)`.

---

# Solving Linear Systems

- Suppose the decomposition `\(A=LU\)` is given
- If we wish to solve `\(Ax=b\)`, then:
    - First, solve `\(Ly=b\)` to get `\(y\)` .highlight[(forward substitution)]
    - Second, solve `\(Ux=y\)` to get `\(x\)` .highlight[(backward substitution)]

---

# Computing Determinant

- Suppose the decomposition `\(A=LU\)` is given
- .highlight[Question: How to compute] `\(\require{color}\color{deeppink}\det(A)\)`.highlight[?]

--

- `\(\det(A)=\det(L)\det(U)=\det(U)=u_{11}\cdots u_{nn}\)`

---

# Algorithm <sup><span class="small">[3]</span></sup>

Based on the relation

![](images/lu.png)

---

# Pivoting

- In practice, LU decomposition is performed with a permutation matrix `\(Q\)`, which is typically called .highlight[pivoting]:

`$$QA=LU$$`

- This is used to enhance numerical stability
- `\(Q\)` has the effect of interchanging rows of `\(A\)`
- `\(Q\)` satisfies `\(Q'Q=QQ'=I\)`

---

# Pivoting

- When using this factorization to solve the linear system `\(Ax=b\)`:
    - First, compute `\(\tilde{b}=Qb\)`
    - Second, solve `\(Ly=\tilde{b}\)` to get `\(y\)`
    - Third, solve `\(Ux=y\)` to get `\(x\)`

---

# Computational Complexity

- .highlight[Decomposition part is] `\(O(n^3)\)`
- .highlight[Linear system solving part is] `\(O(n^2)\)`

---

# Implementation

- The `lu()` function in the **Matrix** package can be used to compute the LU decomposition of a square matrix
- Note that this function factorizes `\(A\)` as `\(A=PLU\)`, or equivalently, `\(P'A=LU\)` (i.e., `\(Q=P'\)`)

```r
library(Matrix)
set.seed(123)
n = 5
M = matrix(rnorm(n^2), n)
lu_obj = lu(M)
decomp = expand(lu_obj)
names(decomp) # A named list
```

```
## [1] "L" "U" "P"
```

---

# Implementation

```r
decomp$L
```

```
## 5 x 5 Matrix of class "dtrMatrix" (unitriangular)
##             [,1]        [,2]        [,3]        [,4]        [,5]
## [1,]  1.00000000           .           .           .           .
## [2,] -0.35957699  1.00000000           .           .           .
## [3,]  0.04523514 -0.49963385  1.00000000           .           .
## [4,]  0.08294543 -0.27038315 -0.28235315  1.00000000           .
## [5,] -0.14767195  0.21751065  0.15641432 -0.65806034  1.00000000
```

```r
decomp$U
```

```
## 5 x 5 Matrix of class "dtrMatrix"
##           [,1]       [,2]      [,3]       [,4]       [,5]
## [1,] 1.5587083 -1.2650612 0.4007715 -1.9666172 -1.0260044
## [2,]         .  1.2601781 1.3681900  1.0797629 -1.4367513
## [3,]         .          . 0.7761478  1.3298022 -1.4003294
## [4,]         .          .         .  0.3577540 -1.3237976
## [5,]         .          .         .          . -0.7090854
```

---

# Implementation

```r
decomp$P
```

```
## 5 x 5 sparse Matrix of class "pMatrix"
##
## [1,] . | . . .
## [2,] . . . . |
## [3,] | . . . .
## [4,] . . | . .
## [5,] . . . | .
```

```r
# Verify P*L*U = A
max(abs(decomp$P %*% decomp$L %*% decomp$U - M))
```

```
## [1] 2.220446e-16
```

---

# Implementation

- We can then write a simple function to solve linear systems given the decomposition result

```r
lu_solve = function(decomp, b) {
    bt = t(decomp$P) %*% b
    y = forwardsolve(as.matrix(decomp$L), bt) # L*y = b
    x = backsolve(as.matrix(decomp$U), y)     # U*x = y
    x
}
```

---

# Implementation

- Verify the result

```r
set.seed(123)
n = 2000
M = matrix(rnorm(n^2), n)
b = rnorm(n)
decomp = expand(lu(M))
x = lu_solve(decomp, b)
max(abs(M %*% x - b)) # Verify M*x = b
```

```
## [1] 1.933176e-11
```

---

# Benchmark

- The LU decomposition itself dominates the time of solving linear systems

```r
bench::mark(
    solve(M, b),
    expand(lu(M)),
    lu_solve(decomp, b),
    max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
```

```
## # A tibble: 3 × 3
##   expression          min      median
##   <bch:expr>          <bch:tm> <bch:tm>
## 1 solve(M, b)         1.4s     1.4s
## 2 expand(lu(M))       1.37s    1.37s
## 3 lu_solve(decomp, b) 17.63ms  18.85ms
```
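---

# Determinant via LU

- A quick check of the determinant formula (a sketch). With pivoting we have `\(\det(A)=\det(P)\det(U)\)` and `\(\det(P)=\pm1\)`, so comparing on the log scale of the absolute value avoids both the sign and overflow

```r
set.seed(123)
n = 200
M = matrix(rnorm(n^2), n)
decomp = expand(lu(M))
logdet_lu = sum(log(abs(diag(as.matrix(decomp$U)))))
logdet_ref = as.numeric(determinant(M)$modulus) # log of |det(M)|
abs(logdet_lu - logdet_ref) # should be tiny
```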
---

# When to Use?

- If `\(Ax=b\)` only needs to be solved once:
    - Directly calling `solve()` is more convenient
    - `solve()` internally uses algorithms similar to LU decomposition
- If a sequence of linear systems `\(Ax^{(k)}=b^{(k)}\)` needs to be solved iteratively:
    - First factorize `\(A\)` using LU decomposition
    - Then use `lu_solve()` to get each solution `\(x^{(k)}\)`
- We will encounter the second case in many optimization problems

---
class: center, middle

# Cholesky Decomposition

---

# Cholesky Decomposition

- LU decomposition applies to general square matrices
- If we can guarantee the matrix `\(A\)` to be .highlight[positive definite]
    - Example: `\(A=X'X+\lambda I\)`, `\(\lambda>0\)`
- Then the Cholesky decomposition `\(A=LL'\)` can be computed
- `\(L\)` is a lower triangular matrix

---

# Factorization Theorem <sup><span class="small">[2]</span></sup>

Let `\(A\)` be an `\(n\times n\)` real symmetric positive definite matrix. Then there exists a unique lower triangular matrix `\(L\)` with positive diagonal elements, such that `\(A=LL'\)`.

---

# Algorithm <sup><span class="small">[3]</span></sup>

Based on the relation

`$$l_{jj}=\sqrt{a_{jj}-\sum_{k=1}^{j-1}l_{jk}^2}$$`

`$$l_{ij}=\left(a_{ij}-\sum_{k=1}^{j-1}l_{ik}l_{jk}\right)/l_{jj},\quad i=j+1,\ldots,n$$`

---

# Solving Linear Systems

- Suppose the decomposition `\(A=LL'\)` is given
- If we wish to solve `\(Ax=b\)`, then:
    - First, solve `\(Ly=b\)` to get `\(y\)` .highlight[(forward substitution)]
    - Second, solve `\(L'x=y\)` to get `\(x\)` .highlight[(backward substitution)]

---

# Computing Determinant

- Suppose the decomposition `\(A=LL'\)` is given
- .highlight[Question: How to compute] `\(\require{color}\color{deeppink}\det(A)\)`.highlight[?]

--

- `\(\det(A)=[\det(L)]^2=l_{11}^2\cdots l_{nn}^2\)`

---

# Computational Complexity

- .highlight[Decomposition part is] `\(O(n^3)\)`
- .highlight[Linear system solving part is] `\(O(n^2)\)`
- Same orders as LU decomposition, but in general faster

---

# Implementation

- In R, the `chol()` function can be used to compute the Cholesky decomposition
- Note that this function factorizes `\(A\)` as `\(A=R'R\)`, so you actually get `\(R=L'\)`

```r
set.seed(123)
n = 5
M = matrix(rnorm(n^2), n)
A = crossprod(M) # positive definite
R = chol(A)
R
```

```
##          [,1]      [,2]       [,3]       [,4]       [,5]
## [1,] 1.678801 -1.873512 -0.1240544 -2.4977185 -0.6449736
## [2,] 0.000000  1.383693  1.2267898  0.6009298 -0.7682812
## [3,] 0.000000  0.000000  0.7676374  1.1355032 -0.8695156
## [4,] 0.000000  0.000000  0.0000000  0.3672150 -1.0253122
## [5,] 0.000000  0.000000  0.0000000  0.0000000  0.5906208
```

---

# Implementation

- Then we can write a simple function to solve linear systems given the decomposition result

```r
chol_solve = function(R, b) {
    y = backsolve(R, b, transpose = TRUE) # R'*y = b
    x = backsolve(R, y)                   # R*x = y
    x
}
```

---

# Implementation

- Verify the result

```r
set.seed(123)
n = 2000
M = matrix(rnorm(n^2), n)
A = crossprod(M)
b = rnorm(n)
R = chol(A)
max(abs(crossprod(R) - A)) # Verify R'R = A
```

```
## [1] 1.364242e-12
```

```r
x = chol_solve(R, b)
max(abs(A %*% x - b)) # Verify A*x = b
```

```
## [1] 4.437666e-09
```

---

# Benchmark

- The decomposition part dominates the time of solving linear systems

```r
bench::mark(
    solve(A, b),
    chol(A),
    chol_solve(R, b),
    max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
```

```
## # A tibble: 3 × 3
##   expression       min      median
##   <bch:expr>       <bch:tm> <bch:tm>
## 1 solve(A, b)      1.38s    1.38s
## 2 chol(A)          1.31s    1.31s
## 3 chol_solve(R, b) 3.62ms   5.04ms
```
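---

# Example: Ridge-Type Systems

- Positive definite systems of the form `\(A=X'X+\lambda I\)` appear in ridge regression. A minimal sketch (the sizes and `lam` are arbitrary choices), reusing the `chol_solve()` helper defined above

```r
set.seed(123)
n = 1000; p = 200; lam = 0.1
X = matrix(rnorm(n * p), n, p)
y = rnorm(n)
A = crossprod(X) + lam * diag(p)       # X'X + lambda * I
R = chol(A)
beta = chol_solve(R, crossprod(X, y))  # solves A * beta = X'y
max(abs(A %*% beta - crossprod(X, y))) # should be tiny
```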
---

# When to Use?

- As long as you can guarantee `\(A\)` is positive definite, prefer to use Cholesky decomposition
- `solve()` does not know this special property of `\(A\)`, so it falls back to the slower, more general LU decomposition
- Cholesky decomposition utilizes this "prior information"
- Cholesky decomposition can also be used when a sequence of linear systems `\(Ax^{(k)}=b^{(k)}\)` needs to be solved iteratively

---

# References

[1] Folkmar Bornemann (2018). Numerical Linear Algebra. Springer.

[2] Grégoire Allaire and Sidi Mahmoud Kaber (2008). Numerical Linear Algebra. Springer.

[3] Åke Björck (2015). Numerical Methods in Matrix Computations. Springer.