Case studies
Spline fitting
Back to regression
New topics
LDLT decomposition
QR decomposition
Given observed points x=(x_0,\ldots,x_n)' and y=(y_0,\ldots,y_n)', find a function f such that f(x_i)=y_i, i=0,\ldots,n
Without loss of generality we can assume that x_0<x_1<\cdots<x_n
We typically require that f is continuous, or differentiable, or smooth (the second derivative exists)
One important class of functions commonly used for interpolation is the piecewise polynomials, also known as spline functions
Spline fitting is the task of determining polynomial coefficients in each interval
A commonly-used spline is the natural cubic spline
Has continuous second derivative
On each interval [x_i,x_{i+1}], it is a cubic polynomial f(x)=f_i(x)=a_i+b_i(x-x_i)+c_i(x-x_i)^2+d_i(x-x_i)^3
We need to determine the coefficients (a_i,b_i,c_i,d_i), i=0,\ldots,n-1
The natural cubic spline that satisfies the conditions in the previous slide has the form
\begin{align*} f_{i}(x) & =\frac{z_{i+1}(x-x_{i})^{3}+z_{i}(x_{i+1}-x)^{3}}{6h_{i}}\\ & \quad+\left(\frac{y_{i+1}}{h_{i}}-\frac{z_{i+1}h_{i}}{6}\right)(x-x_{i})\\ & \quad+\left(\frac{y_{i}}{h_{i}}-\frac{z_{i}h_{i}}{6}\right)(x_{i+1}-x),\quad i=0,\ldots,n-1 \end{align*} where h_i=x_{i+1}-x_i, and z_i=f''(x_i) are values to be determined
We have assumed that z_0=z_n=0 (the natural boundary conditions)
So there are (n-1) unknown values z_1,\ldots,z_{n-1}
The relation f_{i-1}'(x_i)=f_i'(x_i), i=1,\ldots,n-1 gives 6(b_i-b_{i-1})=h_{i-1}z_{i-1}+2(h_{i-1}+h_i)z_i+h_i z_{i+1}, where b_i=(y_{i+1}-y_i)/h_i
Further let v_i=2(h_{i-1}+h_i), u_i=6(b_i-b_{i-1})
Then we get a tridiagonal system
\begin{bmatrix}v_{1} & h_{1}\\ h_{1} & v_{2} & h_{2}\\ & h_{2} & v_{3} & \ddots\\ & & \ddots & \ddots & h_{n-2}\\ & & & h_{n-2} & v_{n-1} \end{bmatrix}\begin{bmatrix}z_{1}\\ z_{2}\\ z_{3}\\ \vdots\\ z_{n-1} \end{bmatrix}=\begin{bmatrix}u_{1}\\ u_{2}\\ u_{3}\\ \vdots\\ u_{n-1} \end{bmatrix}
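As a concrete illustration, here is a minimal sketch (not from the original material; natural_spline_z is a hypothetical helper) that assembles the tridiagonal system above and solves it with a dense solve() for clarity, rather than a specialized tridiagonal solver:
natural_spline_z = function(x, y) {
  n = length(x) - 1          # intervals [x_i, x_{i+1}], i = 0, ..., n-1
  h = diff(x)                # R's h[i] is h_{i-1} in the 0-based notation
  b = diff(y) / h            # slopes b_i = (y_{i+1} - y_i) / h_i
  u = 6 * diff(b)            # u_i = 6 * (b_i - b_{i-1}), i = 1, ..., n-1
  v = 2 * (h[-n] + h[-1])    # v_i = 2 * (h_{i-1} + h_i)
  Tmat = diag(v, n - 1)      # tridiagonal coefficient matrix
  if (n > 2) {
    idx = 1:(n - 2)
    Tmat[cbind(idx, idx + 1)] = h[idx + 1]  # superdiagonal h_1, ..., h_{n-2}
    Tmat[cbind(idx + 1, idx)] = h[idx + 1]  # subdiagonal (symmetric)
  }
  c(0, solve(Tmat, u), 0)    # natural boundary conditions z_0 = z_n = 0
}
# e.g., z = natural_spline_z(0:5, sin(0:5))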
Computing predicted values of linear regression \hat{y}=X(X'X)^{-1}X'Y.
NEVER do the following:
X %*% solve(t(X) %*% X) %*% t(X) %*% Y
set.seed(123)
X = matrix(rnorm(5000 * 500), 5000)
Y = rnorm(5000)
system.time(X %*% solve(t(X) %*% X) %*% t(X) %*% Y)
##    user  system elapsed 
##    9.09    0.03    9.36
system.time(X %*% (solve(t(X) %*% X) %*% (t(X) %*% Y)))
##    user  system elapsed 
##    0.91    0.00    0.93
The two calls compute exactly the same result:
X %*% solve(t(X) %*% X) %*% t(X) %*% Y      # ~9 seconds
X %*% (solve(t(X) %*% X) %*% (t(X) %*% Y))  # ~0.9 seconds
You get a ~10x speedup almost for free
The only differences are two pairs of parentheses!
When multiplying several matrices together, prefer to first compute the product involving smaller matrices
Reason: the computational complexity of the matrix multiplication A_{m\times n}B_{n\times p} is O(mnp)
system.time(X %*% solve(t(X) %*% X, t(X) %*% Y))
##    user  system elapsed 
##    0.78    0.00    0.76
Avoid explicit matrix inversion whenever possible
Instead, solve linear systems
In fact, in statistical computing, it is very rare that you really need A^{-1}
Computing A^{-1}B is equivalent to solving AX=B
Computing BA^{-1} is equivalent to solving A'X=B' and then taking BA^{-1}=X'
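A small hypothetical example illustrating both identities:
set.seed(1)
A = matrix(rnorm(9), 3)
B = matrix(rnorm(9), 3)
AinvB = solve(A, B)               # A^{-1} B without forming A^{-1}
BAinv = t(solve(t(A), t(B)))      # B A^{-1} via A'X = B', then X'
max(abs(AinvB - solve(A) %*% B))  # agreement up to rounding error
max(abs(BAinv - B %*% solve(A)))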
system.time(X %*% solve(crossprod(X), crossprod(X, Y)))
##    user  system elapsed 
##    0.72    0.00    0.70
Avoid explicit matrix transpose
Try specialized functions
crossprod(x): computes X'X
crossprod(x, y): computes X'Y
tcrossprod(x, y): computes XY'
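A quick sanity check (hypothetical example) that the specialized functions agree with the explicit-transpose versions:
set.seed(1)
x = matrix(rnorm(12), 4)
y = matrix(rnorm(12), 4)
max(abs(crossprod(x)     - t(x) %*% x))   # X'X
max(abs(crossprod(x, y)  - t(x) %*% y))   # X'Y
max(abs(tcrossprod(x, y) - x %*% t(y)))   # XY'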
Utilize special structures of matrices
For example, symmetry, positive definiteness, etc.
Try to further improve the solution!
LU decomposition applies to general square matrices
Cholesky decomposition applies to positive definite matrices
If we only know the matrix is symmetric
Is there any tailored decomposition method?
[4] shows that a nonsingular symmetric matrix A can be decomposed as A=LDL'
L is lower triangular with ones on the diagonal
D is a block diagonal matrix, each block being 1\times 1 or 2\times 2, e.g.,
\tiny\begin{bmatrix}\begin{array}{rrrrr} 1 & -3 & 4 & -4 & 0\\ -3 & 10 & -12 & 6 & 13\\ 4 & -12 & 11 & -1 & -25\\ -4 & 6 & -1 & 6 & -5\\ 0 & 13 & -25 & 5 & 42 \end{array}\end{bmatrix}=\begin{bmatrix}\begin{array}{rrrrr} 1\\ -3 & 1\\ 4 & -2 & 1\\ -4 & 0 & -3 & 1\\ 0 & 3 & 5 & 2 & 1 \end{array}\end{bmatrix}\begin{bmatrix}\begin{array}{rrrrr} 1\\ & 1 & 2\\ & 2 & -1\\ & & & -1\\ & & & & 2 \end{array}\end{bmatrix}\begin{bmatrix}\begin{array}{rrrrr} 1 & -3 & 4 & -4 & 0\\ & 1 & -2 & 0 & 3\\ & & 1 & -3 & 5\\ & & & 1 & 2\\ & & & & 1 \end{array}\end{bmatrix}
Suppose the decomposition P'AP=LDL' is given
If we wish to solve Ax=b, then:
Compute \tilde{b}=P'b
Solve Ly_1=\tilde{b} to get y_1
Solve Dy_2=y_1 to get y_2
Solve L'\tilde{x}=y_2 to get \tilde{x}
Compute x=P\tilde{x}
Solving Dy=c is similar to solving a diagonal system
Since D is block diagonal, D=\mathrm{bdiag}(D_1,\ldots,D_K)
Each D_k is 1\times 1 or 2\times 2
Partition c according to D, c=(c_1',\ldots,c_K')'
Then y=((D_1^{-1}c_1)',\ldots,(D_K^{-1}c_K)')'
If D_k is 2\times 2, use closed-form inverse formula
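A minimal sketch of this block-wise solve (block_diag_solve is a hypothetical helper; the blocks are assumed to be given as a list of 1x1 or 2x2 matrices):
block_diag_solve = function(blocks, c_vec) {
  y = numeric(length(c_vec))
  pos = 1
  for (Dk in blocks) {
    k = nrow(Dk)                    # block size, 1 or 2
    idx = pos:(pos + k - 1)
    y[idx] = solve(Dk, c_vec[idx])  # a closed-form inverse is trivial for k <= 2
    pos = pos + k
  }
  y
}
# e.g., the D in the earlier example:
# blocks = list(matrix(1), matrix(c(1, 2, 2, -1), 2), matrix(-1), matrix(2))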
The most commonly-used algorithm for the LDL^T decomposition is the Bunch-Kaufman algorithm [5]
We omit the details here since the algorithm is fairly involved
Interested readers are referred to [4] and [5]
The computational complexity is O(n^3), with a multiplicative factor similar to Cholesky decomposition
Unfortunately, at the moment there is no mature R interface to this decomposition
There is a BunchKaufman() function in the Matrix package, but it is somewhat incomplete
The good news is that the algorithm has been implemented in the LAPACK library, which ships with R
We just need to write a simple C++ wrapper to call the function via Rcpp
ldlt_fac = function(A) {
  res = Matrix::BunchKaufman(A, uplo = "L")
  list(perm = res@perm, fac = matrix(res@x, res@Dim[1]))
}
The ldlt_solve() function below calls the LAPACK routine dsytrs() to solve the system given the factorization:
compiler_flags = "-lRlapack -lRblas -lgfortran -lm -lquadmath"
Sys.setenv("PKG_LIBS" = compiler_flags)
Rcpp::cppFunction(
  includes = "#include <R_ext/Lapack.h>",
  code = "
  NumericVector ldlt_solve(List fac, NumericVector b)
  {
      NumericMatrix fac_res = fac[\"fac\"];
      IntegerVector perm = fac[\"perm\"];
      NumericVector sol = Rcpp::clone(b);
      const int n = fac_res.nrow();
      const int nrhs = 1;
      char uplo = 'L';
      int info;
      F77_CALL(dsytrs)(&uplo, &n, &nrhs, fac_res.begin(), &n,
                       perm.begin(), sol.begin(), &n, &info FCONE);
      return sol;
  }"
)
set.seed(123)
n = 2000
M = matrix(rnorm(n^2), n)
A = M + t(M)
b = rnorm(n)
fac = ldlt_fac(A)
x = ldlt_solve(fac, b)
max(abs(A %*% x - b))  # verify A * x = b
## [1] 8.482548e-12
Compare with the generic solve():
library(dplyr)
bench::mark(
  solve(A, b),
  ldlt_fac(A),
  ldlt_solve(fac, b),
  max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
## # A tibble: 3 × 3
##   expression         min      median  
##   <bch:expr>         <bch:tm> <bch:tm>
## 1 solve(A, b)        1.44s    1.44s   
## 2 ldlt_fac(A)        813.09ms 813.09ms
## 3 ldlt_solve(fac, b) 3.44ms   4.09ms
As long as you can guarantee A is symmetric, use LDL^T decomposition to solve linear systems
If positive definiteness is guaranteed, use Cholesky decomposition
Most of the decomposition methods introduced before heavily rely on triangular matrices
This is because triangular matrices have cheap "inverse" operations
Another special type of matrix is the orthogonal matrix
Q'Q=QQ'=I, which implies Q^{-1}=Q'
QR decomposition factorizes a square matrix into the product of two matrices, A=QR
Q is an orthogonal matrix, Q'Q=QQ'=I
R is upper triangular
Let A be a nonsingular square matrix. There exists a unique pair (Q,R), where Q is an orthogonal matrix and R is an upper triangular matrix with positive diagonal entries, such that A=QR.
Suppose the decomposition A=QR is given
If we wish to solve Ax=b, then:
First, compute \tilde{b}=Q'b
Second, solve Rx=\tilde{b} to get x (see the sketch below)
However, it is uncommon to use the QR decomposition directly for solving linear systems
Typically slower than LU decomposition
More useful in factorizing rectangular matrices
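For illustration, a minimal sketch of the two steps above (a hypothetical example; note that qr() may pivot columns, which qr.solve() handles in general):
set.seed(1)
A = matrix(rnorm(16), 4)
b = rnorm(4)
qa = qr(A)
b_tilde = qr.qty(qa, b)           # Q'b, without forming Q explicitly
x = backsolve(qr.R(qa), b_tilde)  # solve R x = Q'b by back-substitution
max(abs(A %*% x - b))             # ~ 0 (assuming no column pivoting occurred)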
Let A_{m\times n} be a matrix with \mathrm{rank}(A)=n, n\le m. Then there exist an orthogonal matrix Q_{m\times m} and an upper triangular matrix R_{n\times n} with positive diagonal entries, such that A=Q\begin{pmatrix}R\\ 0 \end{pmatrix}=\begin{pmatrix}Q_{1} & Q_{2}\end{pmatrix}\begin{pmatrix}R\\ 0 \end{pmatrix}. In addition, the matrices R and Q_1=AR^{-1} are uniquely determined.
According to the factorization theorem in the previous slide, any rectangular matrix A with full column rank can be factorized as A=Q_1R
A\in \mathbb{R}^{m\times n}, Q_1\in \mathbb{R}^{m\times n}, R\in \mathbb{R}^{n\times n}
Q_1 is column-orthonormal, meaning that Q_1'Q_1=I_{n\times n}
This is called the thin QR decomposition
If the (thin) QR decomposition result X=QR is given
Then X'X=R'Q'QR=R'R, which is exactly the Cholesky decomposition of A=X'X
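A quick numerical check (hypothetical example) of this identity; note that chol() normalizes the diagonal to be positive, so the two factors may differ by row signs:
set.seed(123)
X = matrix(rnorm(15), 5, 3)
R = qr.R(qr(X))
max(abs(crossprod(R) - crossprod(X)))       # R'R equals X'X
max(abs(abs(chol(crossprod(X))) - abs(R)))  # same factor up to row signs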
To see why the thin QR decomposition is useful, recall the regression coefficient estimate \hat{\beta}=(X'X)^{-1}X'Y and predicted values \hat{y}=X\hat{\beta}
Suppose we have X_{n\times p}=QR, X full column rank; then \hat{\beta}=(R'R)^{-1}R'Q'Y=R^{-1}Q'Y, and hence \hat{y}=X\hat{\beta}=QRR^{-1}Q'Y=QQ'Y
QQ' is essentially the projection matrix, or hat matrix, in regression analysis
Warning: Q_{n\times p} is only column-orthonormal, so \require{color}\color{deeppink}QQ'\neq I_{n\times n}! (We do have Q'Q=I_{p\times p})
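A minimal sketch (hypothetical example) of computing \hat{\beta} through the thin QR decomposition, compared against the normal-equations formula:
set.seed(42)
n = 100
p = 5
X = matrix(rnorm(n * p), n, p)
y = rnorm(n)
qr_res = qr(X)
# beta_hat = R^{-1} Q'y: apply Q' implicitly, then back-substitute
beta_qr = backsolve(qr.R(qr_res), qr.qty(qr_res, y)[1:p])
# compare with the normal-equations formula (X'X)^{-1} X'y
beta_ne = solve(crossprod(X), crossprod(X, y))
max(abs(beta_qr - beta_ne))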
Here we omit the computing algorithm for QR decomposition; interested readers are referred to the references at the end
The important point is, in practice we do NOT explicitly form the Q matrix
Instead, given a vector y, there are specialized algorithms for computing Qy and Q'y
For rectangular matrix X_{n\times p}, X full column rank, n\ge p
Computing the thin QR decomposition using the modified Gram-Schmidt algorithm costs O(np^2)
Using the Householder algorithm costs O(np^2-p^3/3)
The qr() function in R can perform QR decomposition:
set.seed(123)
x = matrix(rnorm(25), 5, 5)
qr_res = qr(x)
qr_res
## $qr
##             [,1]       [,2]       [,3]       [,4]       [,5]
## [1,]  1.67880106 -1.8735119 -0.1240544 -2.4977185 -0.6449736
## [2,]  0.13710826 -1.3836930 -1.2267898 -0.6009298  0.7682812
## [3,] -0.92846517  0.8909949 -0.7676374 -1.1355032  0.8695156
## [4,] -0.04199925 -0.4147299  0.9330131  0.3672150 -1.0253122
## [5,] -0.07701195 -0.1723435 -0.3178735 -0.8619799 -0.5906208
## 
## $rank
## [1] 5
## 
## $qraux
## [1] 1.3338547 1.0665197 1.1686505 1.5069425 0.5906208
## 
## $pivot
## [1] 1 2 3 4 5
## 
## attr(,"class")
## [1] "qr"
The Q and R factors can be extracted using qr.Q() and qr.R():
qr.Q(qr_res)
##             [,1]       [,2]       [,3]        [,4]        [,5]
## [1,] -0.33385471 -0.7874465 -0.2822091  0.43404722 -0.02073790
## [2,] -0.13710826 -0.1474621 -0.2109068 -0.47031583  0.83293313
## [3,]  0.92846517 -0.3428718 -0.1241733  0.01466849  0.06897252
## [4,]  0.04199925  0.4395243 -0.8533937  0.27599706  0.02448081
## [5,]  0.07701195  0.2178078  0.3635610  0.71694943  0.54812026
qr.R(qr_res)
##          [,1]      [,2]       [,3]       [,4]       [,5]
## [1,] 1.678801 -1.873512 -0.1240544 -2.4977185 -0.6449736
## [2,] 0.000000 -1.383693 -1.2267898 -0.6009298  0.7682812
## [3,] 0.000000  0.000000 -0.7676374 -1.1355032  0.8695156
## [4,] 0.000000  0.000000  0.0000000  0.3672150 -1.0253122
## [5,] 0.000000  0.000000  0.0000000  0.0000000 -0.5906208
qr.qy() and qr.qty() compute Qy and Q'y without explicitly forming Q:
Q = qr.Q(qr_res)
v = rnorm(5)
Q %*% v
##             [,1]
## [1,] -0.65989192
## [2,]  1.65499813
## [3,] -1.80255035
## [4,] -0.11692776
## [5,] -0.02040374
qr.qy(qr_res, v)
## [1] -0.65989192 1.65499813 -1.80255035 -0.11692776 -0.02040374
crossprod(Q, v) # == t(Q) %*% v
##            [,1]
## [1,]  0.6394024
## [2,]  0.9249034
## [3,]  1.7073775
## [4,] -0.5390799
## [5,]  1.4027564
qr.qty(qr_res, v)
## [1] 0.6394024 0.9249034 1.7073775 -0.5390799 1.4027564
The qr() function also supports thin QR decomposition:
set.seed(123)
x = matrix(rnorm(15), 5, 3)
qr_res = qr(x)
qr.Q(qr_res)
##             [,1]       [,2]       [,3]
## [1,] -0.33385471 -0.7874465 -0.2822091
## [2,] -0.13710826 -0.1474621 -0.2109068
## [3,]  0.92846517 -0.3428718 -0.1241733
## [4,]  0.04199925  0.4395243 -0.8533937
## [5,]  0.07701195  0.2178078  0.3635610
qr.R(qr_res)
##          [,1]      [,2]       [,3]
## [1,] 1.678801 -1.873512 -0.1240544
## [2,] 0.000000 -1.383693 -1.2267898
## [3,] 0.000000  0.000000 -0.7676374
The full decomposition can be obtained with the complete = TRUE argument:
qr.Q(qr_res, complete = TRUE)
##             [,1]       [,2]       [,3]        [,4]       [,5]
## [1,] -0.33385471 -0.7874465 -0.2822091 -0.23791265 0.36362704
## [2,] -0.13710826 -0.1474621 -0.2109068  0.95639468 0.01684644
## [3,]  0.92846517 -0.3428718 -0.1241733  0.05201684 0.04760905
## [4,]  0.04199925  0.4395243 -0.8533937 -0.11881268 0.25031427
## [5,]  0.07701195  0.2178078  0.3635610  0.10901648 0.89586144
qr.R(qr_res, complete = TRUE)
##          [,1]      [,2]       [,3]
## [1,] 1.678801 -1.873512 -0.1240544
## [2,] 0.000000 -1.383693 -1.2267898
## [3,] 0.000000  0.000000 -0.7676374
## [4,] 0.000000  0.000000  0.0000000
## [5,] 0.000000  0.000000  0.0000000
library(Matrix)
set.seed(123)
M = matrix(rnorm(2000^2), 2000)
A = crossprod(M)
bench::mark(
  lu(A),
  chol(A),
  qr(A),
  max_iterations = 10, check = FALSE
) %>% select(expression, min, median)
## # A tibble: 3 × 3
##   expression min      median  
##   <bch:expr> <bch:tm> <bch:tm>
## 1 lu(A)      1.32s    1.32s   
## 2 chol(A)    1.27s    1.27s   
## 3 qr(A)      4.18s    4.18s
set.seed(123)
n = 2000
p = 200
x = matrix(rnorm(n * p), n, p)
bench::mark(
  chol(crossprod(x)),
  qr(x),
  max_iterations = 100, check = FALSE
) %>% select(expression, min, median)
## # A tibble: 2 × 3
##   expression         min      median  
##   <bch:expr>         <bch:tm> <bch:tm>
## 1 chol(crossprod(x)) 42.2ms   45.1ms  
## 2 qr(x)              57.6ms   58.2ms
QR decomposition is seldom used to solve linear systems directly
Thin QR decomposition is widely used in regression computation
[1] Folkmar Bornemann (2018). Numerical Linear Algebra. Springer.
[2] Grégoire Allaire and Sidi Mahmoud Kaber (2008). Numerical Linear Algebra. Springer.
[3] Åke Björck (2015). Numerical Methods in Matrix Computations. Springer.
[4] James R. Bunch and Beresford N. Parlett (1971). Direct methods for solving symmetric indefinite systems of linear equations. SIAM Journal on Numerical Analysis.
[5] James R. Bunch and Linda Kaufman (1977). Some stable methods for calculating inertia and solving symmetric linear systems. Mathematics of Computation.