Some classical problems
Inverse transform algorithm
Rejection sampling
Given a probability distribution ν, generate random variables X1,…,Xn from ν
ν can be specified in different forms
Distribution function F(x)
Density/Mass function p(x)
Unnormalized density/mass function q(x)∝p(x)
Some general description of the distribution
Many quantities of interest in statistics can be written as expectations of random variables, expressed as high-dimensional integrals:
$$I = \mathbb{E}_X[f(X)] = \int f(x)\, p(x)\, dx, \qquad X \sim p(x).$$
In many cases we are unable to evaluate I exactly, but it is possible to find an unbiased estimator for I:
$$\hat{I} = \frac{1}{M} \sum_{i=1}^{M} f(X_i), \qquad X_i \sim p(x), \ i = 1, \ldots, M.$$
The key challenge is to generate $X_1, \ldots, X_M \sim p(x)$.
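As a quick sketch (assuming for the moment that a sampler for $p(x)$ is available; here we take $p$ to be $N(0,1)$ via R's `rnorm()` and $f(x) = x^2$, so the true value is $I = 1$), the estimator $\hat{I}$ is just a sample average:

```r
# Monte Carlo estimate of I = E[f(X)] with f(x) = x^2 and X ~ N(0, 1),
# whose true value is E[X^2] = 1; rnorm() stands in for a sampler from p(x)
set.seed(123)
M <- 1e5
x <- rnorm(M)
I_hat <- mean(x^2)   # unbiased estimator of I based on M draws
I_hat
```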
To maximize the marginal likelihood $L(\theta; X) = \int p(X \mid Z, \theta)\, p(Z \mid \theta)\, dZ$, the EM algorithm consists of two steps:
Expectation: Define $Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}}\left[\log L(\theta; X, Z)\right]$
Maximization: Solve $\theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)})$
When $Q(\theta \mid \theta^{(t)})$ has no closed form, it needs to be approximated by Monte Carlo samples from $p(Z \mid X, \theta^{(t)})$.
This method is typically called Monte Carlo expectation-maximization (MCEM).
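Concretely, given draws $Z^{(1)}, \ldots, Z^{(M)} \sim p(Z \mid X, \theta^{(t)})$, the E-step quantity is replaced by the sample average
$$\hat{Q}(\theta \mid \theta^{(t)}) = \frac{1}{M} \sum_{m=1}^{M} \log L(\theta; X, Z^{(m)}).$$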
At the core of Bayesian statistics is sampling from the posterior distribution $p(\theta \mid x) \propto p(\theta, x) = \pi(\theta)\, p(x \mid \theta)$.
In this case, $p(\theta \mid x)$ is given in unnormalized form, since we only have access to $p(\theta, x) = \pi(\theta)\, p(x \mid \theta)$.
Strictly speaking, it is essentially impossible to get "truly" random numbers on a computer
What existing algorithms generate are pseudo-random numbers
We omit these technical details
Instead, we assume there is a "generator" that can produce independent random variables $U_1, U_2, \ldots$ that follow the uniform distribution $\mathrm{Unif}(0, 1)$
Our target is to generate other random numbers based on $U_1, U_2, \ldots$
Random permutation
One-pass random selection
Generating normal random variables
Problem: Generate a permutation of 1,…,n such that all n! possible orderings are equally likely.
Widely used in sampling without replacement
Can select only a subset
A simple and natural method:
Generate a sequence of uniform random numbers $U_1, \ldots, U_n$
Sort $U_1, \ldots, U_n$ and record their orders $I_1, \ldots, I_n$
That is, $U_{I_1} \le \cdots \le U_{I_n}$
Then $I_1, \ldots, I_n$ is a random permutation of $1, \ldots, n$
The complexity of this algorithm is $O(n \log n)$
A better algorithm is as follows [1]: initialize $(P_1, \ldots, P_n) = (1, \ldots, n)$; for $k = n, n-1, \ldots, 2$, generate $U \sim \mathrm{Unif}(0, 1)$, set $I = \mathrm{Int}(kU) + 1$, and swap $P_I$ and $P_k$
Here $\mathrm{Int}(x)$ is the largest integer less than or equal to $x$
This is known as the Fisher-Yates shuffling algorithm
Complexity is O(n)
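A minimal R sketch of this shuffle (the function name and the use of `floor()` for $\mathrm{Int}(\cdot)$ are my own choices):

```r
# Fisher-Yates shuffle: one O(n) pass; swap position k with a uniformly
# chosen position I in {1, ..., k}, for k = n, n - 1, ..., 2
fisher_yates <- function(n) {
  p <- seq_len(n)
  k <- n
  while (k >= 2) {
    i <- floor(k * runif(1)) + 1   # I = Int(k * U) + 1 is uniform on {1, ..., k}
    tmp <- p[i]; p[i] <- p[k]; p[k] <- tmp
    k <- k - 1
  }
  p
}

fisher_yates(10)
```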
For each $k$, $I$ follows a (discrete) uniform distribution on $[k] := \{1, 2, \ldots, k\}$
When $k = n$, $P(P_n = i) = 1/n$ for $i \in [n]$
When $k = n-1$, $P(P_{n-1} = i \mid P_n = p_n) = 1/(n-1)$ for $i \in [n] \setminus \{p_n\}$
When $k = n-2$, $P(P_{n-2} = i \mid P_n = p_n, P_{n-1} = p_{n-1}) = 1/(n-2)$ for $i \in [n] \setminus \{p_n, p_{n-1}\}$
By induction, we have $P(P_1 = p_1, \ldots, P_n = p_n) = 1/n!$, where $(p_1, \ldots, p_n)$ is a permutation of $[n]$
Problem: Suppose that there is a sequence of nonnegative numbers $a_1, \ldots, a_n$, and one can access them sequentially. The task is to randomly select an index $i^*$ such that $P(i^* = i) = a_i / \sum_{j=1}^{n} a_j$, with only one pass of data access.
Useful when accessing data is expensive
For example, reading data from hard disks
Can be extended to multiple independent selections
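One standard one-pass scheme, which the proof below analyzes, keeps a running sum $S_k$ and replaces the current selection with index $k$ with probability $a_k / S_k$. A minimal R sketch (the function name and structure are my own rendering of this idea):

```r
# One-pass weighted selection: after reading a_1, ..., a_k, the current
# index i_star satisfies P(i_star = i) = a_i / S_k, where S_k = a_1 + ... + a_k
one_pass_select <- function(a) {
  s <- 0
  i_star <- NA_integer_
  for (k in seq_along(a)) {
    s <- s + a[k]
    # replace the current selection with index k with probability a_k / S_k
    if (s > 0 && runif(1) <= a[k] / s) i_star <- k
  }
  i_star
}

one_pass_select(c(1, 2, 3, 4))   # selects index 4 with probability 0.4
```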
We need to show that $P(i^* = i) = a_i / S_n$ for $i = 1, \ldots, n$, where $S_k = \sum_{j=1}^{k} a_j$.
Now we attempt to prove a stronger conclusion: after $k$ iterations, we have $P(i^* = i) = a_i / S_k$ for $i = 1, \ldots, k$
Clearly this holds for k=1
Suppose that the result holds for $k = l$. In the $(l+1)$-th iteration, $i^*$ is set to $l+1$ with probability $a_{l+1}/S_{l+1}$ and kept unchanged otherwise, so $P(i^* = l+1) = a_{l+1}/S_{l+1}$, and for $i \le l$, $P(i^* = i) = \frac{a_i}{S_l}\left(1 - \frac{a_{l+1}}{S_{l+1}}\right) = \frac{a_i}{S_{l+1}}$, which completes the induction.
Problem: Given a uniform random number generator, simulate random variables $X_1, \ldots, X_n \overset{\text{iid}}{\sim} N(0, 1)$.
Foundation of many simulation algorithms
Extensively studied
Many different algorithms
Generate two independent uniform random numbers $U_1$ and $U_2$
Let $Z_1 = \sqrt{-2 \log(U_1)} \cos(2\pi U_2)$ and $Z_2 = \sqrt{-2 \log(U_1)} \sin(2\pi U_2)$
Then $Z_1$ and $Z_2$ are two independent $N(0, 1)$ random variables
Requires evaluating functions log(x), √x, sin(x), and cos(x)
May be inefficient when lots of random numbers are required
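A minimal R sketch of this transform (commonly known as the Box-Muller transform):

```r
# Box-Muller: turn pairs of independent Unif(0, 1) draws into pairs of
# independent N(0, 1) draws
box_muller <- function(n_pairs) {
  u1 <- runif(n_pairs)
  u2 <- runif(n_pairs)
  r <- sqrt(-2 * log(u1))
  c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))
}

z <- box_muller(5e5)   # one million N(0, 1) draws
c(mean(z), var(z))     # should be close to 0 and 1
```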
Using the general inverse transform algorithm (introduced later):
Generate a uniform random number $U$
Set $Z = \Phi^{-1}(U)$, where $\Phi(x)$ is the c.d.f. of $N(0, 1)$
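In R this is a one-liner, since `qnorm()` evaluates $\Phi^{-1}$:

```r
# Inverse transform for N(0, 1): Z = qnorm(U) with U ~ Unif(0, 1)
u <- runif(1e5)
z <- qnorm(u)
c(mean(z), var(z))   # should be close to 0 and 1
```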
The Ziggurat method is essentially a carefully designed rejection method (introduced later); see [3] for an illustration.
We do a simple comparison between R's built-in rnorm()
and various normal random number generators based on the Ziggurat method.
```r
library(dplyr)
library(bench)
library(dqrng)
library(RcppZiggurat)

bench::mark(
  rnorm(1e7),
  dqrnorm(1e7),
  zrnorm(1e7),
  min_iterations = 10,
  max_iterations = 10,
  check = FALSE
) %>% select(expression, min, median)
```

```
## # A tibble: 3 × 3
##   expression          min   median
##   <bch:expr>     <bch:tm> <bch:tm>
## 1 rnorm(1e+07)    417.7ms  420.4ms
## 2 dqrnorm(1e+07)   94.9ms   99.9ms
## 3 zrnorm(1e+07)    73.1ms   75.6ms
```
Inverse transform algorithm
Rejection sampling
Suppose we have a discrete distribution with the following probability mass function:
$$p(x_i) \equiv P(X = x_i) = p_i, \quad i = 0, 1, \ldots, \qquad \sum_j p_j = 1$$
The following algorithm [1] can be used to generate $X \sim p(x)$: generate $U \sim \mathrm{Unif}(0, 1)$, and set $X = x_j$, where $j$ is the smallest index such that $U \le p_0 + p_1 + \cdots + p_j$.
This procedure can be compactly expressed as $X = F^{-1}(U)$, where $F(\cdot)$ is the c.d.f. of $X$, and $F^{-1}(\cdot)$ is the (generalized) inverse c.d.f.:
$$F^{-1}(p) = \inf\{x \in \mathbb{R} : p \le F(x)\}.$$
This is why this method is called the inverse transform algorithm for discrete distributions.
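A minimal R sketch of this discrete inverse transform (the linear search via `cumsum()` is my simple choice, not necessarily the exact implementation in [1]):

```r
# Discrete inverse transform: return x[j], where j is the smallest index
# with U <= p[1] + ... + p[j]
sample_discrete <- function(x, p) {
  u <- runif(1)
  j <- min(which(u <= cumsum(p)))
  x[j]
}

# Example: P(X = 1) = 0.2, P(X = 2) = 0.5, P(X = 3) = 0.3
table(replicate(10000, sample_discrete(1:3, c(0.2, 0.5, 0.3)))) / 10000
```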
For continuous distributions, the algorithm is more straightforward. To generate X∼p(x):
Generate a uniform random number $U$
Set $X = F^{-1}(U)$, where $F(x)$ is the c.d.f. of $X$
To prove that X∼p(x), we only need to show P(X≤x)=F(x):
$$\begin{aligned}
P(X \le x) &= P(F^{-1}(U) \le x) && \text{(by algorithm)} \\
&= P(U \le F(x)) && \text{(monotonicity of } F(\cdot) \text{)} \\
&= F(x) && \text{(c.d.f. of the uniform distribution)}
\end{aligned}$$
In general, the inverse transform algorithm only applies to univariate distributions (but we have a generalized method called measure transport)
Also, evaluating $F^{-1}(\cdot)$ may be difficult
Consider the exponential distribution with mean 1, whose distribution function is given by $F(x) = 1 - e^{-x}$
Using the inverse transform algorithm we have $X = -\log(1 - U)$
Since $1 - U$ also follows $\mathrm{Unif}(0, 1)$, we can simply use $X = -\log(U)$
To generate an exponential random variable with mean $\theta$, we have $X = -\theta \log(U)$
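A quick R check of this recipe (comparing the sample mean of the inverse-transform draws against the target mean):

```r
# Exponential with mean theta via the inverse transform: X = -theta * log(U)
theta <- 2
u <- runif(1e5)
x <- -theta * log(u)
c(mean(x), theta)   # the sample mean should be close to theta
```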
Rejection sampling is a general technique for exact sampling. It applies to both discrete and continuous distributions, and is not limited to univariate distributions.
Suppose we want to generate $X \sim f(x)$, and we have an existing method to sample from another distribution $g(x)$. Here $f(x)$ and $g(x)$ can be interpreted as probability mass functions or density functions.
Let $c > 0$ be a constant such that $f(x)/g(x) \le c$ for all $x \in \mathcal{X}$, where $\mathcal{X}$ stands for the support of $g(x)$.
Then the rejection sampling method proceeds as follows:
Generate Y∼g(y)
Generate a uniform random number U
If $U \le f(Y) / [c\, g(Y)]$, set $X = Y$. Otherwise return to Step 1.
From [1]: the algorithm stated separately for the discrete and the continuous case.
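A minimal R sketch of the procedure, using the Beta(2, 2) density $f(x) = 6x(1 - x)$ on $[0, 1]$ with a $\mathrm{Unif}(0, 1)$ proposal $g$, so that $f(x)/g(x) \le c = 1.5$ (this target and constant are my choices for illustration):

```r
# Rejection sampling: propose Y ~ g, accept with probability f(Y) / (c * g(Y))
rejection_beta22 <- function(n) {
  f <- function(x) 6 * x * (1 - x)   # Beta(2, 2) density
  c_const <- 1.5                     # max of f(x) / g(x) on [0, 1], at x = 0.5
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      y <- runif(1)                  # proposal from g = Unif(0, 1)
      if (runif(1) <= f(y) / c_const) { out[i] <- y; break }
    }
  }
  out
}

x <- rejection_beta22(1e4)
c(mean(x), var(x))   # should be close to 0.5 and 0.05 for Beta(2, 2)
```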
The generated random variable X follows f(x).
The number of iterations of the algorithm is a geometric random variable with mean c.
Question: What is the best possible value of $c$?
$P(\text{accepted} \mid Y = y) = f(y) / [c\, g(y)]$
$P(\text{accepted}) = \int P(\text{accepted} \mid Y = y)\, g(y)\, dy = 1/c$
Since the acceptance probability is $1/c$, the best choice is the smallest valid constant, namely $c = \sup_x f(x)/g(x)$.
Let $\mathcal{X} = (-\infty, x_1] \times \cdots \times (-\infty, x_p]$, and then
$$\begin{aligned}
P(X \in \mathcal{X}) &= P(Y \in \mathcal{X}, \text{accepted}) + (1 - 1/c)\, P(X \in \mathcal{X}) \\
&= \int_{\mathcal{X}} P(\text{accepted} \mid Y = y)\, g(y)\, dy + (1 - 1/c)\, P(X \in \mathcal{X}) \\
&= (1/c) \int_{\mathcal{X}} f(y)\, dy + (1 - 1/c)\, P(X \in \mathcal{X})
\end{aligned}$$
Solving for $P(X \in \mathcal{X})$ gives $P(X \in \mathcal{X}) = \int_{\mathcal{X}} f(y)\, dy$, so $X$ indeed follows $f(x)$.
Consider the set
$$S_M(h) = \{(x, z) : 0 \le z \le M h(x),\ x \in \mathbb{R}^p,\ z \in \mathbb{R}\},$$
where h is a density function, and M>0 is a constant.
We show that if $(X, Z) \sim \mathrm{Unif}(S_M(f))$, then $X \sim f(x)$.
Let $S = S_M(f)$ and $\mathcal{X} = (-\infty, x_1] \times \cdots \times (-\infty, x_p]$, and then by definition,
$$P(X \in \mathcal{X}) = \frac{\mathrm{vol}\big(S \cap (\mathcal{X} \times [0, +\infty))\big)}{\mathrm{vol}(S)} = \frac{\int_{\mathcal{X}} \int_0^{M f(x)} dz\, dx}{\int_{\mathbb{R}^p} \int_0^{M f(x)} dz\, dx} = \int_{\mathcal{X}} f(x)\, dx,$$
which proves that $X$ has density $f(x)$.
Conversely, if $X \sim h(x)$ and $Z \mid \{X = x\} \sim \mathrm{Unif}(0, M h(x))$, then $(X, Z) \sim \mathrm{Unif}(S_M(h))$.
Finding a good proposal distribution g(x) can be hard
Finding the constant c is even harder
Rejection sampling may "waste" a lot of random numbers
If f(x) is high-dimensional, the problem would be much more challenging
[1] Sheldon M. Ross (2011). Simulation. Academic Press.
[2] Michael W. Mahoney (2016). Lecture notes on randomized linear algebra. arXiv:1608.04481.
[3] David B. Thomas et al. (2007). Gaussian random number generators. ACM Computing Surveys.