class: center, middle, inverse, title-slide

.title[
# Computational Statistics
]
.subtitle[
## Lecture 10
]
.author[
### Yixuan Qiu
]
.date[
### 2023-11-22
]

---
class: inverse, center, middle

# Simulation and Sampling

---

<video controls autoplay loop src="images/simulation.mp4"></video>

.center[

.small[(Video: https://colindcarroll.com/2018/11/24/animated-mcmc-with-matplotlib/)]

]

---

# Today's Topics

- Some classical problems
- Inverse transform algorithm
- Rejection sampling

---

# Problem

Given a probability distribution `\(\nu\)`, generate random variables `\(X_1,\ldots,X_n\)` from `\(\nu\)`.

`\(\nu\)` can be specified in different forms:

- Distribution function `\(F(x)\)`
- Density/mass function `\(p(x)\)`
- Unnormalized density/mass function `\(q(x)\propto p(x)\)`
- Some general description of the distribution

---

# Why We Need Sampling

Many quantities of interest in statistics can be written as expectations of random variables, expressed as high-dimensional integrals:

`$$I=\mathbb{E}_X[f(X)]=\int f(x)p(x)\mathrm{d}x,\quad X\sim p(x).$$`

In many cases we are unable to evaluate `\(I\)` exactly, but it is possible to find an .highlight[unbiased estimator] for `\(I\)`:

`$$\hat{I}=\frac{1}{M}\sum_{i=1}^M f(X_i),\quad X_i\sim p(x),\ i=1,\ldots,M.$$`

The key challenge is to generate `\(X_1,\ldots,X_M\sim p(x)\)`.

---

# Example - EM Algorithm

To maximize the marginal likelihood

`$$L(\theta;X)=\int p(X|Z,\theta)p(Z|\theta)\mathrm{d}Z,$$`

the EM algorithm consists of two steps:

- Expectation: Define

`$$Q(\theta|\theta^{(t)})=\mathbb{E}_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]$$`

- Maximization: Solve

`$$\theta^{(t+1)}=\underset{\theta}{\arg\max}\ Q(\theta|\theta^{(t)})$$`

---

# Example - EM Algorithm

When `\(Q(\theta|\theta^{(t)})\)` has no closed form, it needs to be approximated by Monte Carlo samples from `\(p(Z|X,\theta^{(t)})\)`.

This method is typically called Monte Carlo expectation-maximization (MCEM).

---

# Example - Bayesian Statistics

At the core of Bayesian statistics is sampling from the posterior distribution

`$$p(\theta|x)\propto p(\theta,x)=\pi(\theta)p(x|\theta).$$`

In this case, `\(p(\theta|x)\)` is given in an unnormalized form, since we only have access to `\(p(\theta,x)=\pi(\theta)p(x|\theta)\)`.

---

# Pseudo Random Numbers

- Strictly speaking, computers cannot produce "real" random numbers
- What existing algorithms generate are pseudo random numbers
- We omit these technical details
- Instead, we assume there is a "generator" that can produce independent random variables `\(U_1,U_2,\ldots\)` that follow the uniform distribution `\(\mathrm{Unif}(0,1)\)`
- Our goal is to generate other random numbers based on `\(U_1,U_2,\ldots\)`

---
class: inverse, center, middle

# Classical Problems

---

# Classical Problems

- Random permutation
- One-pass random selection
- Generating normal random variables

---

# Random Permutation

Problem: Generate a permutation of `\(1,\ldots,n\)` such that all `\(n!\)` possible orderings are equally likely.

--

- Widely used in sampling without replacement
- Can select only a subset

---

# Algorithm

A simple and natural method:

- Generate a sequence of uniform random numbers `\(U_1,\ldots,U_n\)`
- Sort `\(U_1,\ldots,U_n\)` and record their orders `\(I_1,\ldots,I_n\)`
  - That is, `\(U_{I_1}\le\cdots\le U_{I_n}\)`
- Then `\(I_1,\ldots,I_n\)` is a random permutation of `\(1,\ldots,n\)`
- The complexity of this algorithm is `\(O(n\log(n))\)`

---

# Algorithm

A better algorithm is as follows <sup><span class="small">[1]</span></sup>:

![](images/permutation.png)

- Here `\(\mathrm{Int}(x)\)` is the largest integer less than or equal to `\(x\)`
- This is known as the Fisher-Yates shuffling algorithm
- Complexity is `\(O(n)\)`
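---

# Implementation Sketch

Below is a minimal R sketch of the Fisher-Yates shuffle described above. The function name `fisher_yates()` is hypothetical, `runif()` plays the role of the uniform generator, and `floor()` implements `\(\mathrm{Int}(\cdot)\)`.

```r
# Minimal sketch of the Fisher-Yates shuffle (illustration only)
fisher_yates <- function(n) {
  p <- seq_len(n)
  if (n < 2) return(p)
  for (k in n:2) {
    u <- runif(1)
    i <- floor(k * u) + 1                     # uniform index in {1, ..., k}
    tmp <- p[i]; p[i] <- p[k]; p[k] <- tmp    # swap positions i and k
  }
  p
}

set.seed(123)
fisher_yates(10)
```

Each position is swapped at most once, which is consistent with the `\(O(n)\)` complexity stated above.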
---

# Proof Sketch

- For each `\(k\)`, `\(I\)` follows a (discrete) uniform distribution on `\([k]:=\{1,2,\ldots,k\}\)`
- When `\(k=n\)`, `\(P(P_n=i)=1/n\)` for `\(i\in [n]\)`
- When `\(k=n-1\)`, `\(P(P_{n-1}=i|P_n=p_n)=1/(n-1)\)` for `\(i\in [n]\setminus\{p_n\}\)`
- When `\(k=n-2\)`, `\(P(P_{n-2}=i|P_n=p_n,P_{n-1}=p_{n-1})=1/(n-2)\)` for `\(i\in [n]\setminus\{p_n,p_{n-1}\}\)`
- By induction, we have `\(P(P_1=p_1,\ldots,P_n=p_n)=1/n!\)`, where `\(\{p_1,\ldots,p_n\}\)` is a permutation of `\([n]\)`

---

# One-Pass Random Selection

Problem: Suppose that there is a sequence of nonnegative numbers `\(a_1,\ldots,a_n\)`, and one can access them .highlight[sequentially]. The task is to randomly select an index `\(i^*\)` such that `\(P(i^*=i)=a_i/\sum_{j=1}^n a_j\)`, .highlight[with only one pass of data access].

--

- Useful when accessing data is expensive
  - For example, reading data from hard disks
- Can be extended to multiple independent selections

---

# Algorithm <sup><span class="small">[2]</span></sup>

![](images/select_algorithm.png)

We need to show that `\(P(i^*=i)=a_i/S_n\)` for `\(i=1,\ldots,n\)`, where `\(S_k=\sum_{j=1}^k a_j\)`.

---

# Proof Sketch

- We prove a stronger conclusion: .highlight[after] `\(\require{color}\color{deeppink}{k}\)` .highlight[iterations], we have `\(P(i^*=i)=a_i/S_k\)` for `\(i=1,\ldots,k\)`
- Clearly this holds for `\(k=1\)`
- Suppose that the result holds for `\(k=l\)`, and then in the `\((l+1)\)`-th iteration:
  - For `\(i=1,\ldots,l\)`, `\(i^*=i\)` means that in the new iteration `\(i^*\)` is not updated
  - For `\(i=l+1\)`, `\(i^*=l+1\)` means `\(i^*\)` is updated
- The probability of updating is `\(a_{l+1}/S_{l+1}\)`, which is exactly `\(P(i^*=l+1)\)`
- Finally, for `\(i=1,\ldots,l\)`, `\(P(i^*=i)=(a_i/S_l)\cdot(1-a_{l+1}/S_{l+1})=a_i/S_{l+1}\)`

---

# Generating Normal Random Variables

Problem: Given a uniform random number generator, simulate random variables `\(X_1,\ldots,X_n\overset{iid}{\sim}N(0,1)\)`.

--

- Foundation of many simulation algorithms
- Extensively studied
- Many different algorithms

---

# Box-Muller Transform

1. Generate two independent uniform random numbers `\(U_1\)` and `\(U_2\)`
2. Let `\(Z_1=\sqrt{-2\log(U_1)}\cos(2\pi U_2)\)` and `\(Z_2=\sqrt{-2\log(U_1)}\sin(2\pi U_2)\)`
3. Then `\(Z_1\)` and `\(Z_2\)` are two .highlight[independent] `\(N(0,1)\)` random variables

--

- Requires evaluating the functions `\(\log(x)\)`, `\(\sqrt{x}\)`, `\(\sin(x)\)`, and `\(\cos(x)\)`
- May be inefficient when many random numbers are required
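---

# Implementation Sketch

A minimal R sketch of the Box-Muller transform above. The function name `rnorm_bm()` is hypothetical, and `runif()` again plays the role of the uniform generator.

```r
# Minimal sketch of the Box-Muller transform (illustration only)
rnorm_bm <- function(n) {
  m  <- ceiling(n / 2)
  u1 <- runif(m)
  u2 <- runif(m)
  r  <- sqrt(-2 * log(u1))                              # common radius term
  z  <- c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))   # two independent N(0, 1) batches
  z[seq_len(n)]
}

set.seed(123)
z <- rnorm_bm(1e5)
c(mean(z), var(z))   # should be close to 0 and 1
```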
---

# Inverse Transform Algorithm

Using the general inverse transform algorithm (introduced later):

1. Generate a uniform random number `\(U\)`
2. Set `\(Z=\Phi^{-1}(U)\)`, where `\(\Phi(x)\)` is the c.d.f. of `\(N(0,1)\)`

--

- However, computing `\(\Phi^{-1}(\cdot)\)` may be difficult and inefficient

---

# Ziggurat Method

Essentially a carefully designed rejection method (introduced later). From [3]:

![](images/ziggurat.png)

--

- In practice, one of the most efficient methods for generating normal random numbers

---

# Other Methods

Many other methods exist, see [[3]](https://www.doc.ic.ac.uk/~wl/papers/07/csur07dt.pdf)

---

# Implementation

We do a simple comparison between R's built-in `rnorm()` and various normal random number generators based on the Ziggurat method.

```r
library(dplyr)
library(bench)
library(dqrng)
library(RcppZiggurat)

bench::mark(
  rnorm(1e7),
  dqrnorm(1e7),
  zrnorm(1e7),
  min_iterations = 10,
  max_iterations = 10,
  check = FALSE
) %>%
  select(expression, min, median)
```

```
## # A tibble: 3 × 3
##   expression          min   median
##   <bch:expr>     <bch:tm> <bch:tm>
## 1 rnorm(1e+07)    417.7ms  420.4ms
## 2 dqrnorm(1e+07)   94.9ms   99.9ms
## 3 zrnorm(1e+07)    73.1ms   75.6ms
```

---

# Univariate Random Number Generation

- Inverse transform algorithm
- Rejection sampling

---
class: inverse, center, middle

# Inverse Transform Algorithm

---

# Discrete Distribution

Suppose we have a discrete distribution with the following probability mass function:

`$$p(x_i)\equiv P(X=x_i)=p_i,\quad i=0,1,\ldots,\ \sum_j p_j=1$$`

The following algorithm <sup><span class="small">[1]</span></sup> can be used to generate `\(X\sim p(x)\)`:

.center[

<img src="images/discrete.png" width="60%"/>

]

---

# Discrete Distribution

This procedure can be compactly expressed as `\(X=F^{-1}(U)\)`, where `\(F(\cdot)\)` is the c.d.f. of `\(X\)`, and `\(F^{-1}(\cdot)\)` is the (generalized) inverse c.d.f.:

`$$F^{-1}(p)=\inf\,\{x\in\mathbb{R}:p\le F(x)\}.$$`

This is why this method is called the .highlight[inverse transform algorithm] for discrete distributions.

---

# Continuous Distribution

For continuous distributions, the algorithm is more straightforward. To generate `\(X\sim p(x)\)`:

1. Generate a uniform random number `\(U\)`
2. Set `\(X=F^{-1}(U)\)`, where `\(F(x)\)` is the c.d.f. of `\(X\)`

---

# Proof Sketch

To prove that `\(X\sim p(x)\)`, we only need to show `\(P(X\le x)=F(x)\)`:

`$$\begin{align*} P(X\le x) & =P(F^{-1}(U)\le x) & & \text{(by algorithm)}\\ & =P(U\le F(x)) & & \text{(monotonicity of }F(\cdot)\text{)}\\ & =F(x) & & \text{(c.d.f. of uniform distribution)} \end{align*}$$`

---

# Challenge

- In general, the inverse transform algorithm only applies to univariate distributions (but there is a generalized method called measure transport)
- Also, evaluating `\(F^{-1}(\cdot)\)` may be difficult

---

# Example

- Consider the exponential distribution with mean 1, whose distribution function is given by `\(F(x)=1-e^{-x}\)`
- Using the inverse transform algorithm we have `\(X=-\log(1-U)\)`
- Since `\(1-U\)` also follows `\(\mathrm{Unif}(0,1)\)`, we can simply set `\(X=-\log(U)\)`
- To generate an exponential random variable with mean `\(\theta\)`, we set `\(X=-\theta\log(U)\)`

---
class: inverse, center, middle

# Rejection Sampling

---

# Rejection Sampling

Rejection sampling is a general technique for exact sampling. It applies to both discrete and continuous distributions, and is not limited to univariate distributions.

Suppose we want to generate `\(X\sim f(x)\)`, and we have an existing method to sample from another distribution `\(g(x)\)`. `\(f(x)\)` and `\(g(x)\)` can be interpreted as probability mass functions or density functions.

---

# Rejection Sampling

Let `\(c>0\)` be a constant such that `\(f(x)/g(x)\le c\)` for all `\(x\in\mathcal{X}\)`, where `\(\mathcal{X}\)` stands for the support of `\(g(x)\)`.

Then the rejection sampling method proceeds as follows:

1. Generate `\(Y\sim g(y)\)`
2. Generate a uniform random number `\(U\)`
3. If `\(U\le f(Y)/[cg(Y)]\)`, set `\(X=Y\)`. Otherwise return to Step 1.
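---

# Implementation Sketch

A small R sketch of the procedure above. The target `\(\mathrm{Beta}(2,2)\)` density and the `\(\mathrm{Unif}(0,1)\)` proposal are chosen only for illustration, with `\(c=1.5\)` since `\(f(x)/g(x)=6x(1-x)\le 1.5\)` on `\([0,1]\)`; the function names are hypothetical.

```r
# Rejection sampling sketch: target Beta(2, 2), proposal Unif(0, 1)
f <- function(x) 6 * x * (1 - x)   # target density
c_const <- 1.5                     # f(x) / g(x) <= 1.5 since g(x) = 1

rbeta22_reject <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      y <- runif(1)                # Step 1: Y ~ g
      u <- runif(1)                # Step 2: U ~ Unif(0, 1)
      if (u <= f(y) / c_const) {   # Step 3: accept (g(y) = 1), otherwise retry
        x[i] <- y
        break
      }
    }
  }
  x
}

set.seed(123)
x <- rbeta22_reject(1e4)
c(mean(x), var(x))   # Beta(2, 2) has mean 1/2 and variance 1/20
```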
---

# Diagram

From [1]:

.center[

![](images/rejection_discrete.png)

.medium[(For discrete distribution)]

]

.center[

![](images/rejection_continuous.png)

.medium[(For continuous distribution)]

]

---

# Theorem <sup><span class="small">[1]</span></sup>

1. The generated random variable `\(X\)` follows `\(f(x)\)`.
2. The number of iterations of the algorithm is a geometric random variable with mean `\(c\)`.

--

.highlight[Question: What is the best possible value of] `\(\require{color}\color{deeppink}{c}\)` .highlight[?]

---

# Proof Sketch

- `\(P(\text{accepted}|Y=y)=f(y)/[cg(y)]\)`
- `\(P(\text{accepted})=\int P(\text{accepted}|Y=y)g(y)\mathrm{d}y=1/c\)`
- Let `\(\mathcal{X}=(-\infty,x_1]\times\cdots\times(-\infty,x_p]\)`, and then

`$$\small\begin{align*} P(X\in\mathcal{X}) & =P(Y\in\mathcal{X},\text{accepted})+(1-1/c)P(X\in\mathcal{X})\\ & =\int_{\mathcal{X}}P(\text{accepted}|Y=y)g(y)\mathrm{d}y+(1-1/c)P(X\in\mathcal{X})\\ & =(1/c)\int_{\mathcal{X}} f(y)\mathrm{d}y+(1-1/c)P(X\in\mathcal{X}) \end{align*}$$`

- This shows that `\(P(X\in\mathcal{X})=\int_{\mathcal{X}} f(y)\mathrm{d}y\)`

---

# Geometric Interpretation

Consider the set

`$$\mathcal{S}_{M}(h)=\{(x,z):0\le z\le Mh(x),x\in\mathbb{R}^{p},z\in\mathbb{R}\},$$`

where `\(h\)` is a density function, and `\(M>0\)` is a constant.

We show that if `\((X,Z)\sim \mathrm{Unif}(\mathcal{S}_{M}(f))\)`, then `\(X\sim f(x)\)`.

---

# Geometric Interpretation

.center[

![](images/rejection_geometry.png)

.small[(Image: https://artowen.su.domains/mc/Ch-nonunifrng.pdf)]

]

---

# Geometric Interpretation

Let `\(\mathcal{S}=\mathcal{S}_{M}(f)\)` and `\(\mathcal{X}=(-\infty,x_1]\times\cdots\times(-\infty,x_p]\)`, and then by definition,

`$$\begin{align*} P(X\in\mathcal{X}) & =\frac{\mathrm{vol}(\mathcal{S}\cap\mathcal{X}\times[0,+\infty))}{\mathrm{vol}(\mathcal{S})}=\frac{\int_{\mathcal{X}}\int_{0}^{Mf(x)}\mathrm{d}z\mathrm{d}x}{\int_{\mathbb{R}^{p}}\int_{0}^{Mf(x)}\mathrm{d}z\mathrm{d}x}\\ & =\int_{\mathcal{X}}f(x)\mathrm{d}x, \end{align*}$$`

which proves that `\(X\)` has density `\(f(x)\)`.

Conversely, if `\(X\sim h(x)\)` and `\(Z|\{X=x\}\sim \mathrm{Unif}(0,Mh(x))\)`, then `\((X,Z)\sim \mathrm{Unif}(\mathcal{S}_{M}(h))\)`.

---

# Challenge

- Finding a good proposal distribution `\(g(x)\)` can be hard
- Finding the constant `\(c\)` is even harder
- Rejection sampling may "waste" a lot of random numbers
- If `\(f(x)\)` is high-dimensional, the problem would be much more challenging

---

# References

.medium[

[1] Sheldon M. Ross (2011). Simulation. Academic Press.

[2] Michael W. Mahoney (2016). Lecture notes on randomized linear algebra. arXiv:1608.04481.

[3] David B. Thomas et al. (2007). Gaussian random number generators. ACM Computing Surveys.

[4] https://artowen.su.domains/mc/Ch-nonunifrng.pdf

]
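---

# Appendix: Checking the Theorem

A small sketch reusing the hypothetical `\(\mathrm{Beta}(2,2)\)` / `\(\mathrm{Unif}(0,1)\)` example from the earlier implementation slide, to illustrate the theorem that the number of iterations is geometric with mean `\(c\)`.

```r
# Count how many proposals are needed for one accepted draw (illustration only)
f <- function(x) 6 * x * (1 - x)   # Beta(2, 2) target density
c_const <- 1.5                     # bound on f(x) / g(x) with g = Unif(0, 1)

n_trials <- function() {
  k <- 0
  repeat {
    k <- k + 1
    y <- runif(1)
    u <- runif(1)
    if (u <= f(y) / c_const) return(k)
  }
}

set.seed(123)
iters <- replicate(1e4, n_trials())
mean(iters)   # should be close to c = 1.5
```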