class: center, middle, inverse, title-slide

.title[
# Computational Statistics
]
.subtitle[
## Lecture 11
]
.author[
### Yixuan Qiu
]
.date[
### 2022-11-30
]

---
class: inverse, center, middle

# Simulation and Sampling

---

# Today's Topics

- Importance sampling

- Measure transport sampler

---
class: inverse, center, middle

# Importance Sampling

---

# Importance Sampling

Strictly speaking, importance sampling (IS) is not a method to obtain sample points `\(X_1,\ldots,X_n\)` that follow the target distribution `\(p(x)\)`.

Instead, it is a technique to estimate expectations related to `\(p(x)\)`:

`$$\mu=\mathbb{E}_X[f(X)]=\int f(x)p(x)\mathrm{d}x,\quad X\sim p(x).$$`

---

# A Direct Solution

Of course, one direct method to approximate `\(\mu\)` is to generate `\(X_1,\ldots,X_M\sim p(x)\)`, and then an unbiased estimator for `\(\mu\)` is given by

`$$\hat{\mu}=\frac{1}{M}\sum_{i=1}^M f(X_i),\quad X_i\sim p(x),\ i=1,\ldots,M.$$`

Suppose that we use rejection sampling to get `\(X_i\)` based on a proposal distribution `\(q(x)\)`. There are two issues here:

- Rejection sampling discards sample points

- `\(f(x)\)` may be close to zero outside a region `\(A\)` for which `\(P(X\in A)\)` is small, so most sample points contribute little to the estimate

---

# Example

We want to estimate `\(p=P(X>\pi)\)` and `\(\mu=\mathbb{E}(X|X>\pi)\)`, where `\(X\sim N(0,1)\)`.

Naive solution: sample `\(X_1,\ldots,X_M\overset{iid}{\sim}N(0,1)\)`, and get

`$$\begin{align*} \hat{p} & =\frac{1}{M}\sum_{i=1}^{M}I\{X_{i}>\pi\},\\ \hat{\mu} & =\frac{\sum_{i=1}^{M}X_{i}\cdot I\{X_{i}>\pi\}}{\sum_{i=1}^{M}I\{X_{i}>\pi\}}. \end{align*}$$`

Problem: `\(p\)` is very small (true value ~0.00084), so quite possibly none of the `\(X_i\)`'s exceeds `\(\pi\)`.

---

# Example

```r
est_naive = function(n) {
  x = rnorm(n)
  p_hat = mean(x > pi)
  mu_hat = sum(x * (x > pi)) / sum(x > pi)
  c(p_hat, mu_hat)
}

set.seed(123)
est_naive(n = 100)
```

```
## [1] 0 NaN
```

---

# Motivation

IS attempts to resolve the previous issues:

- It does not discard or waste any sample points

- Instead, it assigns different weights to each point

- By properly choosing the proposal distribution, it is able to more effectively generate sample points around the "important region" `\(A\)`

---

# Basic Idea

The idea of IS is in fact quite simple. It is based on a straightforward identity:

`$$\mu=\int f(x)p(x)\mathrm{d}x=\int\frac{f(x)p(x)}{q(x)}q(x)\mathrm{d}x=\mathbb{E}_{q}\left(\frac{f(X)p(X)}{q(X)}\right),$$`

where `\(q(x)\)` is another density function that is positive on `\(\mathbb{R}^p\)`, and `\(\mathbb{E}_q(\cdot)\)` denotes the expectation for `\(X\sim q(x)\)`.

Accordingly, the IS estimate for `\(\mu\)` is

`$$\require{color}\hat{\mu}_q=\frac{1}{M}\sum_{i=1}^M \frac{f(X_i)p(X_i)}{q(X_i)},\quad X_i\sim \textcolor{deeppink}{q(x)},\ i=1,\ldots,M.$$`

---

# Theorem <sup><span class="small">[2]</span></sup>

Suppose that `\(q(x)>0\)` whenever `\(f(x)p(x)\neq 0\)`. Then `\(\mathbb{E}_q(\hat{\mu}_q)=\mu\)`, and `\(\mathrm{Var}_q(\hat{\mu}_q)=\sigma_q^2/M\)`, where

`$$\sigma_{q}^{2}=\int_{\mathcal{Q}}\frac{[f(x)p(x)]^{2}}{q(x)}\mathrm{d}x-\mu^{2}=\int_{\mathcal{Q}}\frac{[f(x)p(x)-\mu q(x)]^{2}}{q(x)}\mathrm{d}x,$$`

and `\(\mathcal{Q}=\{x:q(x)>0\}\)`.
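---

# Theorem: A Numerical Check

A minimal sketch of the theorem, assuming the toy choices `\(f(x)=x^2\)`, `\(p=N(0,1)\)` (so `\(\mu=\mathbb{E}_p[X^2]=1\)`), and proposal `\(q=N(0,2^2)\)`; it compares the simulated mean and variance of `\(\hat{\mu}_q\)` with the formulas above.

```r
# Toy choices (assumed for illustration): f(x) = x^2, p = N(0, 1), q = N(0, 2^2)
est_check = function(M) {
  x = rnorm(M, sd = 2)                    # X_i ~ q
  w = dnorm(x) / dnorm(x, sd = 2)         # importance weights p(X_i) / q(X_i)
  mean(x^2 * w)                           # IS estimate of mu = E_p[X^2] = 1
}

set.seed(123)
est = replicate(1000, est_check(M = 1000))
c(mean(est), var(est))    # mean should be near mu = 1 (unbiasedness)

# sigma_q^2 via numerical integration; var(est) should be near sigma_q^2 / M
sigma2 = integrate(function(x) (x^2 * dnorm(x))^2 / dnorm(x, sd = 2),
                   lower = -Inf, upper = Inf)$value - 1
sigma2 / 1000
```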
---

# Example

Back to the previous example, note that

`$$\begin{align*} p & =\int_{-\infty}^{+\infty}I(x>\pi)\phi(x)\mathrm{d}x\\ & =\int_{\pi}^{+\infty}\frac{I(x>\pi)\phi(x)}{q(x)}q(x)\mathrm{d}x=\int_{\pi}^{+\infty}\frac{\phi(x)}{q(x)}q(x)\mathrm{d}x,\\ q(x) & =\begin{cases} 0, & x\le\pi\\ e^{-(x-\pi)}, & x>\pi \end{cases}, \end{align*}$$`

where `\(q(x)\)` is the density of an exponential distribution shifted to start at `\(\pi\)`. Similarly,

`$$\mu=p^{-1}\int_{\pi}^{+\infty}\frac{x\cdot\phi(x)}{q(x)}q(x)\mathrm{d}x.$$`

---

# Example

```r
est_is = function(n) {
  x = rexp(n) + pi
  ratio = exp(dnorm(x, log = TRUE) + x - pi)
  p_hat = mean(ratio)
  mu_hat = mean(x * ratio) / p_hat
  c(p_hat, mu_hat)
}

set.seed(123)
est_is(n = 100)
```

```
## [1] 0.0007478989 3.4296517837
```

---

# Optimal `\(q(x)\)` <sup><span class="small">[2]</span></sup>

It can be proved that the optimal proposal distribution `\(q^*(x)\)` is given by `\(q^*(x)=|f(x)|p(x)/\mathbb{E}_p (|f(X)|)\)`.

Proof: for any density function `\(q(x)\)` such that `\(q(x)>0\)` when `\(f(x)p(x)\neq 0\)`,

`$$\begin{align*} \mu^{2}+\sigma_{q^{*}}^{2} & =\int\frac{[f(x)p(x)]^{2}}{q^{*}(x)}\mathrm{d}x=[\mathbb{E}_{p}(|f(X)|)]^{2}\\ & =\left[\mathbb{E}_{q}\left(\frac{|f(X)|p(X)}{q(X)}\right)\right]^{2}\\ & \le\mathbb{E}_{q}\left(\frac{[f(X)p(X)]^{2}}{[q(X)]^{2}}\right)=\int\frac{[f(x)p(x)]^{2}}{q(x)}\mathrm{d}x=\mu^{2}+\sigma_{q}^{2}. \end{align*}$$`

---

# Optimal `\(q(x)\)`

This means that to approximate `\(\int f(x)p(x)\mathrm{d}x\)`, IS can be better than the simple Monte Carlo estimator!

In particular, taking `\(q=p\)` in the inequality above shows `\(\sigma_{q^*}^2\le\sigma_p^2\)`, where `\(\sigma_p^2/M\)` is exactly the variance of the simple Monte Carlo estimator `\(\hat{\mu}\)`.

---

# Self-Normalized IS

Suppose that we can only compute `\(p_u(x)\propto p(x)\)` and `\(q_u(x)\propto q(x)\)`. Then the .highlight[self-normalized IS estimate] is given by

`$$\tilde{\mu}=\frac{\sum_{i=1}^{M}f(X_{i})w(X_{i})}{\sum_{i=1}^{M}w(X_{i})},$$`

where `\(w(x)=p_u(x)/q_u(x)\)` and `\(X_i\sim q(x)\)`.

Under mild conditions, `\(\tilde{\mu}\)` is a consistent estimator of `\(\mu\)`, but in general `\(\tilde{\mu}\)` .highlight[is no longer unbiased].

---
class: inverse, center, middle

# Measure Transport Sampler

---

# Recap: Inverse Transform Algorithm

- `\(U\sim Unif(0,1)\)`

- `\(g=F^{-1}\)`

- `\(X=g(U)\Rightarrow X\sim f(x)\)`

- `\(f(x)=F'(x)\)` is the density function

---

# Random Variable Transformation

- Continuous random variable `\(X\sim p_X(x)\)`

- `\(p_X(x)\)` density function

- `\(g:\mathbb{R}\rightarrow\mathbb{R}\)` a .highlight[monotone] function

- Define `\(Y=g(X)\)`, then its density function is given by

`$$p_{Y}(y)=p_{X}(g^{-1}(y))\left|\frac{\mathrm{d}}{\mathrm{d}y}g^{-1}(y)\right|$$`

- Can extend to multivariate case

---

# Random Vector Transformation

- Continuous random vector `\(X\in\mathbb{R}^d\)`, `\(X\sim p_X(x)\)`

- `\(p_X(x)\)` density function

- `\(T:\mathbb{R}^d\rightarrow\mathbb{R}^d\)` a .highlight[diffeomorphism] (a smooth mapping with smooth inverse)

- Define `\(Y=T(X)\)`, then its density function is given by

`$$\begin{align*} p_{Y}(y) & =p_{X}(T^{-1}(y))\left|\det\left(\nabla(T^{-1})(y)\right)\right|\\ & =p_{X}(x)\left|\det\left(\nabla T(x)\right)\right|^{-1},\quad x=T^{-1}(y) \end{align*}$$`

- `\(\nabla T\)` (resp. `\(\nabla T^{-1}\)`) is the Jacobian matrix of `\(T\)` (resp. `\(T^{-1}\)`)

- A quick numerical check of this formula is sketched on the next slide
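---

# Random Vector Transformation: Numerical Check

A minimal sketch of the formula above, assuming the toy choices `\(X\sim N(0,I_2)\)` and the linear diffeomorphism `\(T(x)=Ax\)` with an (arbitrarily chosen) invertible matrix `\(A\)`; since `\(Y=T(X)\sim N(0,AA^\top)\)` here, the formula can be checked against the known Gaussian density.

```r
A = matrix(c(2, 1, 0, 1), nrow = 2)     # an (assumed) invertible 2 x 2 matrix

p_X = function(x) exp(-sum(x^2) / 2) / (2 * pi)     # N(0, I_2) density

# Density of Y = T(X) implied by the formula:
# p_Y(y) = p_X(T^{-1}(y)) |det(grad T^{-1}(y))| = p_X(A^{-1} y) / |det(A)|
p_Y = function(y) p_X(solve(A, y)) / abs(det(A))

# Known density of N(0, Sigma) with Sigma = A A'
Sigma = A %*% t(A)
p_Y_true = function(y)
  drop(exp(-0.5 * t(y) %*% solve(Sigma, y))) / (2 * pi * sqrt(det(Sigma)))

y0 = c(1, -0.5)
c(p_Y(y0), p_Y_true(y0))    # the two values should agree
```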
---

# Recall: Box-Muller Transform

1. `\((U_1,U_2)\sim \mathrm{Unif}([0,1]^2)\)`

2. Let `\(Z_1=\sqrt{-2\log(U_1)}\cos(2\pi U_2)\)` and `\(Z_2=\sqrt{-2\log(U_1)}\sin(2\pi U_2)\)`

3. Then `\(Z_1\)` and `\(Z_2\)` are two .highlight[independent] `\(N(0,1)\)` random variables

---

# Recall: Box-Muller Transform

- Transformation mapping

`$$\small T\left(\begin{array}{c} U_{1}\\ U_{2} \end{array}\right)=\left(\begin{array}{c} \sqrt{-2\log(U_{1})}\cos(2\pi U_{2})\\ \sqrt{-2\log(U_{1})}\sin(2\pi U_{2}) \end{array}\right)$$`

- Jacobian matrix

`$$\small \begin{align*} \nabla T\left(\begin{array}{c} U_{1}\\ U_{2} \end{array}\right) & =\left(\begin{array}{cc} \frac{-\cos(\theta)}{U_{1}L} & -2\pi L\sin(\theta)\\ \frac{-\sin(\theta)}{U_{1}L} & 2\pi L\cos(\theta) \end{array}\right)\\ L & =\sqrt{-2\log(U_{1})},\quad\theta=2\pi U_{2} \end{align*}$$`

- Determinant

`$$\small \det\nabla T=-2\pi U_{1}^{-1}$$`

---

# Recall: Box-Muller Transform

- Clearly, `\(p_U(u_1,u_2)=1\)`, so

`$$p_{Z}(z_{1},z_{2})=p_U(u_1,u_2)|\det\nabla T|^{-1}=(2\pi)^{-1}u_1$$`

- To express `\(u_1\)` using `\(z_1\)` and `\(z_2\)`, note that

`$$Z_1^2+Z_2^2=-2\log(U_{1}),\quad\text{so}\quad u_{1}=e^{-\frac{1}{2}(z_{1}^{2}+z_{2}^{2})}$$`

- Therefore,

`$$p_{Z}(z_{1},z_{2})=\frac{1}{2\pi}e^{-\frac{1}{2}(z_{1}^{2}+z_{2}^{2})},$$`

which implies that `\(Z_1\)` and `\(Z_2\)` follow independent `\(N(0,1)\)`

---

# Ideas from Inverse Transform

- Fix a source distribution `\(p_Z(z)\)`, e.g. `\(N(0,I)\)`

- Given a target distribution `\(p_X(x)\)`

- Can we .highlight[learn a mapping] `\(T\)` such that `\(X=T(Z)\sim p_X(x)\)` if `\(Z\sim p_Z(z)\)`?

---

# Measure Transport Sampler

- If we can obtain such a mapping, sampling would be easy:

    - Simulate `\(Z_1,\ldots,Z_n\sim N(0,I)\)`

    - Set `\(X_1=T(Z_1),\ldots,X_n=T(Z_n)\)`

    - Then `\(X_i\overset{iid}{\sim}p_X(x)\)`

- The key is to find such a mapping `\(T\)`, also called a .highlight[transport map] (a toy example with a closed-form map is given in the appendix slide)

---

# Transport Map

For practical use, the transport map `\(T\)` needs to have some "nice" properties:

- `\(T\)` should be invertible and differentiable

- `\(T^{-1}\)` and `\(\det(\nabla T(x))\)` should be easy to compute

- `\(T\)` should be flexible enough to characterize sophisticated nonlinear mappings

---

# Modeling Transport Maps

- Polynomials <sup><span class="small">[3]</span></sup> (often not flexible enough)

- Normalizing flows <sup><span class="small">[4]</span></sup> (tools from the deep learning community)

---

# Training

The map `\(T\)` is estimated by minimizing a statistical divergence (e.g., the Kullback-Leibler divergence) between the distribution of `\(T(Z)\)` and the target `\(p_X(x)\)`. See [3] for details.

---

# References

.medium[

[1] Sheldon M. Ross (2011). Simulation. Academic Press.

[2] Art B. Owen. Importance sampling. https://artowen.su.domains/mc/Ch-var-is.pdf

[3] Youssef Marzouk, Tarek Moselhy, Matthew Parno, and Alessio Spantini (2016). An introduction to sampling via measure transport. arXiv:1602.05023.

[4] Matthew Hoffman et al. (2019). NeuTra-lizing bad geometry in Hamiltonian Monte Carlo using neural transport. arXiv:1903.03704.

]
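---

# Appendix: A Toy Transport Map

For a univariate target with tractable CDF `\(F_X\)`, a transport map is available in closed form: `\(T=F_X^{-1}\circ\Phi\)`, where `\(\Phi\)` is the standard normal CDF (the inverse transform idea again). The sketch below assumes a `\(\mathrm{Gamma}(2,1)\)` target as a toy illustration; learning `\(T\)` for general targets is the topic of [3] and [4].

```r
set.seed(123)
z = rnorm(10000)                               # source sample Z ~ N(0, 1)

# Closed-form transport map T = F_X^{-1} o Phi for a Gamma(2, 1) target
T_map = function(z) qgamma(pnorm(z), shape = 2, rate = 1)
x = T_map(z)                                   # X = T(Z)

# The transported sample should follow the target distribution
ks.test(x, pgamma, shape = 2, rate = 1)
```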