
Computational Statistics

Lecture 11

Yixuan Qiu

2022-11-30


Simulation and Sampling


Today's Topics

  • Importance sampling

  • Measure transport sampler


Importance Sampling


Importance sampling

Strictly speaking, importance sampling (IS) is not a method to obtain sample points $X_1, \ldots, X_n$ that follow the target distribution $p(x)$.

Instead, it is a technique to estimate expectations related to $p(x)$:

$$\mu = \mathbb{E}_X[f(X)] = \int f(x)\,p(x)\,\mathrm{d}x, \qquad X \sim p(x).$$


A Direct Solution

Of course, one direct method to approximate $\mu$ is to generate $X_1, \ldots, X_M \sim p(x)$, and then an unbiased estimator for $\mu$ is given by

$$\hat{\mu} = \frac{1}{M} \sum_{i=1}^M f(X_i), \qquad X_i \sim p(x), \ i = 1, \ldots, M.$$

Suppose that we use rejection sampling to get $X_i$ based on a proposal distribution $q(x)$. There are two issues here:

  • Rejection sampling discards sample points
  • $f(x)$ may be close to zero outside a region $A$ for which $P(X \in A)$ is small

Example

We want to estimate $p = P(X > \pi)$ and $\mu = E(X \mid X > \pi)$, where $X \sim N(0, 1)$.

Naive solution: sample $X_1, \ldots, X_M \overset{iid}{\sim} N(0, 1)$, and get

$$\hat{p} = \frac{1}{M} \sum_{i=1}^M I\{X_i > \pi\}, \qquad \hat{\mu} = \frac{\sum_{i=1}^M X_i\, I\{X_i > \pi\}}{\sum_{i=1}^M I\{X_i > \pi\}}.$$

Problem: $p$ is very small (true value $\approx 0.00084$), so it can easily happen that all $X_i$'s are smaller than $\pi$, in which case $\hat{\mu}$ is undefined.


Example

est_naive = function(n)
{
    # Sample directly from the target distribution N(0, 1)
    x = rnorm(n)
    # Proportion of sample points exceeding pi, estimating P(X > pi)
    p_hat = mean(x > pi)
    # Average of the sample points exceeding pi, estimating E(X | X > pi)
    mu_hat = sum(x * (x > pi)) / sum(x > pi)
    c(p_hat, mu_hat)
}
set.seed(123)
est_naive(n = 100)
## [1] 0 NaN

Motivation

IS attempts to resolve the previous issues:

  • It does not discard or waste any sample points

  • Instead, it assigns different weights to each point

  • By properly choosing the proposal distribution, it is able to more effectively generate sample points around the "important region" A


Basic Idea

The idea of IS is in fact quite simple. It is based on a straightforward identity:

$$\mu = \int f(x)\,p(x)\,\mathrm{d}x = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,\mathrm{d}x = \mathbb{E}_q\!\left(\frac{f(X)\,p(X)}{q(X)}\right),$$

where $q(x)$ is another density function that is positive on $\mathbb{R}^p$, and $\mathbb{E}_q(\cdot)$ denotes the expectation for $X \sim q(x)$.

Accordingly, the IS estimate for $\mu$ is

$$\hat{\mu}_q = \frac{1}{M} \sum_{i=1}^M \frac{f(X_i)\,p(X_i)}{q(X_i)}, \qquad X_i \sim q(x), \ i = 1, \ldots, M.$$
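A minimal R sketch of this estimator (the helper is_estimate and its arguments are illustrative, not part of the lecture code):

is_estimate = function(f, dtarget, dproposal, rproposal, M = 1000)
{
    # Illustrative helper: draw X_1, ..., X_M from the proposal q(x)
    x = rproposal(M)
    # Importance weights p(X_i) / q(X_i)
    w = dtarget(x) / dproposal(x)
    # Unbiased IS estimate of mu = E_p[f(X)]
    mean(f(x) * w)
}
# Example: estimate E(X^2) = 1 for X ~ N(0, 1) using a t(5) proposal
set.seed(123)
is_estimate(function(x) x^2, dnorm, function(x) dt(x, df = 5),
            function(n) rt(n, df = 5), M = 10000)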


Theorem [2]

Suppose that $q(x) > 0$ whenever $f(x)\,p(x) \ne 0$. Then $\mathbb{E}_q(\hat{\mu}_q) = \mu$, and $\mathrm{Var}_q(\hat{\mu}_q) = \sigma_q^2 / M$, where

$$\sigma_q^2 = \int_Q \frac{[f(x)\,p(x)]^2}{q(x)}\,\mathrm{d}x - \mu^2 = \int_Q \frac{[f(x)\,p(x) - \mu\, q(x)]^2}{q(x)}\,\mathrm{d}x,$$

and $Q = \{x : q(x) > 0\}$.


Example

Back to the previous example, note that

$$p = \int_{-\infty}^{+\infty} I(x > \pi)\,\phi(x)\,\mathrm{d}x = \int_{\pi}^{+\infty} I(x > \pi)\,\frac{\phi(x)}{q(x)}\,q(x)\,\mathrm{d}x = \int_{\pi}^{+\infty} \frac{\phi(x)}{q(x)}\,q(x)\,\mathrm{d}x, \qquad q(x) = \begin{cases} 0, & x \le \pi \\ e^{-(x - \pi)}, & x > \pi, \end{cases}$$

where $q(x)$ is a shifted exponential distribution. Similarly,

$$\mu = p^{-1} \int_{\pi}^{+\infty} x\,\frac{\phi(x)}{q(x)}\,q(x)\,\mathrm{d}x.$$


Example

est_is = function(n)
{
    # Sample from the proposal: shifted exponential with density exp(-(x - pi)) for x > pi
    x = rexp(n) + pi
    # Importance ratio phi(x) / q(x) = exp(log phi(x) + x - pi)
    ratio = exp(dnorm(x, log = TRUE) + x - pi)
    # Estimates of p = P(X > pi) and mu = E(X | X > pi)
    p_hat = mean(ratio)
    mu_hat = mean(x * ratio) / p_hat
    c(p_hat, mu_hat)
}
set.seed(123)
est_is(n = 100)
## [1] 0.0007478989 3.4296517837

Optimal q(x) [2]

It can be proved that the optimal proposal distribution is $q^*(x) = |f(x)|\,p(x) / \mathbb{E}_p(|f(X)|)$.

Proof: for any density function $q(x)$ such that $q(x) > 0$ when $f(x)\,p(x) \ne 0$,

$$\mu^2 + \sigma_{q^*}^2 = \int \frac{[f(x)\,p(x)]^2}{q^*(x)}\,\mathrm{d}x = \left[\mathbb{E}_p(|f(X)|)\right]^2 = \left[\mathbb{E}_q\!\left(\frac{|f(X)|\,p(X)}{q(X)}\right)\right]^2 \le \mathbb{E}_q\!\left(\frac{[f(X)\,p(X)]^2}{[q(X)]^2}\right) = \int \frac{[f(x)\,p(x)]^2}{q(x)}\,\mathrm{d}x = \mu^2 + \sigma_q^2,$$

where the inequality is $[\mathbb{E}_q(W)]^2 \le \mathbb{E}_q(W^2)$. Hence $\sigma_{q^*}^2 \le \sigma_q^2$ for any valid proposal $q$.


Optimal q(x)

This means that to approximate $\int f(x)\,p(x)\,\mathrm{d}x$, IS can be better than the simple Monte Carlo estimator!
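As a rough empirical check, we can repeat est_naive() and est_is() from the earlier example and compare the spread of the resulting estimates of $p$ (an illustrative sketch, not from the original slides):

# Compare the variability of the two estimators of p over repeated runs
set.seed(123)
p_naive = replicate(500, est_naive(n = 100)[1])
p_is = replicate(500, est_is(n = 100)[1])
sd(p_naive)   # naive estimator: most runs give 0, occasionally 0.01 or more
sd(p_is)      # IS estimator: much smaller standard deviation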


Self-Normalized IS

Suppose that we can only compute $p_u(x) \propto p(x)$ and $q_u(x) \propto q(x)$. The self-normalized IS estimate is then given by

$$\tilde{\mu} = \frac{\sum_{i=1}^M f(X_i)\,w(X_i)}{\sum_{i=1}^M w(X_i)},$$

where $w(x) = p_u(x) / q_u(x)$ and $X_i \sim q(x)$.

Under mild conditions, $\tilde{\mu}$ is a consistent estimator of $\mu$, but in general $\tilde{\mu}$ is no longer unbiased.
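A minimal R sketch of the self-normalized estimator, assuming only the unnormalized densities can be evaluated (the helper snis_estimate and its arguments are illustrative):

snis_estimate = function(f, p_u, q_u, rproposal, M = 1000)
{
    # Illustrative helper: draw X_1, ..., X_M from the proposal q(x)
    x = rproposal(M)
    # Unnormalized weights w(X_i) = p_u(X_i) / q_u(X_i)
    w = p_u(x) / q_u(x)
    # Self-normalized estimate: consistent, but biased in general
    sum(f(x) * w) / sum(w)
}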


Measure Transport Sampler


Recap: Inverse Transform Algorithm

  • $U \sim \mathrm{Unif}(0, 1)$

  • $g = F^{-1}$

  • $X = g(U) \ \Longrightarrow\ X \sim f(x)$

  • $f(x) = F'(x)$ is the density function
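For example, a minimal R sketch of the inverse transform algorithm for the $\mathrm{Exp}(1)$ distribution, where $F(x) = 1 - e^{-x}$ and $F^{-1}(u) = -\log(1 - u)$:

# Inverse transform sampling for Exp(1)
set.seed(123)
u = runif(1000)    # U ~ Unif(0, 1)
x = -log(1 - u)    # X = F^{-1}(U) ~ Exp(1)
mean(x)            # should be close to the true mean 1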


Random Variable Transformation

  • Continuous random variable $X \sim p_X(x)$

  • $p_X(x)$: density function

  • $g : \mathbb{R} \to \mathbb{R}$: a monotone function

  • Define $Y = g(X)$; then its density function is given by $p_Y(y) = p_X(g^{-1}(y)) \left| \frac{\mathrm{d}}{\mathrm{d}y} g^{-1}(y) \right|$

  • Can extend to the multivariate case
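As a quick numerical illustration of this formula (not part of the original slides): take $X \sim N(0, 1)$ and $g(x) = e^x$, so $Y = g(X)$ is lognormal with $p_Y(y) = \phi(\log y) / y$:

# Verify p_Y(y) = p_X(g^{-1}(y)) |d/dy g^{-1}(y)| for Y = exp(X), X ~ N(0, 1)
# Here g^{-1}(y) = log(y) and |d/dy g^{-1}(y)| = 1 / y
y = 2
dnorm(log(y)) / y   # density from the transformation formula
dlnorm(y)           # built-in lognormal density; the two values agree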


Random Vector Transformation

  • Continuous random vector $X \in \mathbb{R}^d$, $X \sim p_X(x)$

  • $p_X(x)$: density function

  • $T : \mathbb{R}^d \to \mathbb{R}^d$: a diffeomorphism (a smooth mapping with smooth inverse)

  • Define $Y = T(X)$; then its density function is given by $p_Y(y) = p_X(T^{-1}(y)) \left| \det\!\left( \nabla T^{-1}(y) \right) \right| = p_X(x) \left| \det\!\left( \nabla T(x) \right) \right|^{-1}$, where $x = T^{-1}(y)$

  • $\nabla T$ (resp. $\nabla T^{-1}$) is the Jacobian matrix of $T$ (resp. $T^{-1}$)
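A small numerical check of this formula for the linear map $T(x) = Ax$ with $X \sim N(0, I_2)$ (an illustrative sketch; the matrix $A$ below is arbitrary):

# Verify p_Y(y) = p_X(T^{-1}(y)) |det(grad T^{-1}(y))| for the linear map T(x) = A x
A = matrix(c(2, 1, 0, 1), 2, 2)   # an invertible 2x2 matrix
y = c(1, -0.5)
x = solve(A, y)                   # x = T^{-1}(y)
prod(dnorm(x)) / abs(det(A))      # density via the transformation formula
# Direct density of Y ~ N(0, A A^T) at y; should give the same value
Sigma = A %*% t(A)
as.numeric(exp(-0.5 * t(y) %*% solve(Sigma) %*% y) / (2 * pi * sqrt(det(Sigma))))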


Recall: Box-Muller Transform

  1. $(U_1, U_2) \sim \mathrm{Unif}([0, 1]^2)$

  2. Let $Z_1 = \sqrt{-2\log(U_1)}\,\cos(2\pi U_2)$ and $Z_2 = \sqrt{-2\log(U_1)}\,\sin(2\pi U_2)$

  3. Then $Z_1$ and $Z_2$ are two independent $N(0, 1)$ random variables
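A direct R implementation of these steps (an illustrative sketch):

# Box-Muller transform: 2n independent N(0, 1) draws from 2n uniforms
box_muller = function(n)
{
    u1 = runif(n)
    u2 = runif(n)
    z1 = sqrt(-2 * log(u1)) * cos(2 * pi * u2)
    z2 = sqrt(-2 * log(u1)) * sin(2 * pi * u2)
    c(z1, z2)
}
set.seed(123)
z = box_muller(1000)
c(mean(z), var(z))   # should be close to 0 and 1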


Recall: Box-Muller Transform

  • Transformation mapping $$T\begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} \sqrt{-2\log(U_1)}\,\cos(2\pi U_2) \\ \sqrt{-2\log(U_1)}\,\sin(2\pi U_2) \end{pmatrix}$$

  • Jacobian matrix $$\nabla T\begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} -\dfrac{\cos(\theta)}{U_1 L} & -2\pi L\,\sin(\theta) \\[6pt] -\dfrac{\sin(\theta)}{U_1 L} & 2\pi L\,\cos(\theta) \end{pmatrix}, \qquad L = \sqrt{-2\log(U_1)}, \ \theta = 2\pi U_2$$

  • Determinant $\det \nabla T = -2\pi\, U_1^{-1}$


Recall: Box-Muller Transform

  • Clearly, $p_U(u_1, u_2) = 1$, so $p_Z(z_1, z_2) = p_U(u_1, u_2)\,\left|\det \nabla T\right|^{-1} = (2\pi)^{-1} u_1$

  • To express $u_1$ using $z_1$ and $z_2$, note that $Z_1^2 + Z_2^2 = -2\log(U_1)$, so $u_1 = e^{-\frac{1}{2}(z_1^2 + z_2^2)}$

  • Therefore, $p_Z(z_1, z_2) = \frac{1}{2\pi} e^{-\frac{1}{2}(z_1^2 + z_2^2)}$, which implies that $Z_1$ and $Z_2$ follow independent $N(0, 1)$ distributions


Ideas from Inverse Transform

  • Fix a source distribution $p_Z(z)$, e.g. $N(0, I)$

  • Given a target distribution $p_X(x)$

  • Can we learn a mapping $T$ such that $X = T(Z) \sim p_X(x)$ if $Z \sim p_Z(z)$?


Measure Transport Sampler

  • If we can obtain such a mapping, sampling would be easy:

    • Simulate $Z_1, \ldots, Z_n \sim N(0, I)$

    • Set $X_1 = T(Z_1), \ldots, X_n = T(Z_n)$

    • Then $X_i \overset{iid}{\sim} p_X(x)$

  • The key is to find such a mapping $T$, also called a transport map


Transport Map

For practical use, the transport map T needs to have some "nice" properties:

  • $T$ should be invertible and differentiable

  • $T^{-1}$ and $\det(\nabla T(x))$ should be easy to compute

  • $T$ should be flexible enough to characterize sophisticated nonlinear mappings
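In one dimension such a map can be written down explicitly: $T = F_X^{-1} \circ \Phi$ pushes $N(0, 1)$ forward to the target with CDF $F_X$. Below is a minimal R sketch for a Gamma(2, 1) target (purely illustrative; flexible maps in higher dimensions have to be learned, see [3] and [4]):

# Illustrative 1-D transport map from N(0, 1) to Gamma(shape = 2, rate = 1):
# T(z) = F_X^{-1}(Phi(z)), where Phi is the standard normal CDF
T_map = function(z) qgamma(pnorm(z), shape = 2, rate = 1)
set.seed(123)
z = rnorm(1000)       # source sample Z ~ N(0, 1)
x = T_map(z)          # transported sample X = T(Z)
c(mean(x), var(x))    # Gamma(2, 1) has mean 2 and variance 2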


Modeling Transport Maps

  • Polynomials [3] (not good enough)

  • Normalizing flows [4] (tools from the deep learning community)


Training

See [3] for details.


References

[1] Sheldon M. Ross (2011). Simulation. Academic Press.

[2] https://artowen.su.domains/mc/Ch-var-is.pdf

[3] Youssef Marzouk, Tarek Moselhy, Matthew Parno, and Alessio Spantini (2016). An introduction to sampling via measure transport. arXiv:1602.05023.

[4] Matthew Hoffman et al. (2019). NeuTra-lizing bad geometry in Hamiltonian Monte Carlo using neural transport. arXiv:1903.03704.
