Our first very simple example: we will use Markov chain Monte Carlo to draw samples from a standard normal distribution using a proposal distribution \[q(\cdot | X) \sim N(X, \sigma^2).\]
library(ggplot2)
simpleMCMC <- function(st_dev, start, num_total){
X = 0;
X[1] = start
for (t in 2:(num_total)){
y = rnorm(1,mean = X[t-1], sd = st_dev)
u = runif(1)
numerator = dnorm(y)*dnorm(X[t-1],mean=y, sd = st_dev)
denominator = dnorm(X[t-1])*dnorm(y,mean=X[t-1], sd = st_dev)
alpha = min(c(1,numerator/denominator))
X[t] = y*(u<= alpha) + X[t-1]*(u>alpha)
}
X
}
Now, we will test this out with different values of the standard deviation
\(\sigma=0.5\)
X = data.frame(x = simpleMCMC(0.5,-10,500), t = 1:500)
ggplot(X,aes(x=t, y=x)) + geom_point() + geom_line()
\(\sigma=0.1\)
X = data.frame(x = simpleMCMC(0.1,0,500), t = 1:500)
ggplot(X,aes(x=t, y=x)) + geom_point() + geom_line()
\(\sigma=10\)
X = data.frame(x = simpleMCMC(10,0,500), t = 1:500)
ggplot(X,aes(x=t, y=x)) + geom_point() + geom_line()
Let’s show that we get the correct answer.
X = data.frame(x = simpleMCMC(0.5,0,50000), t = 1:5000)
X <- X[10000:50000,]
mean(X$x)
## [1] -0.02637
sd(X$x)
## [1] 1.008
quantile(X$x)
## 0% 25% 50% 75% 100%
## -4.37686 -0.70038 -0.02983 0.63757 3.81768
quantile(rnorm(500))
## 0% 25% 50% 75% 100%
## -3.73404 -0.59400 -0.02245 0.58701 2.63172
Can we sample from a uniform distribution by using a normal proposal?
simpleMCMC_unif <- function(st_dev, start, num_total){
X = 0;
X[1] = start
for (t in 2:(num_total)){
y = rnorm(1,mean = X[t-1], sd = st_dev)
u = runif(1)
numerator = dunif(y)*dnorm(X[t-1],mean=y, sd = st_dev)
denominator = dunif(X[t-1])*dnorm(y,mean=X[t-1], sd = st_dev)
alpha = min(c(1,numerator/denominator))
X[t] = y*(u<= alpha) + X[t-1]*(u>alpha)
}
X
}
X = data.frame(x = simpleMCMC_unif(0.1,0,5000), t = 1:500)
ggplot(X[1:500,],aes(x=t, y=x)) + geom_point() + geom_line()
X = X[1000:5000,]
ggplot(X,aes(x=x)) + geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
Let’s consider the simple simulation model \[v_\mathrm{sim} = g t,\] where \(g\) is the acceleration due to gravity and \(t\) is the time since the object was dropped from rest. The acceleration due to gravity is an uncertain parameter in our model.
Now let’s suppose that the ``real’’ model that gives us our observations is \[ v_\mathrm{obs} = 9.81(t - 0.1 \sin(\pi t)).\]
Our simulation model does not capture all the physics of the observation, but will do a decent job, especially at late times.
We have a measurment device that allows us to measure with an accuracy having a known standard deviation of \(\sigma_\mathrm{obs}=10^{-1}\) m/s.Furthermore, we assume we have no knowledge of the prior distribution for \(g\), other than it is in between 9.79 m/s\(^2\) and 9.82 m/s\(^2\). Based on this, the likelihood of a given value of \(g\), given an observation at \(t\) is \[\hat{p}(g|t) = \exp\left(-\frac{(v_\mathrm{obs} - v_\mathrm{sim})^2}{2 \sigma_\mathrm{obs}^2}\right),\] for \(g \in [9.79,9.82]\), and 0 otherwise.
To use the Metropolis algorithm to sample from the posterior distribution for \(g\) given some observational data, we will need a proposal distribution. For this we will use a proposal distribution given by \[ q(y|x) \sim N(\mu_X, 0.05).\]
The observational data we are given is at \(t = 0.1, 0.5, 1, 1.5, 2, 5, 10, 20\).
gDF <- data.frame(t = c(0.1, 0.5, 1, 1.5, 2, 5, 10, 20))
gDF$vobs= with(gDF,9.81*(t - 0.1*sin(pi*t)))
limits <- aes(ymax = gDF$vobs + .1, ymin=gDF$vobs - .1)
ggplot(gDF, aes(x=t, y = vobs)) + geom_point() + geom_errorbar(limits) + scale_x_continuous("time (s)") + scale_y_log10("Observed velocity (m/s)") + ggtitle("Observational Data")
We will start our chain at \(g=9.79\) and use 1000 burn in samples. The first thing we will do is define a function to be our \(\hat p\):
phat <- function(g){
#compute exponent for each point
exponent <- -(gDF$vobs - g*gDF$t)^2/(2 * 0.1^2)
#sum of exponent is equal to logarithm of product
phatS <- exp(sum(exponent))
if ((g > 9.82) || (g < 9.79)){ phatS = 0}
phatS
}
Our samples are generated by
#do 10000 samples
Nsamps <- 10^4
gSamps <- data.frame(i = 1:Nsamps, g=0*(1:Nsamps)+9.81)
for (i in 1:(Nsamps-1)){
proposed <- rnorm(1,mean=gSamps$g[i],sd = 0.05)
u = runif(1)
numerator = phat(proposed)*dnorm(gSamps$g[i],mean=proposed, sd = 0.05)
denominator = phat(gSamps$g[i])*dnorm(proposed,mean=gSamps$g[i], sd = 0.05)
alpha = min(c(1,numerator/denominator))
gSamps$g[i+1] = proposed*(u<= alpha) + gSamps$g[i]*(u>alpha)
}
ggplot(gSamps,aes(x=i, y = g)) + geom_line() + scale_x_continuous("sample") + scale_y_continuous(expression("g (m/s^2)")) + geom_line(aes(x=c(1000,1000), y=c(9.79,9.82)), color="red", size=1)
Looking at our distribution, subtracting out the burn-in, we get a mean of \(g=9.8111\) with a standard deviation of 0.0041. The histogram looks like
ggplot(gSamps[1001:Nsamps,],aes(x=g)) + geom_histogram(fill="white",color="blue")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: position_stack requires constant width: output may be incorrect