I just don’t know man (Epistemic Uncertainty)

Yes, epistemic uncertainty (that is, uncertainty due to a lack of knowledge) is likely the biggest thorn in the side of a modeler/computational physicist/computational engineer (I would make a bolder statement, but, at the risk of being redundant, I just don’t know man). One way of dealing with this is the concept of probability boxes. The basic idea is to treat aleatory uncertainties as distributional and epistemic uncertainties as intervals. This is conservative because it makes no assumption about the underlying distribution of the epistemic variables, though the actual distribution is unlikely to be completely flat.
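As a toy illustration of the idea (illustrative numbers, not from the paper): suppose a quantity had known aleatory spread (say, normal with a known standard deviation) but its mean were only known epistemically to lie in an interval. Sweeping the mean across its interval traces a family of CDFs, and the pointwise envelope of that family is the probability box:

```r
# Toy p-box sketch (hypothetical values, not from the shock-breakout data):
# aleatory: normal spread with sd = 10; epistemic: mean known only to lie in [450, 470]
t  <- seq(400, 520, length.out = 200)
mu <- seq(450, 470, length.out = 9)    # sample the epistemic interval
cdfs  <- sapply(mu, function(m) pnorm(t, mean = m, sd = 10))
lower <- apply(cdfs, 1, min)           # lower bounding CDF of the box
upper <- apply(cdfs, 1, max)           # upper bounding CDF of the box
# any CDF consistent with these assumptions lies between lower and upper
```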



Probability Boxes and Epistemic Uncertainty


We’re going to load in the shock breakout data from Stripling, McClarren, et al., “A calibration and data assimilation method using the Bayesian MARS emulator,” Annals of Nuclear Energy, 52(C), 103–112.

library(tgp)       # bgp
library(reshape2)  # melt
library(reshape)   # expand.grid.df
library(plyr)      # rename
library(ggplot2)

fullData <- read.csv('CRASHBreakout.csv', na.strings = "#N/A")
simulations <- fullData[,c("thickness","laser.energy","Be.gamma","wall.opacity","flux.limiter","breakout.time")]
measurements <- fullData[1:8,c("measure.1", "measure.2", "measure.3", "thickness", "laser.energy")]

We’ll use all three measurements for our empirical CDF.

longMeas <- melt(cbind(id=1:8,measurements[,1:3]),id.vars = "id")
longMeas <- longMeas[!is.na(longMeas$value),]
measure.cdf <- ecdf(longMeas$value)
ggplot(longMeas,aes(x=value)) + stat_ecdf() + scale_x_continuous("Breakout time (ps)") + scale_y_continuous("CDF")

[Plot: empirical CDF of the measured breakout times]

Now let’s create the probability box. We’ll first fit a GPR model to the simulations, as before.

emulator <- bgp(X = simulations[,1:5], Z = simulations[,"breakout.time"])
## 
## burn in:
## r=1000 d=[0.703008 1.17658 0.124188 1.40017 1.37518]; n=104
## 
## Sampling @ nn=0 pred locs:
## r=1000 d=[0.531462 0.924898 0.0684054 1.1103 1.26337]; mh=1 n=104
## r=2000 d=[0.823886 0.692313 0.0606064 0.0399567 0.622995]; mh=1 n=104
## r=3000 d=[1.15308 1.03318 0.0698691 1.18672 0.953542]; mh=1 n=104

The next step is to evaluate the CDFs for the simulations at fixed values of the \(\vec{\theta}\)’s. We’ll have to do some trickeration with data frames.

#generate theta grid
minThetas <- sapply(simulations[,3:5],min)
maxThetas <- sapply(simulations[,3:5],max)
theta.grid <- expand.grid.df(
  data.frame(Be.gamma     = seq(minThetas["Be.gamma"],     maxThetas["Be.gamma"],     length.out = 5)),
  data.frame(wall.opacity = seq(minThetas["wall.opacity"], maxThetas["wall.opacity"], length.out = 5)),
  data.frame(flux.limiter = seq(minThetas["flux.limiter"], maxThetas["flux.limiter"], length.out = 5)))
theta.grid$id <- 1:nrow(theta.grid)
#join with x's
simulation.df <- cbind(measurements[1,c("thickness","laser.energy")], theta.grid,row.names=NULL)
for (i in 2:8) {
  simulation.df <- rbind(simulation.df,
                         cbind(measurements[i, c("thickness","laser.energy")], theta.grid, row.names = NULL))
}
simulation.preds <- predict(emulator, XX=simulation.df[,1:5])
simulation.df$breakout.time <- simulation.preds$ZZ.mean
ggplot(simulation.df,aes(x=breakout.time,color=as.factor(id))) + stat_ecdf() + theme(legend.position="none")+ scale_x_continuous("Breakout time (ps)") + scale_y_continuous("CDF")

[Plot: family of emulator CDFs, one per point in the \(\vec{\theta}\) grid]

That gives us our probability box, but we still have to find the bounding CDFs.

#make an upper and lower ecdf; initialize with sentinel values
#(CDF = 2 is above any attainable CDF value, CDF = -1 below)
minCDF <- data.frame(breakout.time = seq(min(simulation.df$breakout.time),max(simulation.df$breakout.time),length.out = 40), CDF = 2)
maxCDF <- data.frame(breakout.time = seq(min(simulation.df$breakout.time),max(simulation.df$breakout.time),length.out = 40), CDF = -1)

for (i in 1:nrow(theta.grid)) {
  tempCDF <- ecdf(simulation.df$breakout.time[ simulation.df$id==i])
  tempvals <- tempCDF(minCDF$breakout.time)
  if (tempvals[1] > 0) print(tempvals)
  minCDF[tempvals < minCDF$CDF,"CDF"] <- tempvals[tempvals < minCDF$CDF]
  maxCDF[tempvals > maxCDF$CDF,"CDF"] <- tempvals[tempvals > maxCDF$CDF]
}
##  [1] 0.125 0.250 0.250 0.375 0.375 0.375 0.375 0.500 0.500 0.500 0.750
## [12] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [23] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [34] 1.000 1.000 1.000 1.000 1.000 1.000 1.000
prob.box <- merge(maxCDF,minCDF,by="breakout.time")
prob.box <- rename(prob.box,replace = c(CDF.x="max", CDF.y="min"))
prob.box <- melt(prob.box, id.vars = "breakout.time")
#append the actual CDF for experimental data
prob.box <- rbind(prob.box,data.frame(breakout.time = unique(prob.box$breakout.time),variable="measurement",value = measure.cdf( unique(prob.box$breakout.time))))
ggplot(prob.box,aes(x=breakout.time, y=value, color=variable)) + geom_line()+ scale_x_continuous("Breakout time (ps)") + scale_y_continuous("CDF") + scale_color_discrete("Curve",labels=c("Upper Bound","Lower Bound","Measurement"))

[Plot: probability box bounds together with the measurement CDF]

Because the experimental-data CDF is contained in the probability box, the validation metric gives no evidence that the model is inconsistent with the experimental data. We also see that, even though we are consistent, there is a lot of slack: we could tighten the ranges of the \(\vec{\theta}\)’s considerably. Moreover, we could have done this analysis before building the predictive model, because it indicates that we’ll need to do some calibration.


