Yes, epistemic uncertainty (that is, uncertainty due to a lack of knowledge) is likely the biggest thorn in the side of a modeler/computational physicist/computational engineer (I would make a bolder statement, but, at the risk of being redundant, I just don’t know, man). One way of dealing with it is the concept of probability boxes. The basic idea is to treat aleatory uncertainties as distributional and epistemic uncertainties as intervals. On the one hand, this is conservative because we make no assumptions about the underlying distribution; on the other hand, the resulting bounds can be quite wide.
Probability Boxes and Epistemic Uncertainty
Ryan McClarren
November 20, 2014
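Before getting to the data, here’s the p-box idea in miniature. This is a hedged base-R sketch (the Normal model, the interval, and all the variable names are made up for illustration): the aleatory part gets a distribution, the epistemic part gets an interval, and the box is the envelope of the cdfs you get by sweeping over that interval.

```r
# Sketch: X ~ Normal(mu, 1), where the Normal spread is aleatory but the
# mean mu is only known to lie in the interval [0, 2] (epistemic).
# Sweeping mu over its interval and taking the pointwise envelope of the
# resulting CDFs gives the probability box.
x    <- seq(-4, 6, length.out = 200)
mus  <- seq(0, 2, length.out = 11)                   # sample the epistemic interval
cdfs <- sapply(mus, function(m) pnorm(x, mean = m))  # 200 x 11 matrix of CDFs
lower <- apply(cdfs, 1, min)                         # lower bounding cdf
upper <- apply(cdfs, 1, max)                         # upper bounding cdf
```

Any distribution whose mean lies in the interval has a cdf that falls entirely between `lower` and `upper`.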
We’re going to load in the shock breakout data from Stripling, McClarren, et al., “A calibration and data assimilation method using the Bayesian MARS emulator,” Annals of Nuclear Energy, 52(C), 103–112.
# packages assumed by this script: reshape (melt, expand.grid.df), plyr (rename), ggplot2, and tgp (bgp)
library(reshape); library(plyr); library(ggplot2); library(tgp)
fullData <- read.csv('CRASHBreakout.csv',na.strings = "#N/A")
simulations <- fullData[,c("thickness","laser.energy","Be.gamma","wall.opacity","flux.limiter","breakout.time")]
measurements <- fullData[1:8,c("measure.1", "measure.2", "measure.3", "thickness", "laser.energy")]
We’ll use all three measurements for our empirical cdf.
longMeas <- melt(cbind(id=1:8,measurements[,1:3]),id.vars = "id")
longMeas <- longMeas[!is.na(longMeas$value),]
measure.cdf <- ecdf(longMeas$value)
ggplot(longMeas,aes(x=value)) + stat_ecdf() + scale_x_continuous("Breakout time (ps)") + scale_y_continuous("CDF")
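As an aside, `ecdf` doesn’t just feed the plot: it returns a step function we can evaluate at arbitrary points later, which is exactly what we’ll need when we overlay the measurement cdf on the probability box. A quick base-R check with toy numbers:

```r
# ecdf() returns a callable step function F with F(x) = fraction of data <= x
toy.cdf <- ecdf(c(1, 2, 3, 4))
toy.cdf(2.5)  # 0.5: two of the four values are <= 2.5
toy.cdf(0)    # 0: no values are <= 0
toy.cdf(4)    # 1: all values are <= 4
```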
Now let’s create the probability box. We’ll first fit a GPR model to the simulations, as before.
emulator <- bgp(X = simulations[,1:5], Z = simulations[,"breakout.time"])
##
## burn in:
## r=1000 d=[0.703008 1.17658 0.124188 1.40017 1.37518]; n=104
##
## Sampling @ nn=0 pred locs:
## r=1000 d=[0.531462 0.924898 0.0684054 1.1103 1.26337]; mh=1 n=104
## r=2000 d=[0.823886 0.692313 0.0606064 0.0399567 0.622995]; mh=1 n=104
## r=3000 d=[1.15308 1.03318 0.0698691 1.18672 0.953542]; mh=1 n=104
The next step is to evaluate the cdfs for the simulations at fixed values of the \(\vec{\theta}\)’s. We’ll have to do some trickeration with data frames.
#generate theta grid
minThetas <- sapply(simulations[,3:5],min)
maxThetas <- sapply(simulations[,3:5],max)
theta.grid <- expand.grid.df(data.frame(Be.gamma=seq(minThetas["Be.gamma"],maxThetas["Be.gamma"],length.out = 5)), data.frame(wall.opacity=seq(minThetas["wall.opacity"],maxThetas["wall.opacity"],length.out = 5)), data.frame(flux.limiter=seq(minThetas["flux.limiter"],maxThetas["flux.limiter"],length.out = 5)))
theta.grid$id <- 1:nrow(theta.grid)
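A note in passing: `expand.grid.df` comes from the older reshape package. If it’s unavailable, base R’s `expand.grid` builds the same 5 × 5 × 5 grid. The numeric ranges below are illustrative placeholders; in the analysis they come from `minThetas` and `maxThetas`.

```r
# Base-R stand-in for reshape::expand.grid.df: expand.grid builds the
# full factorial 5 x 5 x 5 grid of theta values.  The lo/hi ranges here
# are made-up placeholders for illustration only.
grid5 <- function(lo, hi) seq(lo, hi, length.out = 5)
theta.grid.base <- expand.grid(Be.gamma     = grid5(1.0, 2.0),
                               wall.opacity = grid5(0.5, 1.5),
                               flux.limiter = grid5(0.05, 0.10))
nrow(theta.grid.base)  # 125 = 5^3 parameter combinations
```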
#join with x's
simulation.df <- cbind(measurements[1,c("thickness","laser.energy")], theta.grid,row.names=NULL)
for (i in 2:8) {
  simulation.df <- rbind(simulation.df,
                         cbind(measurements[i, c("thickness","laser.energy")], theta.grid, row.names = NULL))
}
simulation.preds <- predict(emulator, XX=simulation.df[,1:5])
simulation.df$breakout.time <- simulation.preds$ZZ.mean
ggplot(simulation.df,aes(x=breakout.time,color=as.factor(id))) + stat_ecdf() + theme(legend.position="none")+ scale_x_continuous("Breakout time (ps)") + scale_y_continuous("CDF")
That gives us our probability box, but we have to figure out the bounding cdfs.
#make an upper and lower ecdf; start with sentinel values outside [0,1]
#so the first real cdf value always replaces them
minCDF <- data.frame(breakout.time = seq(min(simulation.df$breakout.time),max(simulation.df$breakout.time),length.out = 40), CDF = 2)
maxCDF <- data.frame(breakout.time = seq(min(simulation.df$breakout.time),max(simulation.df$breakout.time),length.out = 40), CDF = -1)
for (i in 1:nrow(theta.grid)) {
tempCDF <- ecdf(simulation.df$breakout.time[ simulation.df$id==i])
tempvals <- tempCDF(minCDF$breakout.time)
if (tempvals[1] > 0) print(tempvals) #diagnostic: flag any cdf already positive at the left edge of the grid
minCDF[tempvals < minCDF$CDF,"CDF"] <- tempvals[tempvals < minCDF$CDF]
maxCDF[tempvals > maxCDF$CDF,"CDF"] <- tempvals[tempvals > maxCDF$CDF]
}
## [1] 0.125 0.250 0.250 0.375 0.375 0.375 0.375 0.500 0.500 0.500 0.750
## [12] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [23] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [34] 1.000 1.000 1.000 1.000 1.000 1.000 1.000
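The loop above can also be written without the sentinel bookkeeping: evaluate every per-\(\vec{\theta}\) ecdf on the common grid, stack the results into a matrix, and take pointwise extremes. Here is a sketch with toy data standing in for `simulation.df` (three made-up samples, one per theta id):

```r
# Sketch: pointwise cdf envelope via sapply + apply, with toy samples
# standing in for the per-theta breakout times in simulation.df.
set.seed(1)
samples <- list(rnorm(50, mean = 0), rnorm(50, mean = 1), rnorm(50, mean = 2))
grid    <- seq(-3, 5, length.out = 40)
cdf.mat <- sapply(samples, function(s) ecdf(s)(grid))  # 40 x 3 matrix of cdf values
lowerCDF <- apply(cdf.mat, 1, min)  # plays the role of minCDF$CDF
upperCDF <- apply(cdf.mat, 1, max)  # plays the role of maxCDF$CDF
```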
prob.box <- merge(maxCDF,minCDF,by="breakout.time")
prob.box <- rename(prob.box,replace = c(CDF.x="max", CDF.y="min"))
prob.box <- melt(prob.box, id.vars = "breakout.time")
#append the actual CDF for experimental data
prob.box <- rbind(prob.box,data.frame(breakout.time = unique(prob.box$breakout.time),variable="measurement",value = measure.cdf( unique(prob.box$breakout.time))))
ggplot(prob.box,aes(x=breakout.time, y=value, color=variable)) + geom_line()+ scale_x_continuous("Breakout time (ps)") + scale_y_continuous("CDF") + scale_color_discrete("Curve",labels=c("Upper Bound","Lower Bound","Measurement"))
Because the experimental-data CDF is contained in the probability box, according to the validation metric there is no evidence that the model is inconsistent with the experimental data. Even though we are consistent, there is a lot of slack, so we could likely tighten up the ranges of the \(\vec{\theta}\)’s. Moreover, we could have done this analysis
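The containment check can be made quantitative as the area by which the measured cdf escapes the box, which is zero exactly when the measurement lies inside the bounds. This is a sketch of one such metric; `pbox.area.metric` is a hypothetical helper written for illustration, not a function from any package, and the straight-line test curves are made up.

```r
# Sketch of an area validation metric for a p-box: integrate (trapezoid
# rule) how far the measured cdf falls outside the [lower, upper] bounds.
# The result is 0 when the measured cdf is fully contained in the box.
pbox.area.metric <- function(tgrid, lower, upper, measured) {
  excess <- pmax(measured - upper, 0) + pmax(lower - measured, 0)
  sum(diff(tgrid) * (head(excess, -1) + tail(excess, -1)) / 2)
}
tgrid <- seq(0, 1, length.out = 101)      # made-up time grid
lower <- pmax(tgrid - 0.1, 0)             # toy lower bounding cdf
upper <- pmin(tgrid + 0.1, 1)             # toy upper bounding cdf
pbox.area.metric(tgrid, lower, upper, measured = tgrid)  # contained, so the area is 0
```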