Yo estaba tratando de responder a la pregunta Evaluar la integral con la Importancia método de muestreo en la R. Básicamente, el usuario debe calcular
$$\int_{0}^{\pi}f(x)dx=\int_{0}^{\pi}\frac{1}{\cos(x)^2+x^2}dx$$
using the exponential distribution as the importance distribution
$$q(x)=\lambda\ \exp^{-\lambda x}$$
and find the value of $\lambda$ which gives the better approximation to the integral (it's self-study
). I recast the problem as the evaluation of the mean value $\mu$ of $f(x)$ over $[0,\pi]$: the integral is then just $\pi\mu$.
Thus, let $p(x)$ be the pdf of $X\sim\mathcal{U}(0,\pi)$, and let $Y\sim f(X)$: the goal now is to estimate
$$\mu=\mathbb{E}[Y]=\mathbb{E}[f(X)]=\int_{\mathbb{R}}f(x)p(x)dx=\int_{0}^{\pi}\frac{1}{\cos(x)^2+x^2}\frac{1}{\pi}dx$$
using importance sampling. I performed a simulation in R:
# clear the environment and set the seed for reproducibility
rm(list=ls())
gc()
graphics.off()
set.seed(1)
# function to be integrated
f <- function(x){
1 / (cos(x)^2+x^2)
}
# importance sampling
importance.sampling <- function(lambda, f, B){
x <- rexp(B, lambda)
f(x) / dexp(x, lambda)*dunif(x, 0, pi)
}
# mean value of f
mu.num <- integrate(f,0,pi)$value/pi
# initialize code
means <- 0
sigmas <- 0
error <- 0
CI.min <- 0
CI.max <- 0
CI.covers.parameter <- FALSE
# set a value for lambda: we will repeat importance sampling N times to verify
# coverage
N <- 100
lambda <- rep(20,N)
# set the sample size for importance sampling
B <- 10^4
# - estimate the mean value of f using importance sampling, N times
# - compute a confidence interval for the mean each time
# - CI.covers.parameter is set to TRUE if the estimated confidence
# interval contains the mean value computed by integrate, otherwise
# is set to FALSE
j <- 0
for(i in lambda){
I <- importance.sampling(i, f, B)
j <- j + 1
mu <- mean(I)
std <- sd(I)
lower.CB <- mu - 1.96*std/sqrt(B)
upper.CB <- mu + 1.96*std/sqrt(B)
means[j] <- mu
sigmas[j] <- std
error[j] <- abs(mu-mu.num)
CI.min[j] <- lower.CB
CI.max[j] <- upper.CB
CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB
}
# build a dataframe in case you want to have a look at the results for each run
df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter)
# so, what's the coverage?
mean(CI.covers.parameter)
# [1] 0.19
The code is basically a straightforward implementation of importance sampling, following the notation used here. The importance sampling is then repeated $N$ times to get multiple estimates of $\mu$, and each time a checks is made on whether the 95% interval covers the actual mean or not.
As you can see, for $\lambda=20$ the actual coverage is just 0.19. And increasing $B$ to values such as $10^6$ no ayuda (la cobertura es aún menor, 0.15). ¿Por qué está sucediendo esto?