Cómo generar variables aleatorias de Bernoulli con el común de correlación $\rho$?

Question

Cómo generar variables aleatorias de Bernoulli con el común de correlación $\rho$?

Preguntado el 14 de Diciembre, 2017: Cuando se hizo la pregunta
157 visitas: Cuantas visitas ha tenido la pregunta
1 Respuestas: Cuantas respuestas ha tenido la pregunta
Abierta: Estado actual de la pregunta

Supongamos que queremos generar $X_1, \ldots, X_n$ Bernoulli variables aleatorias tales que:

$$ Corr(X_i, X_j) = \rho $$

para todos los $i \neq j$.

Me pregunto ¿cuál es el método que yo podría ser capaz de utilizar (voy a estar tratando de hacer esto en R). Estoy pensando en usar Couplas, pero alguien podría guiarme cómo implementar esto? Gracias.

Preguntado el 14 de Diciembre, 2017 por user321627

Answer 1

1 Respuestas

Answer 2

14voto

jldugger Puntos 7490

Debido a esta matriz de correlación es tan simétrica, podemos tratar de resolver el problema con una distribución simétrica.

Uno de los más simples que nos da la suficiente flexibilidad en la variación de la correlación es la siguiente. Dado $d\ge 2$ variables, definir una distribución en el conjunto de la $d$-dimensiones vectores binarios $X$ mediante la asignación de probabilidad de $q$$X=(1,1,\ldots, 1)$, la probabilidad de $q$$X=(0,0,\ldots, 0)$, y la distribución de los restantes probabilidad de $1-2q$ a partes iguales entre los $d$ vectores que tengan exactamente un $1$; por lo tanto, cada uno de esos obtiene la probabilidad de $(1-2q)/d$. Tenga en cuenta que esta familia de distribuciones depende de un parámetro $0\le q\le 1/2$.

Es fácil simular de una de estas distribuciones:la salida de un vector de ceros con una probabilidad de $q$, la salida de un vector de aquellos con una probabilidad de $q$, y en caso contrario, seleccione uniformemente al azar de las columnas de la $d\times d$ matriz identidad.

Todos los componentes de $X$ son idénticamente distribuidas variables de Bernoulli. Todos ellos tienen en común parámetro

$$p = \Pr(X_1 = 1) = q + \frac{1-2q}{d}.$$

Compute the covariance of $X_i$ and $X_j$ by observing they can both equal $1$ only when all the components are $1$, whence

$$\Pr(X_i=1=X_j) = \Pr(X=(1,1,\ldots,1)) = q.$$

This determines the mutual correlation as

$$\rho = \frac{d^2q - ((d-2)q + 1)^2}{(1 + (d-2)q)(d-1 - (d-2)q)}.$$

Given $d \ge 2$ and $-1/(d-1)\le \rho \le 1$ (which is the range of all possible correlations of any $d$-variate random variable), there is a unique solution $q(\rho)$ between $0$ and $1/2$.

Simulations bear this out. Beginning with a set of $21$ equally-spaced values of $\rho$, the corresponding values of $q$ were computed (for the case $d=8$) and used to generate $10,000$ independent values of $X$. The $\binom{8}{2}=28$ correlation coefficients were computed and plotted on the vertical axis. The agreement is good.

I carried out a range of such simulations for values of $d$ between $2$ and $99$, with comparably good results.

A generalization of this approach (namely, allowing for two, or three, or ... values of the $X_i$ simultaneously to equal $1$) would give greater flexibility in varying $E[X_i]$, which in this solution is determined by $\rho$. That combines the ideas related here with the ones in the fully general $d=2$ solution described at https://stats.stackexchange.com/a/285008/919.

The following R code features a function p to compute $q$ from $\rho$ and $d$ y exposiciones de una manera bastante eficaz de simulación mecanismo dentro de su bucle principal.

#
# Determine p(All zeros) = p(All ones) from rho and d.
#
p <- function(rho, d) {
  if (rho==1) return(1/2)
  if (rho <= -1/(d-1)) return(0)
  if (d==2) return((1+rho)/4)
  b <- d-2
  (4 + 2*b + b^2*(1-rho) - (b+2)*sqrt(4 + b^2 * (1-rho)^2)) / (2 * b^2 * (1-rho))
}
#
# Simulate a range of correlations `rho`.
#
d <- 8       # The number of variables.
n.sim <- 1e4 # The number of draws of X in the simulation.

rho.limits <- c(-1/(d-1), 1)
rho <- seq(rho.limits[1], rho.limits[2], length.out=21)
rho.hat <- sapply(rho, function(rho) {
  #
  # Compute the probabilities from rho.
  #
  qd <- q0 <- p(rho, d)
  q1 <- (1 - q0 - qd)
  #
  # First randomly select three kinds of events: all zero, one 1, all ones.
  #
  u <- sample.int(3, n.sim, prob=c(q0,q1,qd), replace=TRUE)
  #
  # Conditionally, when there is to be one 1, uniformly select which 
  # component will equal 1.
  #
  k <- diag(d)[, sample.int(d, n.sim, replace=TRUE)]
  #
  # When there are to be all zeros or all ones, make it so.
  #
  k[, u==1] <- 0
  k[, u==3] <- 1
  #
  # The simulated values of X are the columns of `k`. Return all d*(d-1)/2 correlations.
  #
  cor(t(k))[lower.tri(diag(d))]
})
#
# Display the simulation results.
#
plot(rho, rho, type="n",
     xlab="Intended Correlation",
     ylab="Sample Correlation", 
     xlim=rho.limits, ylim=rho.limits,
     main=paste(d, "Variables,", n.sim, "Iterations"))

abline(0, 1, col="Red", lwd=2)

invisible(apply(rho.hat, 1, function(y) 
  points(rho, y, pch=21, col="#00000010", bg="#00000004")))

Respondido el 14 de Diciembre, 2017 por jldugger (7490 Puntos )

Cómo generar variables aleatorias de Bernoulli con el común de correlación $\rho$?

Respuesta

Preguntas Destacadas

Etiquetas mas usadas

i-Ciencias.com

Powered by:

Cómo generar variables aleatorias de Bernoulli con el común de correlación $\rho$?

Respuesta

Preguntas relacionadas

Preguntas Destacadas

Etiquetas mas usadas

En nuestra red

i-Ciencias.com

Powered by: