No estoy seguro de si esto responde tu pregunta, pero a continuación encontrará algunos R código que sigue a Bender et al. (2005). Se describe un método para simular un Cox PH modelo de regresión con propiedades como la proporción de eventos censurados (ver la línea "dat <- data.frame(T = T, X, event = rbinom(n, 1, 0.30))
", es decir, el 70% de todos los eventos son censurados).
Referencia
Bender, Ralf, Thomas Augustin, und María Blettner. 2005. La generación de los tiempos de supervivencia para simular modelos de riesgos proporcionales de Cox. La estadística en Medicina 24: 1713-1723.
##' Generate survival data with $p$ (correlated) predictors
##'
##'
##'
##' @title Generate survival data
##' @param n Sample size
##' @param beta Vector of coefficients
##' @param r Correlation between predictors
##' @param id.iter
##' @param id.study
##' @return matrix with identification variables id.iter and id.study,
##' T (survival time), event (0: censored),
##' predictors X1 to X$p$
##' @author Bernd Weiss
##' @references Bender et al. (2005)
genSurvData <- function(n = 100000,
beta = c(0.8, 2.2, -0.5, 1.1, -1.4),
r = 0.1,
id.iter = NA,
id.study = NA){
## Scale parameter (the smaller lambda, the greater becomes T)
lambda <- 0.000001#1.7
## Shape parameter
nue <- 8.9#9.4
## Sample size
n <- n
## Number of predictors
p <- length(beta)
## Generate column vector of coefficients
beta <- matrix(beta, ncol = 1)
## Generate correlated covariate vectors using a multivariate normal
## distribution with X ~ N(mu, S) and a given correlation matrix R, with:
## R: A p x p correlation matrix
## mu: Vector of means
## SD: Vector of standard deviations
## S: Variance-covariance matrix
R <- matrix(c(rep(r, p^2)), ncol = p)
diag(R) <- 1
R
mu <- rep(0, p)
SD <- rep(1, p)
S <- R * (SD %*% t(SD))
X <- mvrnorm(n, mu, S)
cov(X)
cor(X)
sqrt(diag(cov(X)))
## Calculate survival times
T <- (-log(runif(n)) / (lambda * exp(X %*% beta)))^(1/nue)
## 30% (0.30) of all marriages are getting divorced, i.e. 70% of all
## observations are censored ("event = rbinom(n, 1, 0.30)")
dat <- data.frame(T = T, X, event = rbinom(n, 1, 0.30))
## Also, all T's > 30 yrs are by definition censored and T is set to 30 yrs
dat$event <- ifelse(dat$T >= 30, 0, dat$event)
dat$T <- ifelse(dat$T >= 30, 30, dat$T)
dat$id.iter <- id.iter
dat$id.study <- id.study
## Reorder data frame: T, event, covariates
tmp.names <- names(dat)
dat <- dat[, c("id.iter", "id.study", "T", "event", tmp.names[grep("X", tmp.names)])]
## Returning a matrix speeds-up things a lot... lesson learned.
dat <- as.matrix(dat)
return(dat)
}
library(survival)
library(MASS)
dat <- genSurvData(n = 1000)
dat <- as.data.frame(dat)
survfit(Surv(time = T, event = event) ~ 1, data = dat)
coxph(Surv(time = T, event = event) ~ X1 + X2 + X3 +X4 +X5, data = dat)