Survival Analysis

The purpose of this short TP is to practise survival analysis in R. As usual, make sure that you read the help page for any new function that you use.


Melanoma Data

Load the ISwR and survival packages into R, then load the melanom data and start to explore it using the numerical and graphical summaries you have learned about this week (e.g. functions like summary, hist, pairs, etc.). Be sure to read the help page for melanom as well.

library(ISwR)
library(survival)
data(melanom)
attach(melanom)
names(melanom)
?melanom

Create a Surv object

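To carry out survival analysis, we first need to create a Surv object, which combines the follow-up time with an event indicator. Here the event of interest is death from melanoma (status 1); the values 2 (alive at the end of the study) and 3 (dead from other causes) are treated as censored: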

?Surv
mel.surv <- Surv(days, status==1)
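
To see what the comparison status==1 does, here is a minimal sketch on a small made-up data set. The vectors toy.days and toy.status are hypothetical and are not taken from melanom; only the status coding is borrowed from it.

```r
library(survival)

# Hypothetical toy data (NOT from melanom): follow-up times and status codes
toy.days   <- c(10, 35, 50, 80)
toy.status <- c(1, 2, 1, 3)

# toy.status == 1 is TRUE only for the event of interest,
# so codes 2 and 3 are both treated as censored
toy.surv <- Surv(toy.days, toy.status == 1)

# Censored observations are printed with a trailing "+"
print(toy.surv)

# Cross-check: which status codes were counted as events?
print(table(status = toy.status, event = toy.status == 1))
```

Printing mel.surv in the same way is a quick check that the censoring has been set up as intended.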

Kaplan-Meier estimate

We can get the Kaplan-Meier estimate of the survival function using survfit:

surv.all <- survfit(mel.surv ~ 1)
summary(surv.all)
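
It can help to compute a Kaplan-Meier estimate by hand once. The sketch below uses a hypothetical four-subject data set (toy.days and toy.event are made up for illustration) and compares the hand calculation with survfit. At each event time the current estimate is multiplied by (1 - deaths / number at risk).

```r
library(survival)

# Hypothetical toy data (NOT from melanom): events at days 10 and 50
toy.days  <- c(10, 35, 50, 80)
toy.event <- c(TRUE, FALSE, TRUE, FALSE)   # FALSE = censored

# By hand: multiply by (1 - deaths / number at risk) at each event time
km.hand <- c(1 - 1/4,                 # day 10: 4 at risk, 1 death
             (1 - 1/4) * (1 - 1/2))   # day 50: 2 at risk, 1 death

# The same estimate from survfit (intercept-only formula: one overall curve)
toy.fit <- survfit(Surv(toy.days, toy.event) ~ 1)
km.fit  <- summary(toy.fit)$surv

# Side-by-side comparison at the two event times
print(cbind(time = summary(toy.fit)$time, km.hand, km.fit))
```

Note that the subject censored at day 35 still counts as at risk at day 10, but has left the risk set by day 50.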

The result of summary only gives estimates for event times. The censoring times can also be shown if you use the censored argument:

summary(surv.all,censored=TRUE)

Usually it's more interesting to look at a plot rather than the numerical values:

plot(surv.all, mark.time=TRUE)

The small marks on the curve show where censoring has occurred (recent versions of the survival package draw them only when mark.time=TRUE is given), and the dashed bands around the curve give approximate 95% confidence intervals.

We can look at the curves separately for each gender (colored differently for males and females):

surv.sex <- survfit(mel.surv ~ sex)
plot(surv.sex, col=c("red","blue"))
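
One possible way to label the curves is to add a legend built from the strata names stored in the fitted object. This is only a sketch: the axis labels, the legend position and the helper variable cols are choices made here for illustration. The labels appear as sex=1 and sex=2, so you still need the help page for melanom to decode them.

```r
library(ISwR)
library(survival)
data(melanom)

# Same fit as in the text, but using data= instead of attach()
surv.sex <- survfit(Surv(days, status == 1) ~ sex, data = melanom)

cols <- c("red", "blue")
plot(surv.sex, col = cols, xlab = "Days", ylab = "Survival probability")

# Curves are drawn in the order of the strata, so colours and labels line up
legend("bottomleft", legend = names(surv.sex$strata), col = cols, lty = 1)
```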

Make sure you can tell which color corresponds to which gender. Does one group appear to have longer survival than the other? We can use the log-rank test, via the survdiff function, to test the null hypothesis that the two groups have the same survival function in the population:

survdiff(Surv(days, status==1) ~ sex)

Is the observed difference statistically significant?
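
If you want the p-value as a number rather than reading it from the printed output, it can be recomputed from the chi-squared statistic stored in the survdiff result. The sketch below assumes the usual degrees of freedom for the log-rank test (number of groups minus one); the names lr, df and p.value are introduced here for illustration.

```r
library(ISwR)
library(survival)
data(melanom)

# Log-rank test, using data= instead of attach()
lr <- survdiff(Surv(days, status == 1) ~ sex, data = melanom)

# Degrees of freedom: number of groups minus one
df <- length(lr$n) - 1

# Upper-tail probability of the chi-squared statistic
p.value <- pchisq(lr$chisq, df = df, lower.tail = FALSE)
print(p.value)
```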

Cox modeling

Carrying out Cox modeling and understanding the output in detail is beyond the scope of the course, but here is a brief summary. A Cox model is specified in much the same way as the regression models you have already fitted with lm, except that the covariates are assumed to act linearly on the log hazard scale. In R, you use the function coxph with a formula containing the variables that you want to include in the model.
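
As a minimal sketch of such a call, the example below fits a Cox model with sex and tumour thickness as covariates. The choice of covariates, the log transformation of thick and the name mel.cox are illustrative assumptions, not something prescribed by this TP.

```r
library(ISwR)
library(survival)
data(melanom)

# Cox proportional hazards model; the covariates are chosen for illustration only
mel.cox <- coxph(Surv(days, status == 1) ~ sex + log(thick), data = melanom)

# coef is on the log hazard scale; exp(coef) is the corresponding hazard ratio
print(summary(mel.cox))
```

The formula interface is the same as for lm: the response (here a Surv object) goes on the left of ~ and the covariates on the right.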