Applied Statistics - Logistic regression project

The purpose of this project is to create a model predicting low birth weight in babies. This time, your report is for the chief statistical collaborator, so you do not need to give elementary explanations of the procedures you perform. You should, however, include the mathematical formulas for your data analysis and final model, along with a clear explanation and interpretation of your results.

The data are available in the MASS package in R:

library(MASS)
attach(birthwt)
?birthwt

You should follow the example in the help page on birthwt for creating factors. As usual, you should carry out exploratory analyses on the data. Then use logistic regression to model how the probability of a low birth weight baby depends on some of the other variables in the data set. (Although the actual birth weights are available, here you will concentrate on predicting whether the birth weight is low from the remaining variables.) You will need the glm function in R to fit the model(s), and you can examine the output using summary. You can also use the anova function to carry out an analysis of deviance (for comparing nested models), and/or stepAIC for model selection.
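The workflow described above can be sketched roughly as follows. This is only an illustration: the factor recoding follows the example in ?birthwt, while the object names (bwt, full, reduced, sel) and the particular reduced model are assumptions chosen for the sketch, not a recommended final model.

```r
library(MASS)  # provides the birthwt data and stepAIC

# Recode variables as factors, following the example in ?birthwt
bwt <- with(birthwt, {
  race <- factor(race, labels = c("white", "black", "other"))
  ptd  <- factor(ptl > 0)        # any previous premature labour
  ftv  <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+"    # collapse two or more physician visits
  data.frame(low = factor(low), age, lwt, race,
             smoke = (smoke > 0), ptd, ht = (ht > 0), ui = (ui > 0), ftv)
})

# Full logistic regression model using all remaining variables
full <- glm(low ~ ., family = binomial, data = bwt)
summary(full)

# A nested model (dropping age and ftv is an arbitrary choice for illustration)
reduced <- glm(low ~ lwt + race + smoke + ptd + ht + ui,
               family = binomial, data = bwt)

# Analysis of deviance comparing the two nested models
anova(reduced, full, test = "Chisq")

# Stepwise model selection by AIC, starting from the full model
sel <- stepAIC(full, trace = FALSE)
summary(sel)
```

The bwt data frame deliberately omits the actual birth weight (bwt in the original data), so that low is predicted only from the remaining variables.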

In your report, include results from any relevant analyses as well as the final model with an interpretation and any recommendations you can give based on your results.
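As a reminder of the kind of formula such a report might contain, a generic logistic regression model can be written as below. The notation (Y_i for the low birth weight indicator, x_{ij} for the covariates, \beta_j for the coefficients) is assumed here for illustration and is not prescribed by the assignment.

```latex
\[
  \log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
  \qquad
  p_i = P(Y_i = 1 \mid x_{i1}, \dots, x_{ik}).
\]
```

Under this model, \exp(\beta_j) is the odds ratio for low birth weight associated with a one-unit increase in x_j (or with a factor level relative to its baseline), holding the other covariates fixed.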

Your report can be in English or French. Please include your .Rnw and .pdf files, and a .bib file if you cite any references. Please follow the naming convention surname4.Rnw, etc. (e.g. my files would be goldstein4.Rnw, goldstein4.pdf, goldstein4.bib). For full re-write privileges, please email your report files to me (Darlene.Goldstein@epfl.ch) before Friday 3 December 2010.