The purpose of this project is to carry out an analysis of variance for a study of injuries in car crashes. Your job this time is to provide a report for a statistical collaborator who is working with you on the project, so you should include both mathematical formulas and data analysis results.
Stock automobiles containing dummies in the driver and front passenger seats were crashed into a wall at 35 miles per hour. National Transportation Safety Board officials collected information on how the crash affected the dummies. The injury variables describe the extent of head injuries and chest deceleration. The data file also contains information on the type and safety features of each crashed car.
The main question of interest here is what effects the car features have on injury, and whether or not the features interact. One way to evaluate the relationship between car features and crash injuries is to use ANOVA with an injury variable as the response. You may have to transform the values so that their distribution is closer to normal (using the log, for example). You will also want to be careful to code the categorical variables as factors, even though their values may look numeric.
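For the report, the model behind such an analysis can be written down explicitly. The block below is the standard two-way ANOVA model with interaction, written for log head injury with two generic factors A (levels i) and B (levels j), for example Protection and Type; A, B, a, b and n_ij are notation introduced here, not names from the data file. Adding more factors adds one main-effect term per factor and one term per interaction.

```latex
\begin{align*}
\log Y_{ijk} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
  \qquad \varepsilon_{ijk} \stackrel{\text{iid}}{\sim} N(0, \sigma^2),\\
  &\qquad i = 1,\dots,a,\quad j = 1,\dots,b,\quad k = 1,\dots,n_{ij},\\
\sum_{i=1}^{a} \alpha_i &= 0, \qquad \sum_{j=1}^{b} \beta_j = 0, \qquad
  \sum_{i=1}^{a} (\alpha\beta)_{ij} = 0 \;\; \forall j, \qquad
  \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0 \;\; \forall i,\\
H_0^{AB}&: (\alpha\beta)_{ij} = 0 \;\; \forall i, j, \qquad
  F = \frac{SS_{AB} / [(a-1)(b-1)]}{SS_E / (N - ab)}, \qquad N = \sum_{i,j} n_{ij}.
\end{align*}
```

Under the null hypothesis, and provided every cell has at least one observation, F follows an F distribution with (a-1)(b-1) and N - ab degrees of freedom; with unbalanced data, SS_AB is the interaction sum of squares after both main effects. R's aov() uses treatment contrasts by default rather than these sum-to-zero constraints, which changes the parameter estimates but not the fitted values.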
To download the data, go to http://www.isrec.isb-sib.ch/~darlene/data/Crash.data
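A minimal sketch of reading the data and coding the categorical variables as factors. It assumes the file is whitespace-delimited with a header row and that the columns are named Head, Protection, Doors, Year and Type (the names used in the boxplot commands of this assignment); the column names of the size and chest variables are not given here, so add them yourself. The helper prepare_crash is hypothetical, and the log transform assumes Head is strictly positive.

```r
# Hypothetical helper: code categorical columns as factors and add log(Head).
# Column names are assumptions taken from the boxplot commands in this handout.
prepare_crash <- function(crash,
                          factor_cols = c("Protection", "Doors", "Year", "Type")) {
  for (col in factor_cols) {
    # factor() keeps numeric-looking codes (e.g. Doors = 2, 4) as categories
    crash[[col]] <- factor(crash[[col]])
  }
  # log transform for a right-skewed injury measure (assumes Head > 0)
  crash$logHead <- log(crash$Head)
  crash
}

# Usage (assumes a header row and whitespace-separated fields):
# crash <- read.table("http://www.isrec.isb-sib.ch/~darlene/data/Crash.data",
#                     header = TRUE)
# crash <- prepare_crash(crash)
# str(crash)
```

Running str(crash) afterwards is a quick way to confirm that each categorical variable now shows up as a factor rather than int or num.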
The size variable abbreviations correspond to: compact (comp), heavy (hev), lightweight (lt), medium (med), minivan (mini), multi-purpose (mpv), pick-up truck (pu) and van.
Before jumping in and making ANOVA tables, it is a good idea to look at a few data plots. Read the data into R (below, the R data frame is called crash). What happens if you type
plot(crash)
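For a data frame, plot() converts the columns to a numeric matrix (via data.matrix) and calls pairs(), so factors appear as their integer codes and the points pile up on a few vertical and horizontal lines. Counting the levels of each factor is more informative. The helper factor_levels below is a hypothetical sketch and assumes the categorical variables have already been coded as factors.

```r
# Hypothetical helper: number of levels of every factor column.
# Filter() keeps only the columns for which is.factor() is TRUE.
factor_levels <- function(df) sapply(Filter(is.factor, df), nlevels)

# Usage: factor_levels(crash)
# table(crash$Type) also shows how many cars fall in each level,
# which tells you whether the design is balanced.
```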
How many levels does each factor variable have? Because the variables are factors taking on only a small number of values, other plots are more useful here. Try looking at the mean value of each injury type for each of the factors and, to check whether there appear to be distributional problems with the data, also look at the median of each injury type for each factor:
?plot.design
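plot.design() draws one vertical line per factor with a mark at the summary value (fun = mean by default) of the response for each level. The sketch below pairs those plots with the same summaries as numbers; design_summaries is a hypothetical helper, and the factor names are assumptions taken from the boxplot commands. A level whose mean lies well above its median hints at right skew or outliers.

```r
# Hypothetical helper: summary (mean, median, ...) of a response for each
# level of each factor, returned as a named list of vectors.
design_summaries <- function(df, response, factors, fun = mean) {
  lapply(setNames(factors, factors),
         function(f) tapply(df[[response]], df[[f]], fun))
}

# Usage (assumes crash is read in and the categorical columns are factors):
# par(mfrow = c(1, 2))
# plot.design(Head ~ Protection + Doors + Year + Type, data = crash, fun = mean)
# plot.design(Head ~ Protection + Doors + Year + Type, data = crash, fun = median)
# design_summaries(crash, "Head", c("Protection", "Doors", "Year", "Type"), median)
```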
Do the data appear to be very skewed, or have many outliers?
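One way to answer this numerically is to compare skewness and the number of boxplot outliers on the raw and log scales. The sketch below uses the simple moment estimator of skewness; skewness and shape_check are hypothetical helpers, and the log scale assumes all values are positive. boxplot.stats()$out returns the points beyond 1.5 IQR from the hinges, which are the points a boxplot draws individually.

```r
# Hypothetical helper: moment estimator of sample skewness.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical helper: skewness and outlier counts, raw vs log scale
# (assumes x > 0 so that log(x) is defined).
shape_check <- function(x) {
  c(skew_raw  = skewness(x),
    skew_log  = skewness(log(x)),
    n_out_raw = length(boxplot.stats(x)$out),
    n_out_log = length(boxplot.stats(log(x))$out))
}

# Usage: shape_check(crash$Head)
#        qqnorm(log(crash$Head)); qqline(log(crash$Head))
```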
Another useful way to look at the data is to make boxplots of the Head injury values for each factor, e.g. (you will of course put titles on your plots!):
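par(mfrow = c(2, 2))  # set up a 2 x 2 plotting region
boxplot(Head ~ Protection, data = crash, main = "Head injury by Protection")
boxplot(Head ~ Doors, data = crash, main = "Head injury by Doors")
boxplot(Head ~ Year, data = crash, main = "Head injury by Year")
boxplot(Head ~ Type, data = crash, main = "Head injury by Type")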
Does the variability of observations for different levels within a factor look similar? What about the averages?
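To back up the visual impression with numbers, a possible sketch computes the standard deviation within each level and two standard tests of equal variances. bartlett.test() is sensitive to non-normality, so fligner.test(), a rank-based alternative, is included as a robustness check. spread_by_level is a hypothetical helper, and Bartlett's test needs at least two observations in each level.

```r
# Hypothetical helper: within-level SDs plus p-values of Bartlett's and
# the Fligner-Killeen tests of homogeneity of variances.
spread_by_level <- function(df, response, factor) {
  y <- df[[response]]
  g <- df[[factor]]
  list(sd       = tapply(y, g, sd),
       bartlett = bartlett.test(y, g)$p.value,
       fligner  = fligner.test(y, g)$p.value)
}

# Usage: spread_by_level(crash, "Head", "Type")
#        spread_by_level(transform(crash, logHead = log(Head)), "logHead", "Type")
```

Comparing the two calls is one way to see whether the log transform also stabilises the spread across levels.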
You will need to carry out a thorough preliminary analysis of the data to determine which factors should remain in a final model explaining each injury type. The function aov or lm will be useful for fitting the model. If your fitted model is stored as crash.aov (or crash.lm), you can get a summary with
summary(crash.aov)
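As a starting point, one possible model includes all main effects and all two-way interactions for log(Head). The factor list in fit_crash (a hypothetical helper) is an assumption; add the size variable if you keep it. summary() on an aov fit reports sequential (Type I) sums of squares, so with unbalanced data the order of the terms matters; drop1() with test = "F" instead tests each term that can be removed without breaking marginality.

```r
# Hypothetical helper: main effects plus all two-way interactions.
# (a + b + ...)^2 in a model formula expands to all main effects and pairs.
fit_crash <- function(df) {
  aov(log(Head) ~ (Protection + Doors + Year + Type)^2, data = df)
}

# Usage:
# crash.aov <- fit_crash(crash)
# summary(crash.aov)            # sequential (Type I) sums of squares
# drop1(crash.aov, test = "F")  # F test for dropping each removable term
```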
Which interactions appear to be significant? Which main effects? For each interaction, look at an interaction plot (?interaction.plot).
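interaction.plot(x.factor, trace.factor, response) plots the mean response at each level of x.factor, with one line per level of trace.factor; roughly parallel lines suggest little interaction. The sketch below draws one plot per pair of factors; plot_interactions is a hypothetical helper and the log(Head) response is an assumption.

```r
# Hypothetical helper: one interaction plot for every pair of factors.
plot_interactions <- function(df, factors, response = log(df$Head)) {
  combos <- combn(factors, 2, simplify = FALSE)
  for (p in combos) {
    interaction.plot(df[[p[1]]], df[[p[2]]], response,
                     xlab = p[1], trace.label = p[2],
                     ylab = "mean log(Head)",
                     main = paste(p[1], "x", p[2]))
  }
  invisible(length(combos))  # number of plots drawn
}

# Usage (four factors give six plots):
# par(mfrow = c(2, 3))
# plot_interactions(crash, c("Protection", "Doors", "Year", "Type"))
```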
How can you interpret these plots?
What would you suggest as a final model? To explore this, the function stepAIC from the MASS package should be useful.
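A possible sketch of AIC-based selection with stepAIC(): start from all main effects and two-way interactions and let it add or drop terms in both directions. By default stepAIC() uses a penalty of k = 2 per parameter (k = log(n) gives BIC) and only considers dropping terms that are not contained in a higher-order interaction. select_crash_model is a hypothetical helper and the factor list is an assumption.

```r
library(MASS)  # provides stepAIC(); MASS ships with standard R installations

# Hypothetical helper: stepwise selection between the intercept-only model
# and the model with all two-way interactions, for log(Head).
select_crash_model <- function(df) {
  full <- lm(log(Head) ~ (Protection + Doors + Year + Type)^2, data = df)
  stepAIC(full,
          scope = list(upper = ~ (Protection + Doors + Year + Type)^2,
                       lower = ~ 1),
          direction = "both", trace = FALSE)
}

# Usage:
# crash.sel <- select_crash_model(crash)
# crash.sel$anova       # the steps taken
# anova(crash.sel)      # ANOVA table of the selected model
```

AIC selection is a guide rather than a test, so the selected terms are worth checking against the interaction plots and residual diagnostics.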
In your report, include results from any relevant analyses as well as the final model explaining Head, along with any recommendations you can give based on your results.
Your report can be in English or French. Please include the .tex and R command files (or the .Rnw file, if you used one) and the .pdf file. Please follow the naming convention: surname3.tex, etc. (e.g. my files would be goldstein3.Rnw, goldstein3.pdf). For full re-write privileges, please email your report files to me (darlene.goldstein at epfl.ch) by Monday 15 November 2010.