Statistics and Probability, TP 3

The purpose of this exercise is to help you better interpret a p-value by using R to carry out a permutation test, and to introduce you to some simple hypothesis testing functions. As usual, be sure to read the help documentation for any new functions.

As a reminder, your course grade is based on the work that you turn in from the practicals. Responses that you need to turn in are indicated by bold numbered parts.

I. t-test

Here, we look at some examples from Dalgaard's book to learn how to use the t.test function.

One-sample t-test

We use the intake data, available in the ISwR package. We will use the variable pre.

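library(ISwR)    # load the package containing the data sets
data(intake)     # load the intake data
attach(intake)   # make the columns pre and post accessible by name
intake           # print the data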

Start off by looking at some simple summary statistics: mean, sd, quantile (hardly necessary for such a small data set, but good practice).
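As a minimal sketch of this first look at the data, the following computes the suggested statistics for pre and draws a normal quantile-quantile plot. It refers to the column as intake$pre so that it does not depend on attach. The names pre.values, pre.mean, pre.sd and pre.q are chosen here for illustration, and the Q-Q plot is only one common way to judge normality by eye, not a required part of the exercise.

```r
library(ISwR)                      # provides the intake data set
data(intake)

pre.values <- intake$pre           # the variable pre (energy intake, kJ)

n.obs    <- length(pre.values)     # sample size
pre.mean <- mean(pre.values)       # sample mean
pre.sd   <- sd(pre.values)         # sample standard deviation
pre.q    <- quantile(pre.values)   # minimum, quartiles and maximum

print(c(n = n.obs, mean = pre.mean, sd = pre.sd))
print(pre.q)

# Normal Q-Q plot: points lying close to the line are consistent with normality
qqnorm(pre.values)
qqline(pre.values)
```

With so few observations, a plot like this can only rule out gross departures from normality.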

1. Might these data be approximately normally distributed? Justify your answer.

Suppose you wish to test whether there is a systematic deviation between the women's (pre) energy intake and a recommended value of 7725 kJ. Assuming that the data are from a normal distribution, we are interested in testing whether the (population) mean is 7725. We can do a t-test in R as follows:

t.test(pre, mu=7725)

There are several components to the output. Take some time to make sure you understand what each of them means.
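The printed output is a display of a list of class "htest", and it can help to look at the pieces one at a time. The sketch below stores the result in a variable (res, a name chosen here) and prints the main components. It uses intake$pre rather than the attached name so that it runs on its own.

```r
library(ISwR)
data(intake)

# One-sample t-test of H0: population mean = 7725
res <- t.test(intake$pre, mu = 7725)

print(names(res))      # all components stored in the result
print(res$statistic)   # observed t statistic
print(res$parameter)   # degrees of freedom (n - 1)
print(res$p.value)     # two-sided p-value
print(res$conf.int)    # 95% confidence interval for the mean
print(res$estimate)    # sample mean
```

Extracting res$p.value directly is convenient when you want to compare it with several alpha levels.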

2a. For an alpha level of 0.05, do you reject the null hypothesis? What about for an alpha level of 0.01? Explain your answers.

The default assumes that you want a 2-sided test. Use help to find out how you could get a 1-sided test for an alternative greater than the null, and carry this out.
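The direction of the test is set by the alternative argument of t.test. As an illustration only, the sketch below uses alternative = "less", the opposite direction from the one asked for above, so the exercise itself is left to you. The variable names two.sided and less are chosen here.

```r
library(ISwR)
data(intake)

# Two-sided test (the default) and a one-sided test with alternative "less"
two.sided <- t.test(intake$pre, mu = 7725)
less      <- t.test(intake$pre, mu = 7725, alternative = "less")

print(two.sided$p.value)
print(less$p.value)

# The t statistic is the same in both cases; only the p-value and the
# confidence interval depend on the alternative
print(c(two.sided = unname(two.sided$statistic), less = unname(less$statistic)))
```

Because the t distribution is symmetric, the one-sided p-value in the direction of the observed deviation is half the two-sided one.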

2b. For an alpha level of 0.01, do you reject the null hypothesis?

Two-sample t-test

We use the energy data to illustrate the use of t.test for testing equality of population means based on two independent samples. Here, we wish to compare mean energy expenditure between lean and obese women.

data(energy)
?energy
attach(energy)
energy

The variable stature gives the grouping. The test can be carried out as follows:

t.test(expend ~ stature)

Check that you understand the output.
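One detail of the output worth noticing is that, by default, t.test does not assume equal variances in the two groups (the Welch test). The sketch below compares this default with the classical pooled-variance test, requested with var.equal = TRUE. It passes data = energy instead of relying on attach, and the names welch and pooled are chosen here.

```r
library(ISwR)
data(energy)

# Default: Welch test, which does not assume equal variances
welch <- t.test(expend ~ stature, data = energy)

# Classical two-sample t-test with a pooled variance estimate
pooled <- t.test(expend ~ stature, data = energy, var.equal = TRUE)

print(welch$parameter)    # approximate, usually non-integer, degrees of freedom
print(pooled$parameter)   # n1 + n2 - 2 degrees of freedom
print(c(welch = welch$p.value, pooled = pooled$p.value))
```

Which version is appropriate depends on whether the equal-variance assumption is plausible for these data.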

2c. For an alpha level of 0.01, do you reject the null hypothesis?

Paired t-test

Paired tests are used when there are two measurements (a 'pair') on the same individual. A paired test is essentially a one-sample test of the differences between the measurements.

We can carry out a paired t-test on the differences between pre and post from the intake data as follows:

t.test(pre, post, paired=TRUE)

Again, make sure that you know how to interpret the output. Assuming an alpha level of 0.01, what do you conclude?

It was important here to tell t.test that this was a paired test.
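The statement that a paired test is a one-sample test of the differences can be checked directly. The following sketch, with variable names chosen here, runs both versions and prints the t statistics and p-values side by side.

```r
library(ISwR)
data(intake)

# Paired t-test on the two measurements
paired <- t.test(intake$pre, intake$post, paired = TRUE)

# One-sample t-test on the within-woman differences, H0: mean difference = 0
diffs      <- intake$pre - intake$post
one.sample <- t.test(diffs, mu = 0)

print(c(paired = unname(paired$statistic),
        one.sample = unname(one.sample$statistic)))
print(c(paired = paired$p.value, one.sample = one.sample$p.value))
```

The two calls should agree, since they perform the same computation on the same differences.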

2d. What happens if you leave out paired=TRUE from the t.test command? Are the assumptions for a two-sample test satisfied in this situation?

II. Nonparametric Testing

Wilcoxon test

The Wilcoxon test is a nonparametric version of the t-test. It is not necessary to assume that the observations have a normal distribution. On the same data as above (all three situations), use wilcox.test to carry out the Wilcoxon test.
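The arguments of wilcox.test mirror those of t.test. As a sketch, here is the one-sample case only; the two-sample and paired cases follow the same pattern as the corresponding t.test calls. The name w is chosen here. R may warn that an exact p-value cannot be computed when the data contain ties; in that case it uses a normal approximation.

```r
library(ISwR)
data(intake)

# One-sample Wilcoxon signed-rank test of H0: distribution symmetric about 7725
w <- wilcox.test(intake$pre, mu = 7725)

print(w$statistic)   # V, the sum of the ranks of the positive differences
print(w$p.value)     # two-sided p-value
```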

3. Do you come to a different conclusion for any of the situations?

Testing in Contingency Tables

Here you can practice entering table data into R and carrying out a chi-square test. In this example, we wish to study the association between caffeine consumption and marital status among women giving birth. In R, a two-way table needs to be a matrix object, so read the help for the matrix function. Input the table and look at it:

caff.marital <- matrix(c(652,1537,598,242,36,46,38,21,218,327,106,67),
                       nrow=3, byrow=TRUE)
colnames(caff.marital) <- c("0","1-150","151-300",">300")
rownames(caff.marital) <- c("Married","Prev. Married","Single")
caff.marital
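Raw counts are hard to compare across rows with very different totals, so one possible way to look at the table is through row proportions. The sketch below rebuilds the same table (so that it runs on its own) and uses prop.table and addmargins; the name row.props is chosen here.

```r
# Same table as above, rebuilt here so the example is self-contained
caff.marital <- matrix(c(652, 1537, 598, 242,
                          36,   46,  38,  21,
                         218,  327, 106,  67),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(
                         c("Married", "Prev. Married", "Single"),
                         c("0", "1-150", "151-300", ">300")))

# Row proportions: distribution of caffeine consumption within each marital status
row.props <- prop.table(caff.marital, margin = 1)
print(round(row.props, 3))

# The same counts with row and column totals added
print(addmargins(caff.marital))
```

If there were no association, the rows of the proportion table would look roughly alike.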

4a. Does there appear to be any association? If so, how would you describe the association?

After you have thought about association in the data, you can get a p-value by carrying out a chi-square test:

chisq.test(caff.marital)

4b. Does the p-value agree with your thinking?

When you find a significant result, usually you would like to have an idea of the nature of the deviations. chisq.test returns some additional components that can help you to explore this. For example, you can get the observed and expected values, and contributions to the chi-square statistic as follows:

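res <- chisq.test(caff.marital)   # run the test once and reuse the result
Obs <- res$observed               # observed counts
Ex <- res$expected                # expected counts under independence
((Obs - Ex)^2)/Ex                 # cell contributions to the chi-square statistic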
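The contributions computed this way are the squares of the Pearson residuals, which chisq.test also returns in the residuals component. The following sketch, which rebuilds the table so that it runs on its own and uses names chosen here (res, contrib), checks that the contributions add up to the test statistic and prints the signed residuals.

```r
caff.marital <- matrix(c(652, 1537, 598, 242,
                          36,   46,  38,  21,
                         218,  327, 106,  67),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(
                         c("Married", "Prev. Married", "Single"),
                         c("0", "1-150", "151-300", ">300")))

res <- chisq.test(caff.marital)

# Cell contributions to the chi-square statistic
contrib <- (res$observed - res$expected)^2 / res$expected
print(round(contrib, 2))

# The contributions sum to the test statistic
print(c(sum = sum(contrib), statistic = unname(res$statistic)))

# Pearson residuals, (observed - expected) / sqrt(expected): the sign shows
# whether a cell has more (+) or fewer (-) observations than expected
print(round(res$residuals, 2))
```

The sign of a residual is often more informative than the contribution alone, because it gives the direction of the deviation.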

5. Which categories are making large contributions to the chi-square statistic? Is there a simple way to describe the association in this table?

Your responses to the questions should be in complete sentences, and can be in either English or French. You can email your report to me (Darlene.Goldstein@epfl.ch) before Monday 11 June 2007.