
Hypothesis testing
Often when we analyze data, we would like to know whether the mean of our sample distribution is different from some theoretical value or expected average. Suppose we measured the heights of 12 females and want to know whether the average calculated from our sample is significantly different from the theoretical average height of females, which is 171 cm. A simple test of this hypothesis is the Wilcoxon signed-rank test. To perform it in R, we use the wilcox.test() function with the mu argument set to 171:
> female.heights <- c(117, 162, 143, 120, 183, 175, 147, 145, 165, 167, 179, 116)
> mean(female.heights)
[1] 151.5833
> wilcox.test(female.heights, mu=171)

        Wilcoxon signed rank test with continuity correction

data:  female.heights
V = 11.5, p-value = 0.0341
alternative hypothesis: true location is not equal to 171

Warning message:
In wilcox.test.default(female.heights, mu = 171) :
  cannot compute exact p-value with ties
Because the p-value is less than 0.05, we reject the null hypothesis and conclude that the location (median) of our sample distribution is not equal to 171. You will also notice in the output that R produces a warning message stating that the exact p-value cannot be computed with ties; this is because some measurements are repeated in our dataset, which creates ties in the ranks. When there are ties in the data, the wilcox.test() function will return a p-value and, if requested, confidence interval estimates based on a normal approximation to the signed-rank statistic. Let's have a look at the following function call:
> wilcox.test(female.heights, mu=171, conf.int=TRUE, conf.level=0.99)

        Wilcoxon signed rank test with continuity correction

data:  female.heights
V = 11.5, p-value = 0.0341
alternative hypothesis: true location is not equal to 171
99 percent confidence interval:
 120 175
sample estimates:
(pseudo)median 
      151.9893 

Warning messages:
1: In wilcox.test.default(female.heights, mu = 171, conf.int = TRUE,  :
  cannot compute exact p-value with ties
2: In wilcox.test.default(female.heights, mu = 171, conf.int = TRUE,  :
  cannot compute exact confidence interval with ties
Instead of using the Wilcoxon signed-rank test with continuity correction, we can test whether the mean of our sample distribution is different from the theoretical average of 171 using a bootstrap approach, which approximates the sampling distribution of the mean by taking the mean of a large number of random resamples (in this case, 10,000), each of size 12 drawn from our female.heights dataset. To allow R to generate these samples, we will use the sample() function and specify replace=TRUE so that repetitions are allowed and we don't always end up with all twelve values from our original dataset.
Here is the code you would use in R to apply the bootstrap approach to test whether the mean of our sample population is significantly different from the theoretical average.
First, we create a numeric vector to store the 10,000 means generated from resampling the female.heights dataset, as follows:
> f <- numeric(10000)
> for(i in 1:10000) {
+   f[i] <- mean(sample(female.heights, replace=TRUE))
+ }
> hist(f, xlab="bootstrap means")
The result is shown in the following plot:

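Beyond inspecting the histogram, the bootstrap means can be used directly: the 2.5 and 97.5 percent quantiles of f form a 95 percent bootstrap percentile confidence interval for the mean, and if 171 falls outside this interval, that supports the conclusion that our sample mean differs from the theoretical value. Here is a minimal sketch (the exact bounds will vary from run to run):

> quantile(f, c(0.025, 0.975))  # 95% bootstrap percentile interval for the mean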
Now, we can perform a t-test to determine whether the mean of our sample distribution is different from the theoretical value using the t.test() function with the mu argument set to 171, as follows:
> t.test(female.heights, mu=171)

        One Sample t-test

data:  female.heights
t = -2.7861, df = 11, p-value = 0.01771
alternative hypothesis: true mean is not equal to 171
95 percent confidence interval:
 136.2446 166.9221
sample estimates:
mean of x 
 151.5833 
Since the p-value is less than 0.05, we reject the null hypothesis and conclude that the female heights in our dataset are significantly different from the theoretical average.
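The next few examples reuse the probeA and probeB gene expression vectors introduced earlier in the chapter. If you are working through this section on its own and don't have them in your workspace, a rough stand-in can be simulated as follows (the means and standard deviations are placeholder values chosen to resemble the outputs shown, not the original data, so your test statistics will differ slightly):

> set.seed(123)
> probeA <- rnorm(42, mean=4.77, sd=0.26)  # hypothetical stand-in for probeA
> probeB <- rnorm(42, mean=8.08, sd=0.54)  # hypothetical stand-in for probeB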
Instead of comparing the means of two sample distributions, you might want to compare their variances by performing an F-test. You can perform this test in R using the var.test() function, as follows:
> var.test(probeA, probeB)

        F test to compare two variances

data:  probeA and probeB
F = 0.2301, num df = 41, denom df = 41, p-value = 7.182e-06
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1236921 0.4281015
sample estimates:
ratio of variances 
         0.2301147 
Another useful statistical test is the Kruskal-Wallis test, which is used to determine whether two or more sample populations are identical without assuming that they follow a normal distribution. For example, if we go back to our data frame containing the gene expression values for probeA and probeB and their response to the drugA treatment (df), we can determine whether the expression values for probeA depend on the response to the drugA treatment using the kruskal.test() function.
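If you no longer have the df data frame from earlier in the chapter, a hypothetical stand-in with the same structure can be simulated as follows (the expression values and GSM identifiers are placeholders, not the original data):

> set.seed(123)
> df <- data.frame(
+   expr_probeA = rnorm(42, mean=4.5, sd=0.2),
+   expr_probeB = rnorm(42, mean=8.1, sd=0.25),
+   drugA_response = rep(c("success", "fail"), times=21)
+ )
> rownames(df) <- paste0("GSM", seq(487973, length.out=42, by=-1))

With a df data frame in hand, let's have a look at the function: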
> head(df)
          expr_probeA expr_probeB drugA_response
GSM487973    4.372242    8.176033        success
GSM487972    4.293080    8.541016           fail
GSM487971    4.210850    8.338475        success
GSM487970    4.707231    7.935021           fail
GSM487969    4.345101    7.868985        success
GSM487968    4.586062    7.909702           fail
> kruskal.test(expr_probeA ~ drugA_response, data = df)

        Kruskal-Wallis rank sum test

data:  expr_probeA by drugA_response
Kruskal-Wallis chi-squared = 0.0571, df = 1, p-value = 0.8111
Since the p-value is greater than 0.05, we cannot reject the null hypothesis that the probeA expression values for the two drugA response groups come from the same population.
Proportion tests
Sometimes we want to evaluate proportions instead of individual measurements. Several statistical tests can be used to determine whether the proportion observed in your dataset is significantly different from some theoretical value. Two useful tests for hypothesis testing on proportions are the traditional Z-test and the exact binomial test. Say we want to test the hypothesis that 50 percent of patients have to wait more than 4 hours to see a doctor. We can record the waiting period (in hours) for 11 patients and store that information in a numeric vector, as follows:
> waiting.period <- c(3, 5, 4, 5.5, 3.5, 2.5, 3, 5, 4.5, 3, 3.5)
Next, we can count the number of patients who waited for more than 4 hours by creating a table using the following commands:
> above4.hrs <- ifelse(waiting.period > 4, "yes", "no")
> above4hs.table <- table(above4.hrs)
> above4hs.table
above4.hrs
 no yes 
  7   4 
Now, to test the hypothesis that the proportion of patients who wait more than 4 hours to see a doctor is indeed 50 percent, we can use a Z-test with the prop.test() function:
> prop.test(4, n=11, p=0.5, alternative="two.sided", correct=FALSE)

        1-sample proportions test without continuity correction

data:  4 out of 11, null probability 0.5
X-squared = 0.8182, df = 1, p-value = 0.3657
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.1516647 0.6461988
sample estimates:
        p 
0.3636364 
By explicitly stating correct=FALSE in the prop.test() function, we are ensuring that no continuity correction is applied when determining the Z value.
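To see how this Z-test relates to the X-squared value reported by prop.test(), we can compute the uncorrected Z statistic by hand; squaring it reproduces the X-squared value. Here is a quick sketch:

> p.hat <- 4/11
> z <- (p.hat - 0.5) / sqrt(0.5 * (1 - 0.5) / 11)
> z^2  # squaring the Z statistic reproduces X-squared = 0.8182
[1] 0.8181818
> 2 * pnorm(-abs(z))  # two-sided p-value, approximately 0.3657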
Since the sample size of our waiting.period dataset is small (n=11), it might be a better idea to use the "continuity-adjusted" Z value when deciding whether to reject the null hypothesis, by setting the correct argument to TRUE:
> prop.test(4, n=11, p=0.5, alternative="two.sided", correct=TRUE)

        1-sample proportions test with continuity correction

data:  4 out of 11, null probability 0.5
X-squared = 0.3636, df = 1, p-value = 0.5465
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.1236508 0.6838637
sample estimates:
        p 
0.3636364 
From this test, we fail to reject the null hypothesis: we don't have sufficient evidence to say that the proportion of patients who wait more than four hours to see a doctor differs from 50 percent. In fact, from the 95 percent confidence interval, we can see that the proportion of patients who will likely wait more than four hours to see a doctor is potentially between 0.12 and 0.68.
Instead of using the continuity-adjusted Z statistic to decide whether to reject the null hypothesis, we can also perform an exact binomial test with the binom.test() function, as follows:
> binom.test(4, n=11, p=0.5)

        Exact binomial test

data:  4 and 11
number of successes = 4, number of trials = 11, p-value = 0.5488
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.1092634 0.6920953
sample estimates:
probability of success 
             0.3636364 
Similarly, the exact binomial test indicates that we don't have enough evidence to reject the null hypothesis, and that the proportion of patients who wait more than four hours to see a doctor likely falls between 11 and 69 percent.
Two-sample hypothesis tests
So far, we have covered how to use statistical tests for hypothesis testing on a single sample distribution. However, these statistical tests can also be applied to compare two sample distributions. For example, we could test whether the mean of the probeA sample distribution is significantly different from that of the probeB sample distribution. If we can assume that the errors of both samples are normally distributed, we can use a t-test to compare the means of both samples by entering t.test(probeA, probeB), as follows:
> t.test(probeA, probeB)

        Welch Two Sample t-test

data:  probeA and probeB
t = -35.8398, df = 58.92, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.486162 -3.117460
sample estimates:
mean of x mean of y 
 4.773866  8.075677 
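Whether the normality assumption is reasonable can be checked before relying on the t-test. One quick option is the Shapiro-Wilk test, combined with a normal quantile-quantile plot (a minimal sketch; output not shown):

> shapiro.test(probeA)  # a p-value below 0.05 would suggest non-normality
> qqnorm(probeA); qqline(probeA)  # visual check against a normal distribution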
If the errors are not normally distributed for these samples, we can run the Wilcoxon rank-sum test, as follows:
> wilcox.test(probeA, probeB)

        Wilcoxon rank sum test

data:  probeA and probeB
W = 0, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
If these samples are paired, you can run a paired t-test or a Wilcoxon signed-rank test by setting the paired argument to TRUE:
> wilcox.test(probeA, probeB, paired=TRUE)

        Wilcoxon signed rank test

data:  probeA and probeB
V = 0, p-value = 4.547e-13
alternative hypothesis: true location shift is not equal to 0
Instead of comparing the means of two populations, you might be interested in comparing two proportions using the prop.test() function. Suppose you want to test the hypothesis that women are favored over men for promotions in company A. If 16 out of 430 female employees and 63 out of 1,053 male employees were promoted this year, you could perform a two-sample proportions test by passing two vectors to the prop.test() function. The first vector will contain the number of female and male employees promoted this year, and the other vector will store the total number of females and males employed by company A. Let's have a look at the following code:
> promoted.employees <- c(16, 63)
> total.employees <- c(430, 1053)
> prop.test(promoted.employees, total.employees)

        2-sample test for equality of proportions with continuity correction

data:  promoted.employees out of total.employees
X-squared = 2.6653, df = 1, p-value = 0.1026
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.04717571  0.00193619
sample estimates:
    prop 1     prop 2 
0.03720930 0.05982906 
You can test for independence in contingency tables using the Chi-squared test or Fisher's exact test, with the chisq.test() and fisher.test() functions, both of which take data stored in a matrix. For example, we can create a matrix that stores the counts observed for blue eyes, brown eyes, blond hair, and dark brown hair in a sample population of 96 Caucasian males. To test whether eye color and hair color are independent, we can perform a Chi-squared test with the chisq.test() function, as follows:
> trait.counts <- matrix(c(24, 14, 11, 47), nrow=2)
> colnames(trait.counts) <- c("Blue eyes", "Brown eyes")
> rownames(trait.counts) <- c("Blond hair", "Dark brown hair")
> trait.counts
                Blue eyes Brown eyes
Blond hair             24         11
Dark brown hair        14         47
> chisq.test(trait.counts)

        Pearson's Chi-squared test with Yates' continuity correction

data:  trait.counts
X-squared = 17.4938, df = 1, p-value = 2.882e-05
As you can see, Yates' continuity correction is applied by default. To remove Yates' continuity correction, you just have to specify correct=FALSE, as follows:
> chisq.test(trait.counts, correct=FALSE)

        Pearson's Chi-squared test

data:  trait.counts
X-squared = 19.3544, df = 1, p-value = 1.086e-05
Since the p-value is less than 0.05, we can reject the null hypothesis that eye color and hair color are independent traits in this sample population.
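The Chi-squared approximation is most reliable when the expected cell counts are reasonably large (a common rule of thumb is at least five per cell). You can inspect the expected counts directly from the test object, as in this quick check:

> chisq.test(trait.counts)$expected
                Blue eyes Brown eyes
Blond hair       13.85417   21.14583
Dark brown hair  24.14583   36.85417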
Instead of a Chi-squared test, you may be interested in performing Fisher's exact test. This statistical test is usually preferred for the analysis of contingency tables in which one or more of the expected frequencies are less than five. To illustrate the fisher.test() function, let's create a 2 x 2 matrix containing hypothetical data, as follows:
> data.counts <- matrix(c(7, 5, 2, 6), nrow=2)
> fisher.test(data.counts)

        Fisher's Exact Test for Count Data

data:  data.counts
p-value = 0.1968
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.438792 55.616983
sample estimates:
odds ratio 
  3.895711 
Unit root tests
R has a variety of packages that allow you to perform many other statistical tests. For example, the tseries package allows you to perform unit root tests, which determine whether a time series variable is non-stationary, using an autoregressive model. One such test is the Augmented Dickey-Fuller (ADF) test, which tests for a unit root in a time series sample. The null hypothesis of this test is that the time series has a unit root, that is, it is not stationary. The ADF test statistic is a negative number, and the more negative it is, the more strongly we can reject the null hypothesis in favor of the alternative hypothesis that the time series is stationary.
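Concretely, the ADF test fits a regression of the change in the series on its lagged level and lagged changes. In the standard formulation (sketched here; the tseries implementation used below includes an intercept and a linear trend term), this is:

Δy(t) = α + β·t + γ·y(t-1) + δ1·Δy(t-1) + ... + δk·Δy(t-k) + ε(t)

The unit root null hypothesis corresponds to γ = 0, and the stationary alternative to γ < 0; the Dickey-Fuller statistic reported by R is the t-statistic for γ.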
For example, let's create a time series representing the fertility rates of women aged 35 to 39 and 40 to 44 between 1997 and 2012 for residents of New York City, using the annual vital statistics tables provided by the Department of Health at http://www.health.ny.gov/statistics/vital_statistics/. The fertility rates provided in these tables are based on the number of live births per 1,000 females aged 35 to 39 and 40 to 44. A summary of this data is provided in the fertilityData.xlsx file, which can be downloaded from https://www.packtpub.com/books/content/support/19729/fertilityData.xlsx. Save this file to your current working directory, as shown:
> setwd("/directory_where_file_was_downloaded/")
Next, load the data into your workspace using the read.xls() function, which is available as part of the gdata package:
> library("gdata") > fertility_rates.df <- read.xls("fertilityData.xlsx") # Change the colnames for the two time series accordingly. > colnames(fertility_rates.df)[2:3] <- c("Age 35-39", "Age 40-44")
We can take a look at the data frame using the head() function:
> head(fertility_rates.df)
  year Age 35-39 Age 40-44
1 1997      46.4      12.4
2 1998      48.1      12.8
3 1999      49.7      12.3
4 2000      49.8      13.5
5 2001      49.2      13.2
6 2002      49.4      13.5
Next, we need to create a time series object for the fertility rate of each age group using the ts() function. The ts() function allows us to specify the times of the first and last observations, together with the frequency of observation, using the start, end, and frequency arguments, respectively. The frequency is the number of observations per unit of time, where 1 corresponds to annual, 4 to quarterly, and 12 to monthly data. To create a time series from our data frame, we will drop the year column and specify the frequency as annual with frequency=1, as shown in the following code:
> fertilityRates.ts <- ts(fertility_rates.df[, 2:3], start=c(1997, 1), end=c(2012, 1), frequency=1)
To inspect our time series object, we can print it, as follows:
> fertilityRates.ts
Time Series:
Start = 1997 
End = 2012 
Frequency = 1 
     Age 35-39 Age 40-44
1997      46.4      12.4
1998      48.1      12.8
1999      49.7      12.3
2000      49.8      13.5
2001      49.2      13.2
2002      49.4      13.5
2003      51.6      14.8
2004      52.5      14.2
2005      52.0      14.3
2006      53.5      14.5
2007      53.6      15.0
2008      54.5      15.0
2009      58.2      17.1
2010      63.6      18.2
2011      65.2      18.0
2012      66.9      19.1
We can also plot the time series with the plot() function, which calls the plot.ts() function to create a graph of time series objects, as shown in the following graph:
> plot(fertilityRates.ts, main="Fertility Rates for Females in NYC from 1997 to 2012", xlab="Year")

Now, let's use the ADF test to check whether our time series is stationary for both female age groups, using the adf.test() function from the tseries package. Since the function can only be applied to univariate time series, we will evaluate each age group separately. First, load the package:
> library("tseries")
For the time series for women aged 35 to 39, we use the following code:
> adf.test(fertilityRates.ts[, 1])

        Augmented Dickey-Fuller Test

data:  fertilityRates.ts[, 1]
Dickey-Fuller = 0.3567, Lag order = 2, p-value = 0.99
alternative hypothesis: stationary

Warning message:
In adf.test(fertilityRates.ts[, 1]) : p-value greater than printed p-value
For the time series for women aged 40 to 44, we use the following code:
> adf.test(fertilityRates.ts[, 2])

        Augmented Dickey-Fuller Test

data:  fertilityRates.ts[, 2]
Dickey-Fuller = -0.3712, Lag order = 2, p-value = 0.9805
alternative hypothesis: stationary
In both cases, we cannot reject the null hypothesis, so we conclude that both time series are not stationary. Indeed, we can see from the graphs that the fertility rate for both age groups increased over this period. Now, what if we are only interested in the fertility rate of women aged 40 to 44 between 1997 and 2003? We can test whether this shorter time series is stationary, as follows:
> testTS <- ts(fertility_rates.df[1:7, 3], start=c(1997, 1), frequency=1)
> adf.test(testTS)

        Augmented Dickey-Fuller Test

data:  testTS
Dickey-Fuller = -7.4101, Lag order = 1, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(testTS) : p-value smaller than printed p-value
In this case, the p-value is much smaller than 0.05, so we can reject the null hypothesis and conclude that the time series for the fertility rate of women aged 40 to 44 was stationary between 1997 and 2003.
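As an aside, rather than rebuilding the sub-series from the data frame, you could extract it from the existing time series object with the window() function; a quick sketch:

> testTS <- window(fertilityRates.ts[, "Age 40-44"], start=1997, end=2003)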
You may be interested in using the adf.test() function to find out whether the closing price of a particular stock is stationary over a given period of time. So, for our next example, we will test whether the Facebook stock price at closing was stationary from 2012-06-18 to 2014-11-28. We can download the stock price information from Yahoo! using the getSymbols() function from the quantmod package. First, install and load the package:
> install.packages("quantmod")
> library("quantmod")
The stock symbol for Facebook is "FB" on the NASDAQ stock exchange:
> fbstock <- getSymbols("FB", src="yahoo", from='2012-06-18', to='2014-11-28', auto.assign=FALSE)
We can inspect the object using the head() function to print the first six dates, as follows:
> head(fbstock)
           FB.Open FB.High FB.Low FB.Close FB.Volume FB.Adjusted
2012-06-18   29.96   32.08  29.41    31.41  42978900       31.41
2012-06-19   31.54   32.18  30.70    31.91  30849000       31.91
2012-06-20   31.92   31.93  31.15    31.60  15553600       31.60
2012-06-21   31.67   32.50  31.51    31.84  21875300       31.84
2012-06-22   32.41   33.45  32.06    33.05  74834000       33.05
2012-06-25   32.86   33.02  31.55    32.06  24352900       32.06
We can plot the fbstock information using the chartSeries() function, which is also part of the quantmod package. Since we are just interested in viewing the stock price at closing and the total volume, we will only plot the fourth and fifth columns, as shown in the following code:
> chartSeries(fbstock[, 4:5], theme="white", up.col="black")
The plot is shown in the following graph:

Alternatively, you could plot the fbstock data with the plot.ts() function after converting the fbstock data from an extended time series (xts) object to a standard time series (ts) object with the as.ts() function:
> plot.ts(as.ts(fbstock[, 4:5]), main="FACEBOOK Stock Information from 2012-06-18 to 2014-11-28")
Here is the plot the previous command generated:

Now, let's perform the ADF test with the adf.test() function, which is part of the tseries package, as follows:
> require(tseries)
> adf.test(fbstock[, 4])

        Augmented Dickey-Fuller Test

data:  fbstock[, 4]
Dickey-Fuller = -3.123, Lag order = 8, p-value = 0.103
alternative hypothesis: stationary
Based on the p-value, we cannot reject the null hypothesis, so we conclude that the time series is not stationary and thus contains a unit root. We can cross-check the ADF test result by performing a Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. The KPSS test is also used to assess whether a time series is stationary, except in this case the null hypothesis is that the observed time series is stationary around a deterministic trend. To perform the KPSS test on our fbstock time series, we can use the kpss.test() function, which is also part of the tseries package.
Again, since we are interested in the Facebook stock price at closing, we will perform the KPSS test on the fourth column:
> kpss.test(fbstock[, 4])

        KPSS Test for Level Stationarity

data:  fbstock[, 4]
KPSS Level = 9.6561, Truncation lag parameter = 5, p-value = 0.01

Warning message:
In kpss.test(fbstock[, 4]) : p-value smaller than printed p-value
As you can see, our p-value is below 0.05, so we can reject the null hypothesis of stationarity and conclude that the time series for the Facebook stock price at closing, between 2012-06-18 and 2014-11-28, is not stationary.
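A common next step when a price series contains a unit root is to difference it: the first differences (daily price changes) of a random-walk-like series are typically stationary. Here is a sketch of how you might check this (output not shown, since it depends on the data you download):

> fb.close.diff <- na.omit(diff(fbstock[, 4]))  # first differences of the closing price
> adf.test(as.ts(fb.close.diff))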
Note
If you are interested in additional packages for time series analysis in R, we recommend you consult the CRAN R project Time Series Analysis website at http://cran.r-project.org/web/views/TimeSeries.html.