Sunday, June 1, 2014

Weekend at Bernoulli's

I have been doing some background reading on group sequential trials for an upcoming project. One of the papers I came across while reading up on the required numerical methods is "Repeated Significance Tests on Accumulating Data" (Armitage et al., 1969). The paper considers the following scenario: we observe an outcome and test whether it is significant with respect to a null hypothesis. If it is not, we accrue an additional observation and test whether the sum of the two is significant. This process continues until either the cumulative sum is extreme enough that the null hypothesis is rejected or the maximal sample size is reached. For my work I am most interested in the normal distribution, but Armitage et al. tabulated results for the binomial, normal, and exponential distributions. (To activate 'read-along' mode for what follows, I suggest doing a quick search for the paper online.)

For the binomial distribution, Armitage et al. tabulated exact rejection probabilities in their Table 1. Since I have too much free time (or wanted to do something easier than my homework), I decided to try reproducing that table by simulation in R.

The scenario is this: for a selection of maximal sample sizes from $N = 10$ to $150$, we want to know how often, under the null hypothesis $p = p_0 = \frac{1}{2}$, we reject the null hypothesis (make a type I error) at or before the maximal sample size when the nominal level (type I error rate) is $2\alpha$. At each interim analysis $j = 1, \ldots, N$, we compute the cumulative sum $S_j$ of our binary outcomes and compare it to the lower and upper $\alpha$-quantiles $a_j$ and $b_j$ of the Binomial$(j, p_0)$ distribution. If $S_j$ falls outside these bounds ($S_j < a_j$ or $S_j > b_j$), we reject the null hypothesis and conclude that the true proportion $p$ is not equal to $\frac{1}{2}$.
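
As a quick illustration of the stopping rule (this snippet is my own aside, not from the paper), here are the bounds $(a_j, b_j)$ for the first few analyses at a nominal level of $2\alpha = 0.05$, computed with qbinom exactly as in the simulation code further down:

## Stopping bounds for j = 1, ..., 10 at nominal level 2*alpha = 0.05 (alpha = 0.025)
j = 1:10
aj = qbinom(p = 0.025, size = j, prob = 0.5, lower.tail = TRUE)
bj = qbinom(p = 0.025, size = j, prob = 0.5, lower.tail = FALSE)
rbind(j, aj, bj)  # reject at analysis j if S_j < aj or S_j > bj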

Averaging over 10,000 simulations, nearly all of my values agree with the exact calculations in Table 1 to the first significant figure. Beyond that there are no guarantees of concordance, but that is to be expected with simulation. My code is far from optimized, so I apologize if you decide to run all 10,000 simulations yourself; it took somewhere between 30 and 60 minutes on my laptop.
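
That level of agreement is about what the Monte Carlo standard error, roughly $\sqrt{p(1-p)/B}$ for a simulated proportion near $p$ with $B$ replicates, would suggest. A quick back-of-the-envelope check in R (again my own aside, not part of the reproduction):

## Approximate Monte Carlo standard errors with B = 10,000 replicates
B = 10000
sqrt(0.05 * 0.95 / B)  # ~0.002 for rejection probabilities near 0.05
sqrt(0.25 * 0.75 / B)  # ~0.004 for rejection probabilities near 0.25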

Here's the code:

## RepSigTests.R
## 6/1/2014

## Exploring Armitage, McPherson and Rowe (1969)
## Repeated Significance Tests on Accumulating Data

## For Binomial Distributions

# We fix the sequences of max sample sizes and 2*alpha, the nominal level
n = c(seq(10,50,5),seq(60,100,10),c(120,140,150))
talpha = seq(.01,.05,.01)

# We'll consider sequential sampling under the null hypothesis H0: p = 0.5
p0 = 1/2

## After each new observation at time m of Xm = 0 or 1,
## we compare the cumulative sum Sm to bounds depending on twice alpha:
# am = qbinom(p = talpha[1]/2, size = n[1], prob = p0, lower.tail = TRUE)
# bm = qbinom(p = talpha[1]/2, size = n[1], prob = p0, lower.tail = FALSE)
## If Sm < am or Sm > bm, then we reject p = p0.

## The following function carries out this process, starting with 1 observation
## and continuing the sequential analysis until either Sm falls outside the
## bounds (am, bm) or m reaches max.
seq.binom = function(talpha,p0,max){
  Sm = 0
  m = 0
  continue = TRUE
  while (continue) {
    m = m + 1
    am = qbinom(p = talpha/2, size = m, prob = p0, lower.tail = TRUE)
    bm = qbinom(p = talpha/2, size = m, prob = p0, lower.tail = FALSE)
    Xm = rbinom(n = 1, size = 1, prob = p0)
    Sm = Sm + Xm
    if (Sm < am || Sm > bm || m >= max) {continue = FALSE}
  }
  return(c(M=m,aM=am,Sum=Sm,bM=bm))
}
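
## For illustration only (not part of the original analysis): a single run at
## nominal level 2*alpha = 0.05 with a maximal sample size of 20 returns the
## stopping analysis M, the final bounds (aM, bM), and the final cumulative sum.
# seq.binom(talpha = 0.05, p0 = 0.5, max = 20)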

## This next function runs one sequential experiment and reports whether we
## rejected p = p0 at or before the maximal sample size.
check.binom = function(talpha,p0,max){
  res = seq.binom(talpha,p0,max)
  reject = res["Sum"] < res["aM"] || res["Sum"] > res["bM"]
  return(reject)
}
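
## Illustration (mine, not in the original post): estimate the rejection
## probability for a single setting, say 2*alpha = 0.05 and a maximal sample
## size of 20, with 1,000 quick replicates.
# mean(replicate(1000, check.binom(0.05, 0.5, 20)))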

## Next we reproduce Table 1 of Armitage, McPherson and Rowe (1969)
## using our simulations rather than exact calculations of binomial probabilities.
## First, we set the number of repetitions/seed for simulation:
B = 10000
set.seed(1990)

## Table 1
# "The probability of being absorbed at or before the nth observation in
# binomial sampling with repeated tests at a nominal two-sided significance
# level 2*alpha."

## This takes a while to run. The inner loop could be removed if seq.binom()
## were written to compare m against every maximal sample size in n at once
## (see the sketch after the printed table below).
tab1 = matrix(0,length(n),length(talpha))
for (j in 1:length(talpha)){
  for (i in 1:length(n)){
    tab1[i,j] = sum(replicate(B, check.binom(talpha[j],p0,n[i])))/B
  }
}

## Rows correspond to the maximal sample sizes in n, columns to the nominal levels in talpha.
#> tab1
#[,1]   [,2]   [,3]   [,4]   [,5]
#[1,] 0.0076 0.0215 0.0277 0.0522 0.0566
#[2,] 0.0144 0.0297 0.0454 0.0827 0.0820
#[3,] 0.0193 0.0374 0.0593 0.0911 0.1003
#[4,] 0.0272 0.0507 0.0700 0.1092 0.1218
#[5,] 0.0268 0.0550 0.0841 0.1214 0.1368
#[6,] 0.0305 0.0611 0.0934 0.1323 0.1397
#[7,] 0.0331 0.0648 0.0982 0.1284 0.1555
#[8,] 0.0374 0.0655 0.1007 0.1470 0.1657
#[9,] 0.0388 0.0715 0.1056 0.1520 0.1734
#[10,] 0.0452 0.0842 0.1154 0.1645 0.1905
#[11,] 0.0496 0.0814 0.1260 0.1812 0.2001
#[12,] 0.0521 0.0970 0.1336 0.1830 0.2206
#[13,] 0.0555 0.0994 0.1355 0.1964 0.2181
#[14,] 0.0557 0.1039 0.1508 0.2037 0.2268
#[15,] 0.0567 0.1105 0.1577 0.2094 0.2362
#[16,] 0.0636 0.1226 0.1666 0.2235 0.2534
#[17,] 0.0664 0.1181 0.1692 0.2310 0.2635
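
## A sketch of the optimization mentioned above (my own rough version, not from
## the paper): simulate one path of max(n) observations, find the first analysis
## at which the cumulative sum crosses a bound, and compare that stopping time
## to every maximal sample size at once. The name seq.binom.all is illustrative.
seq.binom.all = function(talpha, p0, nmax){
  m.seq = 1:nmax
  am = qbinom(p = talpha/2, size = m.seq, prob = p0, lower.tail = TRUE)
  bm = qbinom(p = talpha/2, size = m.seq, prob = p0, lower.tail = FALSE)
  Sm = cumsum(rbinom(n = nmax, size = 1, prob = p0))
  crossed = which(Sm < am | Sm > bm)
  if (length(crossed) == 0) Inf else crossed[1]  # first analysis with a rejection, Inf if none
}

## The full table could then be built with one replicate() per column:
# tab1.fast = sapply(talpha, function(a){
#   stops = replicate(B, seq.binom.all(a, p0, max(n)))
#   sapply(n, function(ni) mean(stops <= ni))
# })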

## Because everyone loves a good heatmap:
## Credit goes to this stack exchange thread for the labels:
## http://stackoverflow.com/questions/17538830/x-axis-and-y-axis-labels-in-pheatmap-in-r
library(pheatmap)
library(grid)
rownames(tab1) = n
colnames(tab1) = talpha

setHook("grid.newpage", function() pushViewport(viewport(x=1,y=1,width=0.9, height=0.9, name="vp", just=c("right","top"))), action="prepend")

pheatmap(
  tab1,
  cluster_cols = FALSE,
  cluster_rows = FALSE,
  show_rownames = TRUE,
  show_colnames = TRUE,
  border_color = "white",
  main = "Table 1: Sequential Binomial Stopping Probabilities"
)

setHook("grid.newpage", NULL, "replace")
grid.text("Nominal Level (2*Alpha)", y=-0.07, gp=gpar(fontsize=16))
grid.text("Maximal Sample Size", x=-0.07, rot=90, gp=gpar(fontsize=16))

And of course, here's the heatmap:

[Heatmap of tab1: simulated stopping probabilities, maximal sample size by nominal level]

Reference
Armitage, P., McPherson, C. K., and Rowe, B. C. (1969). Repeated Significance Tests on Accumulating Data. Journal of the Royal Statistical Society, Series A (General), 132(2), 235-244.