For the binomial distribution, Armitage et al tabulated exact calculations of rejection probabilities in their Table 1. Since I have too much free time (or want to do something easier than my homework), I decided to try reproducing the aforementioned Table 1 by simulation in R.

The scenario we consider is that for a selection of maximal sample sizes from $N =$ 10 to 150, we are interested in seeing how frequently under the null hypothesis $p = p_0 = \frac{1}{2}$ we reject the null hypothesis (make a type I error) at or before the maximal sample size when the nominal level/type I error rate is equal to $2 \alpha$. At each interim analysis $j = 1, \ldots, N$, we consider the sum $S_j$ of our binary outcomes and compare $S_j$ to the upper and lower $\alpha$-th percentiles of the Binomial$(j,p_0)$ distribution. If $S_j$ is more extreme than either bound, we reject the null hypothesis and conclude that the true proportion $p$ is not equal to $\frac{1}{2}$.

Averaging the results of 10,000 simulations, nearly all of my values agree with the first significant figure of the exact calculations in Table 1. Beyond that there are no guarantees of concordance, but that is to be expected in simulations. My code is far from optimized, so I apologize if you decide to run it for all 10,000 simulations yourself. It took somewhere between 30-60 minutes to run on my laptop.

Here's the code:

## RepSigTests.R

## 6/1/2014

## Exploring Armitage, McPherson and Rowe (1969)

## Repeated Significance Tests on Accumulating Data

## For Binomial Distributions

# We fix the sequences of max sample sizes and 2*alpha, the nominal level

n = c(seq(10,50,5),seq(60,100,10),c(120,140,150))

talpha = seq(.01,.05,.01)

# We'll consider sequential sampling under the null hypothesis H0: p = 0.5

p0 = 1/2

## After each new observation at time m of Xm = 0 or 1,

## we compare the cumulative sum Sm to bounds depending on twice alpha:

# am = qbinom(p = talpha[1]/2, size = n[1], prob = p0, lower.tail = TRUE)

# bm = qbinom(p = talpha[1]/2, size = n[1], prob = p0, lower.tail = FALSE)

## If Sm < am or Sm > bm, then we reject p = p0.

## The following function carries this process, starting with 1 observation

## and continuing the sequential analysis until either Sm is outside of the

## bounds (am,bm) or m is at least max.

seq.binom = function(talpha,p0,max){

Sm = 0

m = 0

continue = TRUE

while (continue) {

m = m + 1

am = qbinom(p = talpha/2, size = m, prob = p0, lower.tail = TRUE)

bm = qbinom(p = talpha/2, size = m, prob = p0, lower.tail = FALSE)

Xm = rbinom(n = 1, size = 1, prob = p0)

Sm = Sm + Xm

if ( Sm < am || Sm > bm || m > max) {continue = FALSE}

}

return(c(M=m,aM=am,Sum=Sm,bM=bm))

}

## This next function checks whether we rejected p = p0 or failed to reject

## at sample size m (the mth analysis).

check.binom = function(talpha,p0,max){

res = seq.binom(talpha,p0,max)

reject = res[2] > res[3] || res[3] > res[4]

}

## Next we reproduce Table 1 of Armitage, McPherson and Rowe (1969)

## using our simulations rather than exact calculations of binomial probabilities.

## First, we set the number of repetitions/seed for simulation:

B = 10000

set.seed(1990)

## Table 1

# "The probability of being absorbed at or before the nth observation in

# binomial sampling with repeated tests at a nominal two-sided significance

# level 2*alpha."

## This takes a while to run. The inner loop could be removed if the seq.binom()

## were to be written with a comparison of m to each maximal sample size in n.

tab1 = matrix(0,length(n),length(talpha))

for (j in 1:length(talpha)){

for (i in 1:length(n)){

tab1[i,j] = sum(replicate(B, check.binom(talpha[j],p0,n[i])))/B

}

}

#> tab1

#[,1] [,2] [,3] [,4] [,5]

#[1,] 0.0076 0.0215 0.0277 0.0522 0.0566

#[2,] 0.0144 0.0297 0.0454 0.0827 0.0820

#[3,] 0.0193 0.0374 0.0593 0.0911 0.1003

#[4,] 0.0272 0.0507 0.0700 0.1092 0.1218

#[5,] 0.0268 0.0550 0.0841 0.1214 0.1368

#[6,] 0.0305 0.0611 0.0934 0.1323 0.1397

#[7,] 0.0331 0.0648 0.0982 0.1284 0.1555

#[8,] 0.0374 0.0655 0.1007 0.1470 0.1657

#[9,] 0.0388 0.0715 0.1056 0.1520 0.1734

#[10,] 0.0452 0.0842 0.1154 0.1645 0.1905

#[11,] 0.0496 0.0814 0.1260 0.1812 0.2001

#[12,] 0.0521 0.0970 0.1336 0.1830 0.2206

#[13,] 0.0555 0.0994 0.1355 0.1964 0.2181

#[14,] 0.0557 0.1039 0.1508 0.2037 0.2268

#[15,] 0.0567 0.1105 0.1577 0.2094 0.2362

#[16,] 0.0636 0.1226 0.1666 0.2235 0.2534

#[17,] 0.0664 0.1181 0.1692 0.2310 0.2635

## Because everyone loves a good heatmap:

## Credit goes to this stack exchange thread for the labels:

## http://stackoverflow.com/questions/17538830/x-axis-and-y-axis-labels-in-pheatmap-in-r

library(pheatmap)

library(grid)

rownames(tab1) = n

colnames(tab1) = talpha

setHook("grid.newpage", function() pushViewport(viewport(x=1,y=1,width=0.9, height=0.9, name="vp", just=c("right","top"))), action="prepend")

pheatmap(

tab1,

cluster_cols=FALSE,

cluster_rows=FALSE,

show_rownames=T,

show_colnames=T,

border_color = "white",

main = "Table 1: Sequential Binomial Stopping Probabilities",

)

setHook("grid.newpage", NULL, "replace")

grid.text("Nominal Level (2*Alpha)", y=-0.07, gp=gpar(fontsize=16))

grid.text("Maximal Sample Size", x=-0.07, rot=90, gp=gpar(fontsize=16))

**Reference**

P. Armitage, C. K. McPherson, B. C. Rowe (1969). Repeated Significance Tests on Accumulating Data. Journal of the Royal Statistical Society. Series A (General), Vol. 132, No. 2 (1969), pp. 235-244.