NumXL Cookbook - GLM with Binary Data

Mohamad

March 03, 2017 00:15

In this tutorial, we will use sample data gathered during a clinical trial of a new chemical/pesticide on tobacco Budworms. The subjects (i.e. Budworms) are grouped into batches of 20 and exposed to different doses of the chemical. The results are summarized below:

This figure shows the input data table for generalized linear model in excel with binary data example

Data preparation

Our objective here is to model (and forecast) the effectiveness of the new chemical using different dosages and explain, to some extent, any variation based on the gender of the budworm. Furthermore, we want to express the results in terms of the worm mortality rates (i.e. probability).

This figure shows the generalized linear model in excel - Convert binary data into mortality rates

We plot the data into two separate curves: males and females. It is apparent that the mortality rate is affected by two factors: gender and dosage.

This figure shows the plot for mortality rates for male and female Bud worm

We will make two assumptions: (1) the results for each trial (i.e. batch) are drawn from a Binomially distributed population; we would like to estimate p - the probability of success (i.e. worm’s death). The probability (p) is allowed to vary across different trials (batches). (2) The probability of success is affected by two factors: the gender of the subject and the administered dosage of the drug.

Based on these two assumptions, we would model this relationship:

$$P=f(X,Y)=E[p|X,Y]$$

Modeling

We are ready now to propose a statistical model: the generalized linear model in Excel with residuals following the Binomial distribution.

This figure shows the generalized linear model wizard

For now, we choose “Logit” as our link (transform) function, specify the trial or batch size(20), and instruct the Wizard to calibrate (i.e. compute optimal values for the coefficients). Leave the Goodness-of-fit and residual diagnosis options checked.

This figure shows the model specification output table

Calibration

In this case, the Generalized Linear Model in Excel (GLM) Wizard has calibrated the model’s coefficients, so we can skip this step.

But, in the event we wish to experiment with different link functions: LOGIT, PROBIT or LOG-LOG, then we need to re-calibrate the model. To do so, we can either:

Create a new model with the wizard, or,
Change the “Lvk” parameter in an existing model table, and run the calibration using the NumXL toolbar

This figure shows the calibration wizard or user form

Step 1: Select the cell that acts as a header for the model table

Step 2: Click on the calibration icon/menu (Excel 2003)

Step 3: Click on the “Solve” button in the Solver window

Forecast

Once the model is calibrated, and we are happy with the residuals, we can use it to construct our forecast mean (and the confidence interval around it).

Using the NumXL function (GLM_FORE), we can compute the mean. Using GLM_FORECI, we can compute the upper and lower limit of the confidence interval.

This figure shows the forecast output table

Plotting the data again (actual) versus the model values.

This figure shows the forecast plot with confidence region for male budworm in Excel

The dots represent the sample data, while the centerline is the forecast mean. The shaded regions in the graphs are the 95% confidence intervals.

Notes

The forecast error decrease as we increase the dosage (C.I. gets tighter). This is evident in male and female batches
The logarithmic relation detected when we plot the raw data can be merely a data anomaly; the Generalized Linear Model in Excel shows more like a quadratic type of relationship.
The mean is not exactly the center of the confidence interval due to the discrete nature of the underlying binomial distribution, and the small-batch/trial size.

Tutorial Video

Attachments

GLM-Binomial-BudWorms.xlsx (20 KB)
GLM-Tutorial.pdf (400 KB)

Data preparation

Modeling

Calibration

Forecast

Tutorial Video

Attachments

Related articles

Comments