I’m trying to decide if I should use a Poisson regression for my count data on website errors per day, but the variance is almost double the mean. I’ve read that this overdispersion means the model assumptions are violated, but I’m not sure what to actually do about it in practice.

I tried Poisson regression at first. The daily error counts had a variance about twice the mean, so the residuals looked off and the Poisson standard errors were probably too small. Switching to a negative binomial model helped: the estimated dispersion parameter indicated the model was absorbing the extra variance, AIC dropped, and the Pearson residuals looked more reasonable.
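For anyone wondering how "variance about twice the mean" maps onto a negative binomial dispersion parameter, here is a rough sketch. It assumes the common NB2 parameterisation, where the variance is mu + alpha * mu^2, and uses a simple method-of-moments estimate rather than a fitted model. The summary numbers and function names are made up for illustration.

```python
def nb2_variance(mu, alpha):
    """Variance under the NB2 parameterisation: mu + alpha * mu**2."""
    return mu + alpha * mu ** 2

def moment_alpha(mean, variance):
    """Method-of-moments estimate of alpha, floored at 0 (no overdispersion)."""
    return max(0.0, (variance - mean) / mean ** 2)

# Hypothetical summary numbers: variance about twice the mean.
mean, variance = 6.0, 12.0
alpha = moment_alpha(mean, variance)

print("alpha:", alpha)
print("implied NB2 variance:", nb2_variance(mean, alpha))
```

When the variance is exactly twice the mean, this estimate works out to alpha = 1 / mean. A proper fit estimates alpha by maximum likelihood alongside the regression coefficients, so treat this only as a sanity check on the order of magnitude.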

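We also tried a quasi-Poisson model, which keeps the Poisson mean structure and coefficient estimates but inflates the standard errors to account for overdispersion. It was easier to run in our stats package, but inference was fuzzier: there is no full likelihood, so we had no AIC to compare against the other models.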
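To make the standard error inflation concrete: quasi-Poisson multiplies each Poisson standard error by the square root of the estimated dispersion. A minimal sketch of that arithmetic, with a made-up coefficient and standard error (none of these numbers come from real data):

```python
import math

def quasi_poisson_se(poisson_se, dispersion):
    """Scale a Poisson standard error by sqrt(dispersion)."""
    return poisson_se * math.sqrt(dispersion)

# Hypothetical values: one coefficient, its Poisson SE, and a dispersion
# estimate of 2.0 (variance about twice the mean).
coef, poisson_se, dispersion = 0.25, 0.10, 2.0

se = quasi_poisson_se(poisson_se, dispersion)
z_poisson = coef / poisson_se
z_quasi = coef / se

print("quasi-Poisson SE:", se)
print("z under Poisson:", z_poisson, "z under quasi-Poisson:", z_quasi)
```

With a dispersion of 2, every standard error grows by a factor of about 1.41, so an effect that looked clearly significant under plain Poisson can become borderline.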

For days with zero errors we toyed with zero-inflated variants. In our data the share of zeros wasn't extreme, and the extra complexity didn't always pay off.
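A quick way to judge whether the zeros are "extreme" is to compare the observed share of zero days with what an intercept-only Poisson or negative binomial would predict. The sketch below assumes the NB2 parameterisation with a method-of-moments alpha, and the counts are invented for illustration.

```python
import math

# Made-up daily error counts, for illustration only.
counts = [0, 3, 7, 2, 9, 0, 4, 12, 1, 6, 5, 11, 0, 8, 3, 10]

n = len(counts)
mean = sum(counts) / n
variance = sum((c - mean) ** 2 for c in counts) / (n - 1)

def nb2_zero_prob(mu, alpha):
    """P(Y = 0) for NB2 with mean mu and dispersion alpha > 0."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)

observed_zero = counts.count(0) / n
poisson_zero = math.exp(-mean)           # P(Y = 0) under Poisson
alpha = (variance - mean) / mean ** 2    # method of moments, assumes variance > mean
nb_zero = nb2_zero_prob(mean, alpha)

print("observed:", observed_zero)
print("Poisson expects:", poisson_zero)
print("negative binomial expects:", nb_zero)
```

Overdispersion alone already pushes the expected zero share above the Poisson figure, so the negative binomial comparison is the more relevant one. A zero-inflated model is mainly worth the trouble when the observed share stays well above the negative binomial expectation too.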

We added covariates like day of week and holidays. Sometimes a weekend bump or a midweek lull showed up, and that helped reduce unexplained variance even before changing the model.
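Here is a small sketch of how a single covariate can shrink the apparent overdispersion. For a Poisson model with one categorical predictor, the fitted means are just the group means, so the Pearson dispersion can be computed by hand. The day labels and counts are made up, and `pearson_dispersion` is a helper written for this example, not a library function.

```python
from collections import defaultdict

# Made-up (day type, error count) pairs, for illustration only.
days = [
    ("weekday", 2), ("weekday", 9), ("weekday", 5), ("weekday", 3),
    ("weekday", 8), ("weekend", 12), ("weekend", 9), ("weekday", 6),
    ("weekday", 1), ("weekday", 7), ("weekday", 10), ("weekday", 4),
    ("weekend", 11), ("weekend", 16),
]

def pearson_dispersion(y, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, fitted))
    return chi2 / (len(y) - n_params)

counts = [c for _, c in days]

# Intercept-only model: every day gets the overall mean.
overall_mean = sum(counts) / len(counts)
phi_overall = pearson_dispersion(counts, [overall_mean] * len(counts), 1)

# Model with a day-type factor: each day gets its group mean.
groups = defaultdict(list)
for label, c in days:
    groups[label].append(c)
group_mean = {label: sum(v) / len(v) for label, v in groups.items()}
fitted = [group_mean[label] for label, _ in days]
phi_grouped = pearson_dispersion(counts, fitted, n_params=len(groups))

print("dispersion without covariate:", round(phi_overall, 2))
print("dispersion with day type:", round(phi_grouped, 2))
```

In this toy data part of the "overdispersion" is really a weekend shift in the mean, which is why it pays to get the covariates right before reaching for a different distribution.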

I'm not sure the whole issue is overdispersion. Maybe the problem is data quality, misreporting, or how we aggregated counts. It can feel like we're chasing the wrong thing.

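Have you checked a dispersion statistic, such as the Pearson chi-square statistic divided by the residual degrees of freedom, to quantify how bad the overdispersion is? Values near 1 are consistent with a Poisson model; values well above 1 indicate overdispersion.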
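In case it helps, this is roughly what that statistic looks like when computed by hand. The sketch uses an intercept-only Poisson model, where the fitted mean is simply the sample mean; with covariates you would plug in the fitted values from your model instead. The counts are made up for illustration.

```python
# Made-up daily error counts, for illustration only.
counts = [3, 7, 2, 9, 4, 12, 1, 6, 5, 11, 0, 8, 3, 10]

def pearson_dispersion(y, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Under a Poisson model the variance equals the mean, so each squared
    residual is divided by its fitted mean.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, fitted))
    return chi2 / (len(y) - n_params)

# Intercept-only Poisson fit: the maximum likelihood mean is the sample mean.
mean = sum(counts) / len(counts)
phi = pearson_dispersion(counts, [mean] * len(counts), n_params=1)

print("dispersion estimate:", round(phi, 2))
```

For the intercept-only case this reduces to the sample variance divided by the sample mean, which is why "variance about twice the mean" corresponds to a dispersion of about 2.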
