I’m trying to decide if I should use a Poisson or a negative binomial model for my count data on website errors per day. The variance is about 1.8 times the mean, so it’s overdispersed, but I’m not sure if that’s enough to justify the more complex model.
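For anyone wanting to reproduce that kind of check, here is a minimal sketch of the variance-to-mean ratio using only the Python standard library. The `daily_errors` list is made-up illustration data, not the poster's, and `dispersion_ratio` is a hypothetical helper name.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero, so the ratio is undefined")
    return variance(counts) / m


# Hypothetical daily error counts, invented purely for illustration.
daily_errors = [3, 7, 2, 9, 4, 1, 8, 5, 2, 10, 6, 3]
ratio = dispersion_ratio(daily_errors)
print(f"variance/mean = {ratio:.2f}")
```

A ratio well above 1 on the raw counts only suggests overdispersion; once covariates are in the model, the comparable figure is the residual dispersion, not this raw ratio.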

I started with Poisson, and the residuals flagged overdispersion: the variance ran well above the mean. I then tried a negative binomial model, and it fit noticeably better.
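As a rough way to see what the negative binomial is buying, here is a sketch of a method-of-moments estimate of its dispersion parameter. It assumes the common NB2 form, where variance = mean + alpha * mean^2, and no covariates; the data and the name `nb2_alpha_moments` are hypothetical, and a real fit would estimate alpha by maximum likelihood.

```python
from statistics import mean, variance


def nb2_alpha_moments(counts):
    """Method-of-moments alpha for NB2, where Var = mu + alpha * mu**2.

    Returns 0.0 when the sample shows no overdispersion (variance <= mean).
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean is zero, so alpha is undefined")
    excess = variance(counts) - mu
    return max(excess, 0.0) / mu ** 2


# Hypothetical daily error counts, invented purely for illustration.
daily_errors = [3, 7, 2, 9, 4, 1, 8, 5, 2, 10, 6, 3]
alpha = nb2_alpha_moments(daily_errors)
print(f"moment estimate of alpha = {alpha:.3f}")
```

An alpha near zero means the negative binomial collapses toward Poisson, so the size of alpha is one informal gauge of how much the extra parameter matters.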

A variance of 1.8 times the mean is not trivial in practice; with that much spread, a plain Poisson model tends to overstate precision, giving standard errors and confidence intervals that are too narrow.
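To put a number on the precision point, here is a back-of-the-envelope sketch. It assumes a quasi-Poisson view, where the variance is a constant multiple of the mean and standard errors scale with the square root of that multiple. That is a simplification: a negative binomial fit will not give exactly this factor.

```python
import math

# Dispersion reported in the thread: variance is about 1.8 times the mean.
dispersion = 1.8

# Under the quasi-Poisson assumption, true standard errors are the
# Poisson standard errors multiplied by sqrt(dispersion).
se_inflation = math.sqrt(dispersion)

# Equivalently, Poisson intervals are this fraction of the width they should be.
poisson_width_share = 1 / se_inflation

print(f"SE inflation factor: {se_inflation:.2f}")
print(f"Poisson interval width as share of corrected width: {poisson_width_share:.2f}")
```

So at this level of dispersion, Poisson standard errors come out roughly a quarter too small, which is enough to change which covariates look significant.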

If you have covariates like day of week or traffic volume, adding them can explain part of that extra variance, so less of it gets attributed to random noise.

I added a covariate and the dispersion dropped but it still didn't look perfect; the fit was clearly better though.

I briefly toyed with zero inflation, but in my data zeros weren't dominant, so I dropped it.
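A quick screen for this is to compare the observed share of zero days with the share a Poisson model with the same mean would predict, exp(-mean). The sketch below uses invented data and a hypothetical helper name, and ignores covariates. Note that a negative binomial also predicts more zeros than Poisson, so a modest excess does not by itself call for zero inflation.

```python
import math
from statistics import mean


def zero_share_check(counts):
    """Return (observed share of zeros, share expected under Poisson with the same mean)."""
    m = mean(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    expected_poisson = math.exp(-m)
    return observed, expected_poisson


# Hypothetical daily error counts, invented purely for illustration.
daily_errors = [0, 3, 7, 2, 9, 4, 1, 8, 5, 0, 10, 6]
observed, expected = zero_share_check(daily_errors)
print(f"observed zero share: {observed:.3f}")
print(f"Poisson-expected zero share: {expected:.3f}")
```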

I also wondered if the problem was data collection gaps on weekends; after cleaning those days the dispersion softened a bit.

Do you have evidence of clustering in days, like bursts after outages?
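One simple way to look for that kind of clustering is the lag-1 autocorrelation of the daily counts: bursts that span consecutive days push it above zero. This is only a sketch with made-up series and a hypothetical function name; it assumes the days are in order with no gaps, and it does not separate bursts from a trend or a weekly cycle.

```python
from statistics import mean


def lag1_autocorr(counts):
    """Lag-1 sample autocorrelation of a series ordered by day."""
    m = mean(counts)
    dev = [c - m for c in counts]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant, so autocorrelation is undefined")
    num = sum(dev[i] * dev[i + 1] for i in range(len(dev) - 1))
    return num / denom


# Hypothetical series: errors arriving in multi-day bursts versus alternating days.
bursty = [1, 1, 1, 9, 9, 9, 1, 1, 1, 9, 9, 9]
alternating = [1, 9, 1, 9, 1, 9, 1, 9, 1, 9, 1, 9]
r_bursty = lag1_autocorr(bursty)
r_alternating = lag1_autocorr(alternating)
print(f"bursty: {r_bursty:.2f}, alternating: {r_alternating:.2f}")
```

A clearly positive value would suggest the days are not independent, in which case neither count model's standard errors can be taken at face value.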
