How to pick between Poisson and negative binomial for overdispersed data?
#1
I’m trying to decide if I should use a Poisson or a negative binomial model for my count data on website errors per day. The variance is about 1.8 times the mean, so I’m worried about overdispersion, but my sample is fairly small.
#2
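I’d lean toward the negative binomial here because the variance is about 1.8 times the mean. Poisson forces the variance to equal the mean, so with overdispersed data it tends to understate uncertainty, and that matters more in a small sample. If you can, fit both and compare them by AIC, and check the residual dispersion (Pearson chi-square divided by residual degrees of freedom, which should be near 1 if Poisson fits) to see which fits better.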
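Here is a rough sketch of that comparison using only the Python standard library, for an intercept-only model. The `counts` list is made up for illustration, and the helper names (`poisson_loglik`, `negbin_loglik`, `fit_negbin_size`) are my own, not from any package. It assumes the NB2 form (variance = mu + mu^2 / r) and finds r with a simple golden-section search while holding mu at the sample mean.

```python
import math
from statistics import mean, variance

# Hypothetical daily error counts, made up for illustration only.
counts = [3, 5, 2, 8, 4, 1, 6, 3, 9, 2, 4, 7, 0, 5, 3, 11, 2, 4, 6, 3]

def poisson_loglik(ys, mu):
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)

def negbin_loglik(ys, mu, r):
    # NB2 form: mean mu, variance mu + mu**2 / r (r is the size parameter).
    log_p = math.log(r / (r + mu))
    log_q = math.log(mu / (r + mu))
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * log_p + y * log_q
        for y in ys
    )

def fit_negbin_size(ys, mu, lo=1e-2, hi=1e4, iters=100):
    # Golden-section search over log(r); assumes the sample is overdispersed,
    # otherwise the search just drifts to the upper bound (the Poisson limit).
    a, b = math.log(lo), math.log(hi)
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if negbin_loglik(ys, mu, math.exp(c)) > negbin_loglik(ys, mu, math.exp(d)):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2)

mu_hat = mean(counts)
# With only an intercept, Pearson chi-square / (n - 1) equals variance / mean.
dispersion = variance(counts) / mu_hat
r_hat = fit_negbin_size(counts, mu_hat)

ll_pois = poisson_loglik(counts, mu_hat)
ll_nb = negbin_loglik(counts, mu_hat, r_hat)
aic_pois = 2 * 1 - 2 * ll_pois  # one parameter: mu
aic_nb = 2 * 2 - 2 * ll_nb      # two parameters: mu and r

print(f"dispersion={dispersion:.2f}  r_hat={r_hat:.2f}")
print(f"AIC Poisson={aic_pois:.2f}  AIC NB={aic_nb:.2f}")
```

Lower AIC is better, but with this few observations a small gap either way is not strong evidence. Once you have covariates you would want a proper GLM fit rather than this hand-rolled version.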
#3
I tried something similar on a tiny dataset and the extra variability kept showing up in the residuals. The plain Poisson fit looked too optimistic about precision, so I tried the negative binomial. The results were only okay, not a slam dunk.
#4
Maybe the problem isn’t the distribution at all. Daily traffic swings, seasonality, or days after a release can drive spikes. If you’re unsure, plot errors by day of week and consider an offset for traffic (the log of daily visits) before blaming the model.
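To make the offset idea concrete, here is a small standard-library sketch. The `errors` and `visits` numbers are invented, and I am assuming day 0 is the same weekday in both weeks. For an intercept-only Poisson model with offset log(visits), the fitted per-visit rate is just total errors over total visits, so no optimizer is needed.

```python
from statistics import mean

# Hypothetical two weeks of daily data, made up for illustration only.
errors = [3, 5, 2, 8, 4, 1, 6, 3, 9, 2, 4, 7, 0, 5]
visits = [1200, 1800, 900, 2600, 1500, 700, 2100,
          1100, 2900, 1000, 1600, 2400, 600, 1900]

# Intercept-only Poisson with offset log(visits):
# log(mu_i) = b0 + log(visits_i), so mu_i = rate * visits_i.
rate = sum(errors) / sum(visits)
fitted = [rate * v for v in visits]

def pearson_dispersion(ys, mus, n_params=1):
    # Pearson chi-square divided by residual degrees of freedom.
    chi2 = sum((y - m) ** 2 / m for y, m in zip(ys, mus))
    return chi2 / (len(ys) - n_params)

disp_raw = pearson_dispersion(errors, [mean(errors)] * len(errors))
disp_offset = pearson_dispersion(errors, fitted)

# Errors per 1,000 visits by weekday index (0..6), for a quick look or plot.
totals = {}
for i, (e, v) in enumerate(zip(errors, visits)):
    day = totals.setdefault(i % 7, [0, 0])
    day[0] += e
    day[1] += v
rate_per_1k = {d: 1000 * e / v for d, (e, v) in sorted(totals.items())}

print(f"dispersion without offset: {disp_raw:.2f}")
print(f"dispersion with offset:    {disp_offset:.2f}")
print(rate_per_1k)
```

If the dispersion drops toward 1 once traffic is accounted for, the apparent overdispersion was mostly exposure, and Poisson with an offset may be enough.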
#5
If you’re stuck, collect a little more data and run a quick holdout test to gauge predictive performance rather than relying on in-sample fit. A bit of cross-validation or bootstrapping helped me decide whether the extra complexity was worth it.
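For a sample this small, leave-one-out cross-validation is one way to do the holdout check. Below is a minimal sketch, not a definitive recipe: the `counts` are made up, the function names are mine, and to keep it short the negative binomial size is estimated by method of moments in each fold rather than by maximum likelihood. It scores each model by the average log predictive probability of the held-out day (higher is better).

```python
import math
from statistics import mean, variance

# Hypothetical daily error counts, made up for illustration only.
counts = [3, 5, 2, 8, 4, 1, 6, 3, 9, 2, 4, 7, 0, 5, 3, 11, 2, 4, 6, 3]

def poisson_logpmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def negbin_logpmf(y, mu, r):
    # NB2 form: mean mu, variance mu + mu**2 / r.
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def loo_log_scores(ys):
    pois, nb = [], []
    for i, y in enumerate(ys):
        train = ys[:i] + ys[i + 1:]          # drop the held-out day
        mu, var = mean(train), variance(train)
        pois.append(poisson_logpmf(y, mu))
        if var > mu:
            r = mu ** 2 / (var - mu)         # method-of-moments size, not the MLE
            nb.append(negbin_logpmf(y, mu, r))
        else:
            # No excess variance in this fold: fall back to Poisson.
            nb.append(poisson_logpmf(y, mu))
    return mean(pois), mean(nb)

loo_pois, loo_nb = loo_log_scores(counts)
print(f"mean held-out log score  Poisson={loo_pois:.3f}  NB={loo_nb:.3f}")
```

If the two scores are close, the simpler model is probably fine for prediction, even if the in-sample dispersion looks a bit high.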