I’m trying to interpret the results of my A/B test for a new website feature, but my p-value is 0.06 and I’m struggling to decide whether to call this a meaningful result. It feels like I’m right on the edge. I know the conventional threshold is 0.05, but that seems so arbitrary now that I’m staring at my own data.

A p-value of 0.06 feels like a result that just missed the line. Rather than fixating on the cutoff, I keep thinking about practical significance: how big is the lift, would it matter in the real world, and what does the confidence interval show? If the interval mostly sits above zero but is wide, I’d tread carefully and avoid rushing to ship.
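To make the practical-significance question concrete, here is a minimal sketch, assuming the metric is a per-user conversion rate compared between a control (`a`) and a variant (`b`). The function name `two_proportion_summary` and all counts are hypothetical, chosen only for illustration. It reports the absolute and relative lift, a two-sided p-value from a pooled two-proportion z-test, and a 95% Wald confidence interval for the difference, using only Python's standard library.

```python
from statistics import NormalDist


def two_proportion_summary(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Compare conversion rates of control (a) and variant (b).

    Hypothetical helper: returns the absolute lift, the relative lift,
    a two-sided p-value from a pooled z-test, and a Wald confidence
    interval for the difference in rates.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error under H0: both arms share a single rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval around the observed difference.
    se_diff = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    return {"lift_abs": diff, "lift_rel": diff / p_a, "p_value": p_value, "ci": ci}


# Hypothetical counts for illustration only, not data from this test.
result = two_proportion_summary(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"absolute lift: {result['lift_abs']:.4f} ({result['lift_rel']:.1%} relative)")
print(f"p-value: {result['p_value']:.3f}")
print(f"95% CI for the difference: ({result['ci'][0]:.4f}, {result['ci'][1]:.4f})")
```

With these made-up counts (5.0% vs 5.6% on 10,000 users per arm), the p-value lands near 0.06 and the interval runs from just below zero to roughly +1.2 percentage points. That is the "mostly positive but wide" picture: the data are consistent with no effect and also with a lift that might be worth having, so the decision rests on how large a lift would actually matter. Right at the boundary, the pooled test and the unpooled interval can disagree slightly, since they use different standard errors.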

I tried a quick robustness check by splitting the results by day and by device. In a few segments the bump disappeared, and in the rest it looked tiny. The test window was short, the data were noisy, and we haven’t committed to shipping yet.
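Splitting by day and device shrinks the sample behind each estimate, so per-segment intervals are wider than the overall one, and a segment whose interval covers zero does not by itself show the effect is absent there. Checking many segments also raises the odds that a few look odd purely by chance. The sketch below assumes hypothetical per-device counts and a hypothetical helper `diff_ci`, and places each segment's 95% Wald interval next to the pooled one so the difference in width is visible.

```python
from statistics import NormalDist


def diff_ci(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Hypothetical helper: Wald CI for the difference in conversion rates (b - a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_b - p_a
    return diff, diff - z * se, diff + z * se


# Hypothetical per-device counts, not the poster's data:
# (control conversions, control users, variant conversions, variant users)
segments = {
    "desktop": (300, 5_000, 330, 5_000),
    "mobile": (200, 5_000, 230, 5_000),
}

# Sum each column across segments to get the overall (pooled) counts.
pooled = [sum(col) for col in zip(*segments.values())]

rows = {name: diff_ci(*counts) for name, counts in segments.items()}
rows["all"] = diff_ci(*pooled)

for name, (diff, lo, hi) in rows.items():
    print(f"{name:>8}: diff={diff:+.4f}  95% CI=({lo:+.4f}, {hi:+.4f})  width={hi - lo:.4f}")
```

With these invented numbers each device shows the same +0.6 point difference as the pooled data, yet both device intervals include zero because each rests on half the users. Per-segment checks like this are better at spotting a large, consistent reversal than at confirming or refuting a borderline overall effect.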

Maybe the issue isn’t the signal at all but the metric. If users interact with the feature more but don’t convert, a lift in an interaction metric won’t translate into business impact. I’ve chased borderline results before and later found the real win was elsewhere. Is the problem really the metric?

Anyway, I dumped the numbers into a doc, told the team we’re not sure, and started planning the next small test. It felt abrupt and unfinished.