How should I handle batch effects in LC-MS/MS proteomics data?
#1
I’m trying to interpret the results from my proteomics experiment, and I’m stuck on how to properly account for batch effects in my LC-MS/MS data. The variance introduced by different run dates seems to be overshadowing the biological signal I’m looking for, even after basic normalization. I’m not sure whether I should use a ComBat-like adjustment or whether there’s a more suitable statistical model for my specific experimental design.
#2
I tried ComBat on log2 intensities; it helped a bit but batch clustering remained after correction, and missing values made things messier. Normalization alone wasn’t enough for me either.
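For comparison, here is a minimal sketch of a much simpler adjustment than ComBat: per-protein median centering of log2 intensities within each batch, skipping missing values instead of imputing them. It is not ComBat (no empirical Bayes shrinkage, no scale adjustment), and the function name, batch labels, and numbers are all made up for illustration. Note that if biology is confounded with batch, this kind of centering removes biology too.

```python
from statistics import median

def center_by_batch(intensities, batches):
    """Shift each batch so its median matches the protein's overall median.

    `intensities` holds log2 values for ONE protein; None marks a missing value.
    Missing values are ignored when computing medians and stay missing.
    """
    observed = [x for x in intensities if x is not None]
    if not observed:
        return list(intensities)  # nothing to center
    overall = median(observed)

    batch_medians = {}
    for b in set(batches):
        vals = [x for x, bb in zip(intensities, batches)
                if bb == b and x is not None]
        if vals:
            batch_medians[b] = median(vals)

    return [None if x is None else x - batch_medians[b] + overall
            for x, b in zip(intensities, batches)]

# Hypothetical protein measured on two run dates, one value missing.
protein = [20.0, 20.4, None, 21.0, 21.4, 21.2]
run_date = ["d1", "d1", "d1", "d2", "d2", "d2"]
corrected = center_by_batch(protein, run_date)
```

After centering, both run dates share the same median for this protein, and the missing value is left untouched rather than filled in.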
#3
I leaned toward a linear mixed model with batch as a random effect; it let me see if the batch variance shrinks when you include the biology factors, rather than forcing a global match across proteins.
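To show the kind of comparison I mean, here is a rough sketch for a single protein. It is not a proper REML mixed-model fit: it uses the classic ANOVA (method-of-moments) estimator for a one-way random effect, and it "includes biology" by simply subtracting condition means first. The function names and the numbers are hypothetical, and it needs at least two batches with more samples than batches.

```python
from collections import defaultdict
from statistics import mean

def batch_variance(values, batches):
    """Method-of-moments estimate of (between-batch variance, residual variance)."""
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    k = len(groups)            # number of batches (must be >= 2)
    n_total = len(values)      # must exceed k
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective batch size; equals the common size when batches are equal.
    n0 = (n_total - sum(len(g) ** 2 for g in groups.values()) / n_total) / (k - 1)
    # Negative estimates are truncated at zero.
    return max(0.0, (ms_between - ms_within) / n0), ms_within

def residualize(values, labels):
    """Subtract each label's mean: a crude stand-in for fixed biology effects."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    means = {lab: mean(g) for lab, g in groups.items()}
    return [v - means[lab] for v, lab in zip(values, labels)]

# Hypothetical log2 intensities for one protein, with condition partly
# confounded with run date (b1 is mostly ctrl, b2 is mostly treated).
values = [20.0, 20.2, 19.8, 22.0, 20.1, 22.1, 21.9, 22.0]
batch = ["b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2"]
condition = ["ctrl", "ctrl", "ctrl", "treated",
             "ctrl", "treated", "treated", "treated"]

var_batch_raw, _ = batch_variance(values, batch)
var_batch_adj, _ = batch_variance(residualize(values, condition), batch)
```

In this toy case the apparent batch variance is clearly positive on the raw values and drops to zero once condition is accounted for, which is the pattern you would expect when "batch" is mostly unbalanced biology. A real analysis would fit the mixed model per protein with a dedicated package.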
#4
I’m wary of ComBat here because proteomics data have missing-not-at-random (MNAR) values and heavy tails, both of which can distort the correction. I did try MSstats as an alternative, since it models runs and replicates; sometimes that clarified some hits, sometimes not.
#5
I keep wondering if the real issue isn’t batch per se but sample prep or QC drift tied to run dates. Maybe recheck instrument performance and run a small QC check; do you have a truly balanced design across days?
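A quick way to answer the balance question is to tabulate samples per condition and run date. This is a minimal sketch; the condition and day labels are made up, and "balanced" here just means each condition has the same nonzero number of samples on every day.

```python
from collections import Counter

def design_table(samples):
    """Count samples in each (condition, run_date) cell.

    `samples` is a list of (condition, run_date) pairs.
    """
    counts = Counter(samples)
    conditions = sorted({c for c, _ in samples})
    dates = sorted({d for _, d in samples})
    return {c: {d: counts.get((c, d), 0) for d in dates} for c in conditions}

def is_balanced(table):
    """True if every condition has the same nonzero count on every run date."""
    return all(len(set(row.values())) == 1 and min(row.values()) > 0
               for row in table.values())

def confounded_dates(table):
    """Run dates on which only one condition was measured."""
    dates = list(next(iter(table.values())).keys())
    return [d for d in dates
            if sum(1 for row in table.values() if row[d] > 0) == 1]

# Hypothetical sample sheet.
samples = [("ctrl", "day1"), ("ctrl", "day1"), ("ctrl", "day2"),
           ("treated", "day2"), ("treated", "day2")]
table = design_table(samples)
balanced = is_balanced(table)
problem_days = confounded_dates(table)
```

Any day that shows up in `problem_days` carries only one condition, so a batch correction cannot separate that day's run effect from the biology.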

