What’s the best way to fix domain errors using a language model on lab notes?
#1
I’ve been trying to use a large language model to help me parse and categorize decades of old, handwritten lab notes from my thesis work, but I’m hitting a wall. The handwriting recognition is decent, but the model keeps misinterpreting the specific chemical nomenclature and experimental conditions, which makes the automated tagging useless for my actual research. I’m wondering if anyone else has tried this for historical scientific data and how you handled the domain-specific errors.
#2
I waded through something similar years back. The OCR caught the letters well enough, but the chemistry terms kept derailing the tagging. We added a human in the loop: a shared glossary mapping the terms that were consistently misread to their correct forms, plus a flag on anything outside that glossary so a person could review it. After a lot of cleanup the results stabilized for the terms we knew, but everything else still drifted.
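The glossary-plus-flag idea above can be sketched in a few lines. This is a minimal illustration, not the poster's actual pipeline: the glossary entries and the `correct_term` helper are hypothetical examples, and the fuzzy-match cutoff is an arbitrary starting point you would tune on your own data.

```python
# Sketch of a glossary-based correction pass with a human-review flag.
# GLOSSARY maps known misreads to canonical terms (example data only).
from difflib import get_close_matches

GLOSSARY = {
    "tolune": "toluene",
    "benzne": "benzene",
    "naoh": "NaOH",
}
CANONICAL = set(GLOSSARY.values())

def correct_term(term: str) -> tuple[str, bool]:
    """Return (corrected_term, flagged).

    flagged=True means the term was outside the glossary and should
    be queued for human review rather than auto-tagged.
    """
    key = term.lower()
    if key in GLOSSARY:
        return GLOSSARY[key], False
    if term in CANONICAL:
        return term, False
    # Fuzzy fallback: catch near-misses of known misreads.
    close = get_close_matches(key, GLOSSARY.keys(), n=1, cutoff=0.8)
    if close:
        return GLOSSARY[close[0]], False
    return term, True  # unknown term: flag it for a human
```

Anything the function flags goes into the review queue; everything it corrects feeds the automated tagging. That split is what kept the known terms stable for us while the long tail stayed manual.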