What are limits of language models for parsing handwritten notes and formulas?
#1
I’ve been trying to use a large language model to help me parse and categorize decades of old, handwritten lab notes in my field, but I’m hitting a wall. The model keeps confidently misinterpreting chemical formulas and unit abbreviations from the specific notation my old professor used. I’m starting to wonder if this approach is fundamentally flawed for such a niche, domain-specific task without a massive and tailored training set. Has anyone else run into this problem with specialized scientific literature?
Reply
#2
I’ve tried a similar setup and the confidence with which it locks onto a wrong chemical formula is maddening. Handwriting quirks, subscripts, and old abbreviations would turn into totally different compounds in a single pass. It didn’t feel like occasional slips; it felt baked in after a few pages. A handful of pages with metal salts and solvent codes kept tripping it.
Reply
#3
I wonder if the real bottleneck isn’t the model at all but the handwriting and the context. If a note says CuSO4 with a tiny dot that means something else in that lab, the tool can’t guess. Add an OCR misread on top of a missing symbol dictionary and you end up with a confident but wrong read. I found it cheaper to run a human-in-the-loop workflow first and only then think about automation.
Reply
#4
I did a tiny pilot: I labeled about 300 lines, of which around 60 entries looked cleanly readable, and even of those roughly 15 were wrong because of stray strokes or ligatures. I built a small glossary of the professor’s shorthand and added a few post-processing checks: if a parsed formula doesn’t match known patterns, tag it for review. The gains were real but not trustworthy for a large archive yet.
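In case it helps anyone, here’s a rough sketch of the kind of check I mean. The glossary entries and the regex are made-up placeholders, not the actual shorthand — the point is just the shape: normalize through a shorthand table, then flag anything that doesn’t look like a plausible formula for human review.

```python
import re

# Hypothetical shorthand-to-formula glossary; replace with the real
# abbreviations mined from the notebooks.
GLOSSARY = {
    "Cu sulf": "CuSO4",
    "EtOH": "C2H5OH",
}

# Loose pattern for "element symbol + optional count" repeated,
# e.g. CuSO4, H2O, C2H5OH. It will not catch everything (it can't
# tell O from 0, for instance), but it filters obvious garbage.
FORMULA_RE = re.compile(r"(?:[A-Z][a-z]?\d*)+")

def check_entry(raw: str) -> tuple[str, bool]:
    """Normalize a parsed token via the glossary, then report whether
    it looks like a plausible formula (True) or needs review (False)."""
    text = GLOSSARY.get(raw.strip(), raw.strip())
    looks_ok = FORMULA_RE.fullmatch(text) is not None
    return text, looks_ok

# Anything flagged False goes into the manual-review queue.
for token in ["Cu sulf", "CuSO4", "?!x"]:
    normalized, ok = check_entry(token)
    print(normalized, "OK" if ok else "REVIEW")
```

It won’t rescue a bad OCR pass, but it turns silent misreads into an explicit review queue, which was the main win for me.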
Reply
#5
I kept thinking maybe the issue isn’t the parsing at all but the dataset itself. The notes aren’t a clean corpus—experiments drift, units swap around, and context is scattered. I tried focusing on a narrow slice first, like standard compounds and units, and that helped a bit, but the moment you switch labs or eras everything changes. Is the problem really the niche notation, or is the tool just not built for this kind of uneven data?
Reply

