How reliable is a language model for parsing handwritten lab notes?
#1
I’ve been trying to use a large language model to help me parse and categorize decades of old, handwritten lab notes in my field, but I’m hitting a wall with its consistency. It will brilliantly connect a fragmented chemical notation to a known procedure one minute, then completely misinterpret a clear diagram the next. This unpredictability makes me hesitant to trust it as a research assistant, even for this preliminary sorting task.
Reply
#2
I fed a couple hundred pages into the model and kept a rough log. Recognizable diagrams and notations were the minority: roughly six in ten pages came back as something unhelpful, about a quarter looked usable, and the remainder were flat-out wrong. It felt like two steps forward, three steps back.
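For anyone who wants to keep the same kind of rough log, here's a minimal sketch. The three category names and the helper functions are my own assumptions, not anything the model outputs.

```python
from collections import Counter

# Running tally of outcomes per page; categories are assumptions
# ("unhelpful" / "okay" / "wrong") matching the rough log above.
log = Counter()

def record(page_id, outcome):
    """Log one page's outcome: 'unhelpful', 'okay', or 'wrong'."""
    log[outcome] += 1

def summarize(log):
    """Return each category's share of the total pages logged."""
    total = sum(log.values())
    return {k: round(v / total, 2) for k, v in log.items()}
```

Even this crude tally makes it easier to tell whether a change to the prompt or the pre-processing actually moves the numbers.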
Reply
#3
I tried a human-in-the-loop approach: a coder corrected the labels after every 50 items, and we retrained on the updated set. On the next pass, accuracy rose from around 40% to roughly 60%, but it still meant constant back-and-forth and slow progress.
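The loop above can be sketched in a few lines. The `model` interface (`predict`/`fit`), the `correct_batch` callback standing in for the human coder, and the batch size of 50 are all assumptions; swap in your real classifier and correction workflow.

```python
def human_in_the_loop(model, items, correct_batch, batch_size=50):
    """Predict in batches; after each batch a human corrects the labels,
    then the model is retrained on everything corrected so far."""
    corrected = []  # (item, gold_label) pairs accumulated across batches
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        preds = [model.predict(x) for x in batch]
        gold = correct_batch(batch, preds)  # human fixes the predictions
        corrected.extend(zip(batch, gold))
        model.fit(corrected)  # retrain on all corrected labels so far
    return corrected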
Reply
#4
Could it be that the real bottleneck isn't the model but the handwriting variety itself, the way sketches, text, and shorthand mix on a single page? I keep wondering if I'm barking up the wrong tree.
Reply
#5
I've considered pre-processing: batch OCR, then mapping symbols to a consistent set before the model sees anything. Not sure that fixes the core issue, but it might cut the noise.
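A sketch of the symbol-normalization half of that idea. The mapping table below is illustrative only (your notebooks' actual symbol variants would drive the real table), and it assumes OCR has already produced text.

```python
# Canonical replacements for symbol variants that OCR tends to emit
# inconsistently. Entries here are illustrative assumptions.
SYMBOL_MAP = {
    "Δ": "DELTA",   # heat symbol
    "→": "->",      # reaction arrow variants
    "⟶": "->",
    "℃": "degC",
    "µ": "u",       # micro sign to ASCII
}

def normalize(text, mapping=SYMBOL_MAP):
    """Replace known symbol variants with a single canonical token."""
    for raw, canon in mapping.items():
        text = text.replace(raw, canon)
    return text
```

Normalizing before the model sees the text at least removes one source of variance, so failures point more clearly at the model or the handwriting itself.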
Reply