I’ve been trying to use a large language model to help parse and categorize decades of unstructured lab notes in my field, but I’m hitting a wall with its tendency to generate plausible but completely fabricated references to non-existent papers or methodologies. It’s making me question whether this approach is fundamentally flawed for rigorous historical data reconstruction. Has anyone else run into this and found a practical way to anchor the model’s output to verified sources only?

Yes, I hit this hard last year. We built a small offline index of the verified papers and methodologies we actually rely on, and we forced the model to cite only from that index. Any output with no match in the index was flagged, and human reviewers verified the citations. The rate of hallucinated references dropped dramatically, but the process slowed the workflow and required a dedicated librarian to maintain the index.
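For anyone who wants to try the index check described above, here is a minimal sketch in Python. It is not our actual tooling: the `[cite:KEY]` marker format, the `VERIFIED_INDEX` contents, and the function name `check_citations` are all assumptions made up for illustration. The idea is only that every cited key must exist in the offline index, and anything else gets flagged for a human.

```python
import re

# Hypothetical offline index: citation key -> short description.
# The keys and descriptions are placeholders, not real references.
VERIFIED_INDEX = {
    "entry-001": "Placeholder for a verified paper",
    "entry-002": "Placeholder for a verified methodology",
}

# Assumed convention: the model is prompted to cite as [cite:KEY].
CITE_PATTERN = re.compile(r"\[cite:([A-Za-z0-9_-]+)\]")


def check_citations(model_output, index):
    """Sort cited keys into verified and unverified, and flag for review."""
    keys = CITE_PATTERN.findall(model_output)
    verified = [k for k in keys if k in index]
    unverified = [k for k in keys if k not in index]
    return {
        "verified": verified,
        "unverified": unverified,
        # Flag when any key is missing from the index, or when the
        # output cites nothing at all.
        "needs_review": bool(unverified) or not keys,
    }


result = check_citations(
    "Method follows [cite:entry-001] and [cite:entry-999].",
    VERIFIED_INDEX,
)
print(result)
```

A check like this only catches keys that are absent from the index. It cannot tell whether a verified entry actually supports the claim it is attached to, which is why the human review step stays in.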

In practice, the bottleneck wasn't so much the model’s appetite for fake references as the notes themselves, which occasionally describe things that never existed. We added a human-in-the-loop step: a two-column QA sheet where the reviewer checked each claim against the source list; if a claim was not in the list, we marked it 'needs source insertion'. It helped, but you can feel the fatigue.
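A rough sketch of how a two-column sheet like that could be pre-filled before a reviewer sees it. The column names, the case-insensitive exact matching, and the function name `build_qa_sheet` are assumptions for illustration, not a description of the sheet we actually used.

```python
import csv
import io


def build_qa_sheet(claims, source_list):
    """Return CSV text with one row per claimed source and its status."""
    # Normalise the source list so matching ignores case and stray spaces.
    sources = {s.strip().lower() for s in source_list}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["claim_source", "status"])
    for claimed in claims:
        if claimed.strip().lower() in sources:
            status = "in source list"
        else:
            status = "needs source insertion"
        writer.writerow([claimed, status])
    return buffer.getvalue()


# Placeholder names only; a real run would use the extracted claims.
print(build_qa_sheet(["Protocol A", "Protocol Z"], ["protocol a"]))
```

Exact matching will miss variant spellings, so the reviewer still has to look at every 'needs source insertion' row; the pre-fill only saves them the obvious matches.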

I wonder if the real issue is metadata quality. I tried restricting outputs to citations that are present in the file metadata, and it helped a little, but the model still generated plausible links that weren't there. Do you find that metadata helps?

Sometimes I drift off topic, like after chasing a mislinked acronym. I paused, added a glossary of terms used in notes across decades, then fed that into the model. It didn't fix everything, but it kept the hallucinations from wandering too far.
