Testing Document Contextualized AI …

Daniel Warfield

Apr 7

How to Know If RAG and Agentic Systems Actually Work

3 Comments

Remixa

Apr 7

Thanks for sharing, this is definitely practical advice!

Would you consider talking about building your own test dataset? Or briefly describe your approach in the comments?

I haven't come up with much beyond "given one or more documents, let an LLM generate Q/A pairs"!

Thanks in advance!

Daniel Warfield

Apr 7 (edited)

Hey Remixa!

LLM generation of Q/A pairs is powerful but problematic, because any given LLM technique tends to be biased toward certain types of Q/A pairs. If the system you're building also generates the Q/A pairs it is tested on, you'll likely get unrealistically rosy results.

We've found that using multiple RAG pipelines with different biases can lead to higher-quality Q/A pair generation.
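As a rough illustration of pooling pairs from several generators, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `QAPair` type, the `pool_qa_pairs` helper, and the two stub generators are hypothetical stand-ins, and real generators would call actual RAG pipelines. It is not the approach described in the video, just one way the idea could look.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    generator: str  # name of the (hypothetical) pipeline that produced the pair


# A generator takes document text and yields (question, answer) tuples.
Generator = Callable[[str], Iterable[tuple[str, str]]]


def pool_qa_pairs(document: str, generators: dict[str, Generator]) -> list[QAPair]:
    """Collect pairs from every generator, skipping repeated questions."""
    seen: set[str] = set()
    pooled: list[QAPair] = []
    for name, generate in generators.items():
        for question, answer in generate(document):
            # Normalize case and whitespace so near-identical questions collide.
            key = " ".join(question.lower().split())
            if key in seen:
                continue
            seen.add(key)
            pooled.append(QAPair(question, answer, name))
    return pooled


# Stub generators for illustration only (assumptions, not real pipelines).
def factual_stub(doc: str) -> list[tuple[str, str]]:
    return [("What is the warranty period?", "Two years.")]


def summary_stub(doc: str) -> list[tuple[str, str]]:
    return [
        ("what is the  warranty period?", "2 years"),
        ("What does the document cover?", "Warranty terms."),
    ]


pairs = pool_qa_pairs("...", {"factual": factual_stub, "summary": summary_stub})
```

Tagging each pair with the generator that produced it makes it possible to check later whether the system under test scores suspiciously well on pairs from any one source.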

Also, having a human in the loop who can not only validate but also edit questions and answers is very useful for encouraging diverse and robust output.

One more thing: observability is useful. Having a way to record the provenance of data is incredibly important, especially when validating LLM responses manually. So, on top of actually generating the Q/A pairs, it's also useful to build a system that can link those Q/A pairs to a specific document, page, and preferably paragraph or figure.
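One way such a provenance link might be represented is sketched below in Python. The `Provenance` and `TracedQAPair` types, their field names, the `reviewed_by` field, and the sample values are all hypothetical, chosen only to show the shape of the record, not taken from any specific tool.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    document_id: str
    page: int
    paragraph: Optional[int] = None  # finer-grained location, when known
    figure: Optional[str] = None


@dataclass(frozen=True)
class TracedQAPair:
    question: str
    answer: str
    source: Provenance
    reviewed_by: Optional[str] = None  # assumed field for a human reviewer

    def citation(self) -> str:
        """Build a short, human-readable pointer back to the source."""
        parts = [self.source.document_id, f"p. {self.source.page}"]
        if self.source.paragraph is not None:
            parts.append(f"para. {self.source.paragraph}")
        if self.source.figure is not None:
            parts.append(f"fig. {self.source.figure}")
        return ", ".join(parts)


# Sample values are made up for illustration.
pair = TracedQAPair(
    question="What is the warranty period?",
    answer="Two years.",
    source=Provenance(document_id="warranty.pdf", page=3, paragraph=2),
)

# asdict() recurses into the nested dataclass, so the record serializes cleanly.
record = json.dumps(asdict(pair))
```

A reviewer validating the answer can then use `pair.citation()` to jump straight to the passage the pair was drawn from.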

I recently filmed a video on this, if you're interested:

https://www.youtube.com/watch?v=vb-Ydzt_k8o

Remixa

Apr 8

Got it! This helped a lot!

Intuitively and Exhaustively Explained
