Thanks for sharing, this is definitely practical advice!
Would you consider talking about building your own test dataset? Or briefly describe your approach in the comments?
I haven't really thought about this beyond "given (single or multiple) documents, let an LLM generate the Q/A pairs"!
Thanks in advance!
Hey Remixa!
LLM generation of Q/A pairs is powerful but problematic, because any given LLM technique tends to be biased toward certain types of Q/A pairs. If the system you're building also generates the Q/A pairs you evaluate it on, you'll likely get unrealistically rosy results.
We've found that using multiple RAG pipelines with different biases can lead to higher-quality Q/A pair generation.
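To make the pooling idea concrete, here's a rough sketch, not our actual pipeline. The two generator functions are hypothetical stubs standing in for real RAG pipelines, and `pool_qa_pairs` is a name made up for illustration. It merges the outputs, drops near-identical questions, and tags each pair with the generator it came from so you can check the mix afterwards.

```python
from collections import Counter
from typing import Callable

# Hypothetical stand-ins for real RAG pipelines (assumption: each one
# takes a document and returns a list of (question, answer) tuples).
def factoid_generator(doc: str) -> list[tuple[str, str]]:
    return [("What year is mentioned?", "2021")]

def summary_generator(doc: str) -> list[tuple[str, str]]:
    return [
        ("What is the document about?", "Quarterly results"),
        ("what year is mentioned?", "2021"),  # duplicate of the factoid question
    ]

def pool_qa_pairs(doc: str, generators: dict[str, Callable]) -> list[dict]:
    """Merge Q/A pairs from several generators, skipping repeated questions."""
    pooled = []
    seen = set()
    for name, generate in generators.items():
        for question, answer in generate(doc):
            # Normalize case and whitespace so trivial variants count as duplicates.
            key = " ".join(question.lower().split())
            if key in seen:
                continue
            seen.add(key)
            pooled.append({"question": question, "answer": answer, "generator": name})
    return pooled

GENERATORS = {"factoid": factoid_generator, "summary": summary_generator}
pairs = pool_qa_pairs("Quarterly results for 2021 ...", GENERATORS)

# Count how many pairs each generator contributed, to spot a skewed mix.
mix = Counter(pair["generator"] for pair in pairs)
print(mix)
```

If one generator dominates the count, that's a hint the dataset is inheriting that pipeline's bias.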
Also, having a human in the loop who can not only validate but also edit questions and answers is very useful for encouraging diverse and robust output.
One more thing: observability is useful. Having a way to record the provenance of data is incredibly important, especially when validating LLM responses manually. So, on top of actually generating the Q/A pairs, it's also useful to build a system that can link each pair to a specific document, page, and preferably paragraph or figure.
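As a minimal sketch of what such a record could look like (the class names, fields, status values, and example file name here are all assumptions for illustration, not a fixed schema), each Q/A pair carries its source location, and a human review step can approve, edit, or reject it without losing that link:

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Provenance:
    document: str                    # file name or document ID
    page: int
    paragraph: Optional[int] = None  # finer-grained location, when known
    figure: Optional[str] = None

@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    source: Provenance
    generator: str            # which pipeline produced the pair
    status: str = "pending"   # pending | approved | edited | rejected

def review(pair: QAPair, approve: bool,
           new_question: Optional[str] = None,
           new_answer: Optional[str] = None) -> QAPair:
    """Return a reviewed copy of the pair; the provenance is carried over unchanged."""
    if not approve:
        return replace(pair, status="rejected")
    if new_question is None and new_answer is None:
        return replace(pair, status="approved")
    return replace(
        pair,
        question=new_question or pair.question,
        answer=new_answer or pair.answer,
        status="edited",
    )

# Hypothetical example data.
draft = QAPair(
    question="What year is mentioned?",
    answer="2021",
    source=Provenance(document="report.pdf", page=4, paragraph=2),
    generator="factoid",
)
reviewed = review(draft, approve=True, new_question="Which fiscal year does the report cover?")
print(reviewed.status, reviewed.source)
```

Keeping the records immutable means the original draft stays around next to the edited version, which helps when you audit what the reviewer changed.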
I recently filmed a video on this, if you're interested:
https://www.youtube.com/watch?v=vb-Ydzt_k8o
Got it! This helped a lot!