Discussion about this post

Daniel Warfield

I would be thrilled to answer any questions or respond to any thoughts you might have. An article combined with thoughts, ideas, and considerations holds much more educational power!

zhongpu

You have shown three patterns of multi-modal RAG, but in all of them both the input and the output are text. Is this practical in the real world? I mean, what if the input contains both text and images?
