Discussion about this post

Daniel Warfield

I would be thrilled to answer any questions or respond to any thoughts you might have. An article combined with thoughts, ideas, and considerations holds much more educational power!

zhongpu

You have shown three patterns of multi-modal RAG, but in all of them both the input and the output are text. Is this practical in the real world? I mean, what if the input contains both text and images?
