Multimodal Embedding & Reranker Models with Sentence Transformers
Sentence Transformers v5.4 now supports encoding and comparing text, images, audio, and video in a single API, enabling visual document retrieval, cross-modal search, and multimodal RAG pipelines through unified embedding and reranking models.
Hugging Face · 12 min read