MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
- URL: http://arxiv.org/abs/2506.03144v1
- Date: Tue, 03 Jun 2025 17:59:14 GMT
- Title: MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
- Authors: Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li
- Abstract summary: This paper introduces MERIT, the first multilingual dataset for interleaved multi-condition semantic retrieval.
- Score: 55.486895951981566
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Semantic retrieval is crucial for modern applications yet remains underexplored in current research. Existing datasets are limited to single languages, single images, or singular retrieval conditions, and often fail to fully exploit the expressive capacity of visual information, as evidenced by maintained performance when images are replaced with captions. However, practical retrieval scenarios frequently involve interleaved multi-condition queries with multiple images. Hence, this paper introduces MERIT, the first multilingual dataset for interleaved multi-condition semantic retrieval, comprising 320,000 queries over 135,000 products in 5 languages, covering 7 distinct product categories. Extensive experiments on MERIT identify a key limitation of existing models: they focus solely on global semantic information while neglecting specific conditional elements in queries. Consequently, we propose Coral, a novel fine-tuning framework that adapts pre-trained MLLMs by integrating embedding reconstruction to preserve fine-grained conditional elements and contrastive learning to extract comprehensive global semantics. Experiments demonstrate that Coral achieves a 45.9% performance improvement over conventional approaches on MERIT, with strong generalization validated across 8 established retrieval benchmarks. Collectively, our contributions (a novel dataset, identification of critical limitations in existing approaches, and an innovative fine-tuning framework) establish a foundation for future research in interleaved multi-condition semantic retrieval.
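The abstract describes Coral as combining a contrastive objective (to capture global semantics) with an embedding-reconstruction objective (to preserve fine-grained conditional elements). The paper's exact loss formulation is not given in this abstract; the following is a minimal NumPy sketch of what such a combined objective could look like. The function names, the InfoNCE form of the contrastive term, the MSE form of the reconstruction term, and the weighting factor `alpha` are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def info_nce_loss(query_emb, pos_emb, neg_embs, temperature=0.07):
    """Contrastive (InfoNCE-style) term: pull the query embedding toward
    its positive product embedding, push it away from negatives."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(query_emb, pos_emb)] +
                      [cos(query_emb, n) for n in neg_embs]) / temperature
    logits -= logits.max()  # numerical stability before softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # the positive sits at index 0

def reconstruction_loss(token_embs, reconstructed):
    """Reconstruction term: mean squared error between the original
    condition-token embeddings and their reconstruction."""
    return float(np.mean((token_embs - reconstructed) ** 2))

def coral_style_objective(query_emb, pos_emb, neg_embs,
                          token_embs, reconstructed, alpha=0.5):
    """Hypothetical combined objective: contrastive loss for global
    semantics plus a weighted reconstruction loss for local detail."""
    return info_nce_loss(query_emb, pos_emb, neg_embs) + \
           alpha * reconstruction_loss(token_embs, reconstructed)
```

The intuition is that the contrastive term alone tends to reward only coarse, query-level matching, while the reconstruction term forces the encoder to retain the token-level information that individual query conditions depend on.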
Related papers
- Composed Object Retrieval: Object-level Retrieval via Composed Expressions [71.47650333199628]
Composed Object Retrieval (COR) is a brand-new task that goes beyond image-level retrieval to achieve object-level precision. We construct COR127K, the first large-scale COR benchmark, containing 127,166 retrieval triplets with various semantic transformations across 408 categories. We also present CORE, a unified end-to-end model that integrates reference region encoding, adaptive visual-textual interaction, and region-level contrastive learning.
arXiv Detail & Related papers (2025-08-06T13:11:40Z) - MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling [58.251621637466904]
Multi-query Scene Text retrieval with Attention Recycling (MSTAR) is a box-free approach for scene text retrieval. It incorporates progressive vision embedding to dynamically capture the multi-grained representation of texts. Extensive experiments demonstrate the superiority of the method across seven public datasets and the MQTR dataset.
arXiv Detail & Related papers (2025-06-12T11:54:13Z) - MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios. We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z) - Towards Text-Image Interleaved Retrieval [49.96332254241075]
We introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences. We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, with a dedicated pipeline designed to generate interleaved queries. We propose a novel Matryoshka Multimodal Embedder (MME), which compresses the number of visual tokens at different granularities.
arXiv Detail & Related papers (2025-02-18T12:00:47Z) - Optimizing Multi-Stage Language Models for Effective Text Retrieval [0.0]
We introduce a novel two-phase text retrieval pipeline optimized for Japanese legal datasets. Our method leverages advanced language models to achieve state-of-the-art performance. To further enhance robustness and adaptability, we incorporate an ensemble model that integrates multiple retrieval strategies.
arXiv Detail & Related papers (2024-12-26T16:05:19Z) - CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model [9.224965304457708]
This paper introduces Contextual Understanding and Enhanced Search with MLLM (CUE-M), a novel multimodal search framework. It incorporates image context enrichment, intent refinement, contextual query generation, external API integration, and relevance-based filtering. Experiments on real-world datasets and public benchmarks for knowledge-based VQA and safety demonstrate that CUE-M outperforms baselines and establishes new state-of-the-art results.
arXiv Detail & Related papers (2024-11-19T07:16:48Z) - MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs). We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks. Our model, MM-Embed, achieves state-of-the-art performance on the multimodal retrieval benchmark M-BEIR.
arXiv Detail & Related papers (2024-11-04T20:06:34Z) - MINERS: Multilingual Language Models as Semantic Retrievers [23.686762008696547]
This paper introduces MINERS, a benchmark designed to evaluate the ability of multilingual language models in semantic retrieval tasks.
We create a comprehensive framework to assess the robustness of LMs in retrieving samples across over 200 diverse languages.
Our results demonstrate that solely retrieving semantically similar embeddings yields performance competitive with state-of-the-art approaches.
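The MINERS finding above, that plain embedding similarity is competitive with more elaborate approaches, amounts to simple nearest-neighbor search over precomputed embeddings. The sketch below shows such cosine-similarity retrieval in NumPy; it assumes query and corpus embeddings have already been produced by some multilingual encoder (not shown), and the function name and interface are illustrative, not taken from the MINERS paper.

```python
import numpy as np

def retrieve(query_vec, corpus_vecs, k=3):
    """Return the indices of the k corpus embeddings most similar to the
    query embedding, ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity per corpus row
    return np.argsort(-sims)[:k]      # highest similarity first
```

For the roughly 200-language setting the benchmark describes, the same loop applies unchanged as long as the encoder maps all languages into one shared embedding space.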
arXiv Detail & Related papers (2024-06-11T16:26:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.