Related papers: Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

URL: http://arxiv.org/abs/2511.02358v1
Date: Tue, 04 Nov 2025 08:24:41 GMT
Title: Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
Authors: Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim, Jaehyun Park,
Abstract summary: We propose M-Solomon, a universal multimodal embedder that can adaptively determine when to augment queries.<n>We show that M-Solomon not only surpassed the baseline without augmentation by a large margin but also outperformed the baseline that always used augmentation.
Score: 3.765602121469129
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Query augmentation makes queries more meaningful by appending further information to the queries to find relevant documents. Current studies have proposed Large Language Model (LLM)-based embedders, which learn representation for embedding and generation for query augmentation in a multi-task manner by leveraging the generative capabilities of LLM. During inference, these jointly trained embedders have conducted query augmentation followed by embedding, showing effective results. However, augmenting every query leads to substantial embedding latency and query augmentation can be detrimental to performance for some queries. Also, previous methods have not been explored in multimodal environments. To tackle these problems, we propose M-Solomon, a universal multimodal embedder that can adaptively determine when to augment queries. Our approach first divides the queries of the training datasets into two groups at the dataset level. One includes queries that require augmentation and the other includes queries that do not. Then, we introduces a synthesis process that generates appropriate augmentations for queries that require them by leveraging a powerful Multimodal LLM (MLLM). Next, we present adaptive query augmentation. Through this step, M-Solomon can conduct query augmentation only when necessary by learning to generate synthetic augmentations with the prefix /augment for queries that demand them and to generate the simple string /embed for others. Experimental results showed that M-Solomon not only surpassed the baseline without augmentation by a large margin but also outperformed the baseline that always used augmentation, providing much faster embedding latency.

Related papers

Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks.<n>We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
arXiv Detail & Related papers (2025-10-20T04:16:28Z)
Reasoning-enhanced Query Understanding through Decomposition and Interpretation [87.56450566014625]
ReDI is a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation.<n>We compiled a large-scale dataset of real-world complex queries from a major search engine.<n> Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms.
arXiv Detail & Related papers (2025-09-08T10:58:42Z)
Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment [4.21943400140261]
Aligned Query Expansion (AQE) is a novel approach to enhance query expansion for passage retrieval in open-domain question answering.<n>We show that AQE outperforms baseline models for query expansion in both in-domain and out-of-domain settings.
arXiv Detail & Related papers (2025-07-15T07:11:29Z)
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques.<n>We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds.<n>Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
A New Query Expansion Approach via Agent-Mediated Dialogic Inquiry [10.76224743599566]
We propose AMD: a new Agent-Mediated Dialogic Framework that engages in a dialogic inquiry involving three specialized roles.<n>By leveraging a multi-agent process, AMD effectively crafts richer query representations through inquiry and feedback refinement.
arXiv Detail & Related papers (2025-02-12T16:39:06Z)
Data Fusion of Synthetic Query Variants With Generative Large Language Models [1.864807003137943]
This work explores the feasibility of using synthetic query variants generated by instruction-tuned Large Language Models in data fusion experiments. We introduce a lightweight, unsupervised, and cost-efficient approach that exploits principled prompting and data fusion techniques. Our analysis shows that data fusion based on synthetic query variants is significantly better than baselines with single queries and also outperforms pseudo-relevance feedback methods.
arXiv Detail & Related papers (2024-11-06T12:54:27Z)
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [92.5712549836791]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs)<n>We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs)<n>We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.<n>Our model, MM-Embed, achieves state-of-the-art performance on the multimodal retrieval benchmark M-BEIR.
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries.<n>We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.<n> Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion [39.24969189479343]
We propose a novel zero-shot query expansion framework utilizing large language models (LLMs) for mutual verification. Our proposed method is fully zero-shot, and extensive experiments on three public benchmark datasets are conducted to demonstrate its effectiveness.
arXiv Detail & Related papers (2023-10-29T16:04:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.