Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings
- URL: http://arxiv.org/abs/2506.08592v1
- Date: Tue, 10 Jun 2025 09:00:33 GMT
- Title: Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings
- Authors: Liyan Xu, Zhenlin Su, Mo Yu, Jiangnan Li, Fandong Meng, Jie Zhou
- Abstract summary: This work focuses on an observed limitation of text encoders: embeddings may not be able to recognize fine-grained entities or events within the semantics. We introduce a new evaluation dataset in Chinese, named CapRetrieval, whose passages are image captions and whose queries are phrases inquiring about entities or events in various forms. Zero-shot evaluation suggests that encoders may fail at such fine-grained matching, regardless of training sources or model sizes.
- Score: 78.05609552686053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on an observed limitation of text encoders: embeddings may not be able to recognize fine-grained entities or events within the semantics, resulting in failed dense retrieval even on simple cases. To examine this behavior, we first introduce a new evaluation dataset in Chinese, named CapRetrieval, whose passages are image captions and whose queries are phrases inquiring about entities or events in various forms. Zero-shot evaluation suggests that encoders may fail at such fine-grained matching, regardless of training sources or model sizes. Aiming for enhancement, we proceed to finetune encoders with our proposed data generation strategies, which obtains the best performance on CapRetrieval. Within this process, we further identify the issue of a granularity dilemma: a challenge for embeddings to express fine-grained salience while aligning with overall semantics. Our dataset, code and models in this work are publicly released at https://github.com/lxucs/CapRetrieval.
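As a concrete picture of the setup being evaluated, the sketch below shows the standard dense-retrieval pipeline: query and passages are mapped to unit-normalized vectors and passages are ranked by cosine similarity. The `encode` function here is a random stand-in for a real text encoder (a hypothetical placeholder, not the paper's model); the point is the scoring step, in which any fine-grained distinction the embedding fails to capture is simply invisible to the dot-product score.

```python
import numpy as np

def encode(texts, dim=8, seed=0):
    # Stand-in for a real text encoder: deterministic random unit vectors.
    rng = np.random.default_rng(seed)
    vecs = rng.normal(size=(len(texts), dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def dense_retrieve(query_vec, passage_vecs, k=2):
    # Score every passage by cosine similarity (dot product of unit
    # vectors) and return the indices of the top-k passages.
    scores = passage_vecs @ query_vec
    topk = np.argsort(-scores)[:k]
    return topk, scores

passages = ["a dog chasing a ball in the park",
            "a red kite above the beach",
            "two children building a sandcastle"]
q = encode(["dog with a ball"], seed=1)[0]
p = encode(passages)
topk, scores = dense_retrieve(q, p)
```

Because a single vector per passage must carry all of its semantics, a fine-grained entity mentioned in the query contributes to the score only insofar as the encoder made it salient in that one vector.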
Related papers
- Exploiting Inherent Class Label: Towards Robust Scribble Supervised Semantic Segmentation [15.439883888976464]
We propose a class-driven scribble promotion network for robust scribble-supervised semantic segmentation. Within the network, we introduce a localization rectification module to mitigate noisy labels and a distance perception module to identify reliable regions surrounding scribble annotations and pseudo-labels. Our method demonstrates competitive performance in both accuracy and robustness, underscoring its superiority over existing approaches.
arXiv Detail & Related papers (2025-03-18T04:43:07Z)
- BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues [47.213906345208315]
We propose BRIDGE, a new learnable and reference-free image captioning metric.
Our proposal achieves state-of-the-art results compared to existing reference-free evaluation scores.
arXiv Detail & Related papers (2024-07-29T18:00:17Z)
- PRIME: Prioritizing Interpretability in Failure Mode Extraction [49.93565079216376]
We study the challenge of providing human-understandable descriptions for failure modes in trained image classification models.
We propose a novel approach that prioritizes interpretability in this problem.
Our method successfully identifies failure modes and generates high-quality text descriptions associated with them.
arXiv Detail & Related papers (2023-09-29T22:00:12Z)
- Conjunct Resolution in the Face of Verbal Omissions [51.220650412095665]
We propose a conjunct resolution task that operates directly on the text and makes use of a split-and-rephrase paradigm in order to recover the missing elements in the coordination structure.
We curate a large dataset, containing over 10K examples of naturally-occurring verbal omissions with crowd-sourced annotations.
We train various neural baselines for this task, and show that while our best method obtains decent performance, it leaves ample space for improvement.
arXiv Detail & Related papers (2023-05-26T08:44:02Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- Noise-Robust Dense Retrieval via Contrastive Alignment Post Training [89.29256833403167]
Contrastive Alignment POst Training (CAPOT) is a highly efficient finetuning method that improves model robustness without requiring index regeneration.
CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered root.
We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and Trivia QA passage retrieval, finding that CAPOT has a similar impact as data augmentation with none of its overhead.
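CAPOT's exact objective is specified in its paper; the snippet below is only a generic sketch of the in-batch contrastive alignment it builds on: each noisy query embedding is pulled toward its unaltered root, with the other queries in the batch serving as negatives (an InfoNCE-style loss). The identity-matrix demo vectors are illustrative placeholders, not real embeddings.

```python
import numpy as np

def info_nce(noisy_q, clean_q, temperature=0.05):
    # In-batch contrastive loss: each noisy query (row) should score
    # highest against its own clean "root" query, with the other rows
    # acting as negatives.
    logits = (noisy_q @ clean_q.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

clean = np.eye(4)                      # toy, well-separated query vectors
loss_aligned = info_nce(clean, clean)  # noisy == clean: near-zero loss
loss_shuffled = info_nce(np.roll(clean, 1, axis=0), clean)  # misaligned: high loss
```

Freezing the document encoder (as the summary above describes) means only the query side moves during this alignment, so the existing passage index never needs to be rebuilt.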
arXiv Detail & Related papers (2023-04-06T22:16:53Z)
- What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval.
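A minimal sketch of this projection idea, under the assumption that a matrix of token embeddings can serve as the vocabulary projection (the toy vocabulary and random matrices here are illustrative placeholders, not the cited paper's setup): the dense vector is scored against every token embedding and softmax-normalized into a distribution over the vocabulary, whose top entries can then be inspected.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["dog", "ball", "park", "kite", "beach", "sand"]
vocab_matrix = rng.normal(size=(len(vocab), 8))  # toy token embeddings

def project_to_vocab(vec, vocab_matrix, vocab, k=3):
    # Score the dense vector against every token embedding, then
    # softmax-normalize into a distribution over the vocabulary and
    # return the top-k (token, probability) pairs.
    logits = vocab_matrix @ vec
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = np.argsort(-probs)[:k]
    return [(vocab[i], float(probs[i])) for i in top]

dense_vec = rng.normal(size=8)
top_tokens = project_to_vocab(dense_vec, vocab_matrix, vocab)
```

Reading off the highest-probability tokens gives a sparse, human-inspectable view of what a dense vector emphasizes, which is the connection to sparse retrieval the summary mentions.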
arXiv Detail & Related papers (2022-12-20T16:03:25Z)
- Robust Document Representations using Latent Topics and Metadata [17.306088038339336]
We propose a novel approach to fine-tuning a pre-trained neural language model for document classification problems.
We generate document representations that capture both text and metadata artifacts in a task-specific manner.
Our solution also incorporates metadata explicitly rather than simply appending it to the text.
arXiv Detail & Related papers (2020-10-23T21:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.