BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
- URL: http://arxiv.org/abs/2508.06781v1
- Date: Sat, 09 Aug 2025 02:15:17 GMT
- Title: BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
- Authors: Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro, Aaron Courville, Perouz Taslakian
- Abstract summary: BiXSE is a pointwise training method that optimizes binary cross-entropy over graded relevance scores. It achieves strong performance with reduced annotation and compute costs. BiXSE offers a robust, scalable alternative for training dense retrieval models.
- Score: 6.272555849379284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural sentence embedding models for dense retrieval typically rely on binary relevance labels, treating query-document pairs as either relevant or irrelevant. However, real-world relevance often exists on a continuum, and recent advances in large language models (LLMs) have made it feasible to scale the generation of fine-grained graded relevance labels. In this work, we propose BiXSE, a simple and effective pointwise training method that optimizes binary cross-entropy (BCE) over LLM-generated graded relevance scores. BiXSE interprets these scores as probabilistic targets, enabling granular supervision from a single labeled query-document pair per query. Unlike pairwise or listwise losses that require multiple annotated comparisons per query, BiXSE achieves strong performance with reduced annotation and compute costs by leveraging in-batch negatives. Extensive experiments across sentence embedding (MMTEB) and retrieval benchmarks (BEIR, TREC-DL) show that BiXSE consistently outperforms softmax-based contrastive learning (InfoNCE), and matches or exceeds strong pairwise ranking baselines when trained on LLM-supervised data. BiXSE offers a robust, scalable alternative for training dense retrieval models as graded relevance supervision becomes increasingly accessible.
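The abstract describes the training objective only at a high level. Below is a minimal sketch, in PyTorch, of what a BiXSE-style pointwise BCE loss over graded relevance targets with in-batch negatives could look like. The function name, the temperature value, the use of L2-normalized embeddings, and the choice to assign in-batch negatives a target of zero are assumptions made for illustration; this is not the authors' released implementation.

```python
# Hypothetical sketch of a BiXSE-style pointwise BCE loss (not the authors' code).
# Assumptions: embeddings are L2-normalized, similarities are temperature-scaled,
# graded relevance scores in [0, 1] (e.g. from an LLM grader) act as probabilistic
# targets, and every other in-batch document is treated as a negative (target 0).
import torch
import torch.nn.functional as F

def bixse_style_loss(query_emb, doc_emb, graded_scores, temperature=0.05):
    """
    query_emb:     (B, D) query embeddings
    doc_emb:       (B, D) embeddings of each query's labeled document
    graded_scores: (B,)   graded relevance targets in [0, 1]
    """
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)

    # (B, B) similarity logits: row i scores query i against every in-batch document.
    logits = q @ d.T / temperature

    # Target matrix: the diagonal carries each pair's graded relevance;
    # off-diagonal (in-batch) pairs get a target of 0.
    targets = torch.diag(graded_scores.to(logits.dtype))

    # Pointwise binary cross-entropy over every query-document pair in the batch.
    return F.binary_cross_entropy_with_logits(logits, targets)
```

Unlike InfoNCE, which applies a softmax over each query's logits and therefore needs the relevant document to compete against the batch, this pointwise formulation scores every query-document pair independently, which is what allows a single graded label per query to supply the supervision signal.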
Related papers
- IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation [85.56193980646981]
We propose IF-RewardBench, a comprehensive meta-evaluation benchmark for instruction-following. For each instruction, we construct a preference graph containing all pairwise preferences among multiple responses. Experiments on IF-RewardBench reveal significant deficiencies in current judge models.
arXiv Detail & Related papers (2026-03-05T02:21:17Z) - GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning [26.616849067985967]
GroupRank is a novel groupwise reranking paradigm for large language models. We propose an innovative pipeline for producing high-quality retrieval and ranking data. The resulting data can be leveraged not only for training the reranker but also for training the retriever.
arXiv Detail & Related papers (2025-11-10T15:25:31Z) - Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models [16.721450557704767]
RefRank is a simple and effective comparative ranking method based on a fixed reference document. We show that RefRank significantly outperforms pointwise baselines and achieves performance at least on par with pairwise approaches.
arXiv Detail & Related papers (2025-06-13T04:03:09Z) - LGAI-EMBEDDING-Preview Technical Report [41.68404082385825]
This report presents a unified instruction-based framework for learning generalized text embeddings optimized for both information retrieval (IR) and non-IR tasks. Our approach combines in-context learning, soft supervision, and adaptive hard-negative mining to generate context-aware embeddings. Results show that our method achieves strong generalization and ranks among the top-performing models by Borda score.
arXiv Detail & Related papers (2025-06-09T05:30:35Z) - Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance [24.842839260409075]
In this work we forgo real training documents and annotations altogether. We use open-source LLMs to directly generate synthetic documents that answer real user queries according to several different levels of relevance. Experiments on various IR datasets show that our proposed approach outperforms conventional training with InfoNCE by a large margin.
arXiv Detail & Related papers (2025-03-29T22:33:22Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [58.617025733655005]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning). It introduces open words from WordNet to extend the prompt texts beyond closed-set label words, so that prompts are tuned in a simulated open-set scenario. Our method achieves the best performance on datasets of various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z) - Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.