BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
- URL: http://arxiv.org/abs/2508.06781v1
- Date: Sat, 09 Aug 2025 02:15:17 GMT
- Title: BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
- Authors: Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro, Aaron Courville, Perouz Taslakian
- Abstract summary: BiXSE is a pointwise training method that optimizes binary cross-entropy over graded relevance scores. It achieves strong performance with reduced annotation and compute costs. BiXSE offers a robust, scalable alternative for training dense retrieval models.
- Score: 6.272555849379284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural sentence embedding models for dense retrieval typically rely on binary relevance labels, treating query-document pairs as either relevant or irrelevant. However, real-world relevance often exists on a continuum, and recent advances in large language models (LLMs) have made it feasible to scale the generation of fine-grained graded relevance labels. In this work, we propose BiXSE, a simple and effective pointwise training method that optimizes binary cross-entropy (BCE) over LLM-generated graded relevance scores. BiXSE interprets these scores as probabilistic targets, enabling granular supervision from a single labeled query-document pair per query. Unlike pairwise or listwise losses that require multiple annotated comparisons per query, BiXSE achieves strong performance with reduced annotation and compute costs by leveraging in-batch negatives. Extensive experiments across sentence embedding (MMTEB) and retrieval benchmarks (BEIR, TREC-DL) show that BiXSE consistently outperforms softmax-based contrastive learning (InfoNCE), and matches or exceeds strong pairwise ranking baselines when trained on LLM-supervised data. BiXSE offers a robust, scalable alternative for training dense retrieval models as graded relevance supervision becomes increasingly accessible.
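The abstract describes the training objective only at a high level. Below is a minimal sketch, in PyTorch, of what a BiXSE-style pointwise BCE loss over graded relevance targets with in-batch negatives could look like. The function name, the temperature value, the use of L2-normalized embeddings, and the choice to assign in-batch negatives a target of zero are assumptions made for illustration; this is not the authors' released implementation.

```python
# Hypothetical sketch of a BiXSE-style pointwise BCE loss (not the authors' code).
# Assumptions: embeddings are L2-normalized, similarities are temperature-scaled,
# graded relevance scores in [0, 1] (e.g. from an LLM grader) act as probabilistic
# targets, and every other in-batch document is treated as a negative (target 0).
import torch
import torch.nn.functional as F

def bixse_style_loss(query_emb, doc_emb, graded_scores, temperature=0.05):
    """
    query_emb:     (B, D) query embeddings
    doc_emb:       (B, D) embeddings of each query's labeled document
    graded_scores: (B,)   graded relevance targets in [0, 1]
    """
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)

    # (B, B) similarity logits: row i scores query i against every in-batch document.
    logits = q @ d.T / temperature

    # Target matrix: the diagonal carries each pair's graded relevance;
    # off-diagonal (in-batch) pairs get a target of 0.
    targets = torch.diag(graded_scores.to(logits.dtype))

    # Pointwise binary cross-entropy over every query-document pair in the batch.
    return F.binary_cross_entropy_with_logits(logits, targets)
```

Unlike InfoNCE, which applies a softmax over each query's logits and therefore needs the relevant document to compete against the batch, this pointwise formulation scores every query-document pair independently, which is what allows a single graded label per query to supply the supervision signal.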
Related papers
- IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation [85.56193980646981]
We propose IF-RewardBench, a comprehensive meta-evaluation benchmark for instruction-following. For each instruction, we construct a preference graph containing all pairwise preferences among multiple responses. Experiments on IF-RewardBench reveal significant deficiencies in current judge models.
arXiv Detail & Related papers (2026-03-05T02:21:17Z) - GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning [26.616849067985967]
GroupRank is a novel groupwise reranking paradigm for large language models. We propose an innovative pipeline for producing high-quality retrieval and ranking data. The resulting data can be leveraged not only for training the reranker but also for training the retriever.
arXiv Detail & Related papers (2025-11-10T15:25:31Z) - Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models [16.721450557704767]
RefRank is a simple and effective comparative ranking method based on a fixed reference document. We show that RefRank significantly outperforms pointwise baselines and achieves performance at least on par with pairwise approaches.
arXiv Detail & Related papers (2025-06-13T04:03:09Z) - LGAI-EMBEDDING-Preview Technical Report [41.68404082385825]
This report presents a unified instruction-based framework for learning generalized text embeddings optimized for both information retrieval (IR) and non-IR tasks. Our approach combines in-context learning, soft supervision, and adaptive hard-negative mining to generate context-aware embeddings. Results show that our method achieves strong generalization and ranks among the top-performing models by Borda score.
arXiv Detail & Related papers (2025-06-09T05:30:35Z) - Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance [24.842839260409075]
In this work we forgo real training documents and annotations altogether. We use open-source LLMs to directly generate synthetic documents that answer real user queries according to several different levels of relevance. Experiments on various IR datasets show that our proposed approach outperforms conventional training with InfoNCE by a large margin.
arXiv Detail & Related papers (2025-03-29T22:33:22Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [58.617025733655005]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning). It introduces open words from WordNet to extend the prompt texts beyond closed-set label words, so that prompts are tuned in a simulated open-set scenario. Our method achieves the best performance on datasets of various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z) - Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.