Related papers: Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision

URL: http://arxiv.org/abs/2510.01229v1
Date: Tue, 23 Sep 2025 09:47:27 GMT
Title: Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision
Authors: Dimitar Peshevski, Kiril Blazhevski, Martin Popovski, Gjorgji Madjarov
Abstract summary: Large Language Models (LLMs) excel at reranking due to their deep semantic understanding and reasoning. Fine-tuning smaller, task-specific models is a more efficient alternative but typically depends on scarce, manually labeled data. We propose a novel pipeline that eliminates the need for human-labeled query-document pairs.
Score: 0.13999481573773073
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Effective document reranking is essential for improving search relevance across diverse applications. While Large Language Models (LLMs) excel at reranking due to their deep semantic understanding and reasoning, their high computational cost makes them impractical for many real-world deployments. Fine-tuning smaller, task-specific models is a more efficient alternative but typically depends on scarce, manually labeled data. To overcome this, we propose a novel pipeline that eliminates the need for human-labeled query-document pairs. Our method uses LLMs to generate synthetic queries from domain-specific corpora and employs an LLM-based classifier to label positive and hard-negative pairs. This synthetic dataset is then used to fine-tune a smaller transformer model with contrastive learning using Localized Contrastive Estimation (LCE) loss. Experiments on the MedQuAD dataset show that our approach significantly boosts in-domain performance and generalizes well to out-of-domain tasks. By using LLMs for data generation and supervision rather than inference, we reduce computational costs while maintaining strong reranking capabilities.
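As a rough illustration of the Localized Contrastive Estimation (LCE) objective named in the abstract, the sketch below computes the loss for one query group. It assumes each group holds one positive document plus several hard negatives, and that the reranker emits one scalar relevance score per query-document pair. The function names `lce_loss` and `batch_lce_loss`, and the convention that the positive sits at index 0, are illustrative assumptions and not the authors' code.

```python
import math


def lce_loss(scores, positive_index=0):
    """Softmax cross-entropy over one query's group of candidate scores.

    scores: reranker scores for one positive and its hard negatives.
    positive_index: position of the positive document (assumed 0 here).
    """
    # Numerically stable log-sum-exp over the whole group.
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    # -log( exp(s_pos) / sum_j exp(s_j) )
    return log_sum_exp - scores[positive_index]


def batch_lce_loss(groups):
    """Average the per-query loss over a batch of score groups."""
    return sum(lce_loss(group) for group in groups) / len(groups)


if __name__ == "__main__":
    # Hypothetical scores: the positive first, then two hard negatives.
    well_ranked = [4.0, 0.5, -1.0]
    badly_ranked = [-1.0, 0.5, 4.0]
    print(lce_loss(well_ranked) < lce_loss(badly_ranked))  # True
    print(batch_lce_loss([well_ranked, badly_ranked]))
```

The loss shrinks as the positive's score rises above the hard negatives in its group, so it pushes the model to rank the positive first among difficult candidates. In an actual fine-tuning setup the scores would come from the smaller transformer model and the loss would be computed with an autodiff framework.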

Related papers

Leveraging the Power of Large Language Models in Entity Linking via Adaptive Routing and Targeted Reasoning [4.338036373287262]
ARTER presents a structured pipeline that achieves high performance without deep fine-tuning. It strategically combines candidate generation, context-based scoring, adaptive routing, and selective reasoning. On standard benchmarks, ARTER outperforms ReFinED by up to +4.47%, with an average gain of +2.53% on 5 out of 6 datasets.
arXiv Detail & Related papers (2025-10-23T00:50:14Z)
GLiClass: Generalist Lightweight Model for Sequence Classification Tasks [49.2639069781367]
We propose GLiClass, a novel method that adapts the GLiNER architecture for sequence classification tasks. Our approach achieves strong accuracy and efficiency comparable to embedding-based methods, while maintaining the flexibility needed for zero-shot and few-shot learning scenarios.
arXiv Detail & Related papers (2025-08-11T06:22:25Z)
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers [15.549112491482818]
Large Language Models (LLMs) have been applied to reranking tasks in information retrieval, achieving strong performance. Existing studies evaluate the efficiency of LLM-based rerankers using proxy metrics such as latency, the number of forward passes, input tokens, and output tokens. We propose E²R-FLOPs for LLM-based rerankers: ranking metrics per PetaFLOP (RPP) for relevance per compute and queries per PetaFLOP (QPP) for hardware-agnostic throughput.
arXiv Detail & Related papers (2025-07-08T17:56:28Z)
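One possible reading of the two metrics in the summary above, sketched under assumptions: RPP is taken as a ranking metric (for example NDCG@10) divided by the PetaFLOPs spent per query, and QPP as the number of queries processed per PetaFLOP. The function names and the example numbers are hypothetical; the paper's exact definitions may differ.

```python
PETA = 1e15  # floating-point operations in one PetaFLOP


def rpp(metric_value, flops_per_query):
    """Ranking metric per PetaFLOP (assumed: metric / PetaFLOPs per query)."""
    return metric_value / (flops_per_query / PETA)


def qpp(flops_per_query):
    """Queries per PetaFLOP (assumed: how many queries fit in one PetaFLOP)."""
    return PETA / flops_per_query


if __name__ == "__main__":
    # Hypothetical reranker: NDCG@10 of 0.70 at 2 PetaFLOPs per query.
    print(rpp(0.70, 2e15))
    print(qpp(2e15))
```

Both quantities depend only on operation counts, not on wall-clock time, which is what makes them comparable across hardware.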
LLMs as Data Annotators: How Close Are We to Human Performance [47.61698665650761]
Manual annotation of data is labor-intensive, time-consuming, and costly. In-context learning (ICL), in which some examples related to the task are given in the prompt, can lead to inefficiencies and suboptimal model performance. This paper presents experiments comparing several LLMs, considering different embedding models, across various datasets for the Named Entity Recognition (NER) task.
arXiv Detail & Related papers (2025-04-21T11:11:07Z)
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [65.23593936798662]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation. This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
Efficient Alignment of Large Language Models via Data Sampling [0.4915744683251149]
We propose an information theory-based methodology for efficient alignment by identifying a small, high-quality subset. We find that the model aligned using our proposed methodology outperforms other sampling methods and performs comparably to the model aligned with the full dataset.
arXiv Detail & Related papers (2024-11-15T19:36:15Z)
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data. We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Efficacy of Synthetic Data as a Benchmark [3.2968976262860408]
We investigate the effectiveness of generating synthetic data through large language models (LLMs). Our experiments show that while synthetic data can effectively capture performance of various methods for simpler tasks, it falls short for more complex tasks like named entity recognition. We propose a new metric called the bias factor, which evaluates the biases introduced when the same LLM is used to both generate benchmarking data and to perform the tasks.
arXiv Detail & Related papers (2024-09-18T13:20:23Z)
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity. Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data. Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop, the MLLM-DataEngine first analyzes the weakness of the model based on the evaluation results. For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data. For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers [42.84866455416052]
We develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. We show that this technique boosts zero-shot accuracy in long-tail domains and achieves substantially lower latency than standard reranking methods.
arXiv Detail & Related papers (2023-03-01T20:21:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.