Related papers: Less Finetuning, Better Retrieval: Rethinking LLM Adaptation for Biomedical Retrievers via Synthetic Data and Model Merging

URL: http://arxiv.org/abs/2602.04731v1
Date: Wed, 04 Feb 2026 16:36:00 GMT
Title: Less Finetuning, Better Retrieval: Rethinking LLM Adaptation for Biomedical Retrievers via Synthetic Data and Model Merging
Authors: Sameh Khattab, Jean-Philippe Corbeil, Osman Alperen Koraş, Amin Dada, Julian Friedrich, François Beaulieu, Paul Vozila, Jens Kleesiek
Abstract summary: Retrieval-augmented generation (RAG) has become the backbone of grounding Large Language Models (LLMs). We present Synthesize-Train-Merge (STM), a modular framework that enhances decoder-only LLMs with synthetic hard negatives, retrieval prompt optimization, and model merging. Experiments on a subset of 12 medical and general tasks from the MTEB benchmark show STM boosts task-specific experts by up to 23.5% (average 7.5%) and produces merged models that outperform both single experts and strong baselines without extensive pretraining.
Score: 6.761707146228743
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented generation (RAG) has become the backbone of grounding Large Language Models (LLMs), improving knowledge updates and reducing hallucinations. Recently, LLM-based retriever models have shown state-of-the-art performance for RAG applications. However, several technical aspects remain underexplored on how to adapt general-purpose LLMs into effective domain-specific retrievers, especially in specialized domains such as biomedicine. We present Synthesize-Train-Merge (STM), a modular framework that enhances decoder-only LLMs with synthetic hard negatives, retrieval prompt optimization, and model merging. Experiments on a subset of 12 medical and general tasks from the MTEB benchmark show STM boosts task-specific experts by up to 23.5% (average 7.5%) and produces merged models that outperform both single experts and strong baselines without extensive pretraining. Our results demonstrate a scalable, efficient path for turning general LLMs into high-performing, domain-specialized retrievers, preserving general-domain capabilities while excelling on specialized tasks.

Related papers

Learn More, Forget Less: A Gradient-Aware Data Selection Approach for LLM [51.21051698747157]
We propose a self-adaptive gradient-aware data selection approach (GrADS) for supervised fine-tuning of large language models (LLMs). Specifically, we design self-guided criteria that leverage the magnitude and statistical distribution of gradients to prioritize examples that contribute the most to the model's learning process. Through extensive experimentation with various LLMs across diverse domains such as medicine, law, and finance, GrADS has demonstrated significant efficiency and cost-effectiveness.
arXiv Detail & Related papers (2025-11-07T08:34:50Z)
Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM [32.938753667649074]
Large language models (LLMs) are increasingly used as automated designers for vehicle routing problems (VRPs). This work challenges that paradigm by demonstrating that a smaller, specialized LLM, when meticulously fine-tuned, can generate components that surpass expert-crafted ones within advanced solvers. We propose RFTHGS, a novel Reinforcement learning framework for Fine-Tuning a small LLM to generate high-performance crossover operators.
arXiv Detail & Related papers (2025-10-13T08:08:58Z)
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples of 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
Domain-Aware RAG: MoL-Enhanced RL for Efficient Training and Scalable Retrieval [5.640810636056805]
MoLER is a domain-aware RAG method that uses MoL-Enhanced Reinforcement Learning to optimize retrieval. MoLER bridges the knowledge gap in RAG systems, enabling robust and scalable retrieval in specialized domains.
arXiv Detail & Related papers (2025-09-08T13:04:07Z)
How Do LLM-Generated Texts Impact Term-Based Retrieval Models? [76.92519309816008]
This paper investigates the influence of large language models (LLMs) on term-based retrieval models. Our linguistic analysis reveals that LLM-generated texts exhibit smoother high-frequency and steeper low-frequency Zipf slopes. Our study further explores whether term-based retrieval models demonstrate source bias, concluding that these models prioritize documents whose term distributions closely correspond to those of the queries.
arXiv Detail & Related papers (2025-08-25T06:43:27Z)
RAG in the Wild: On the (In)effectiveness of LLMs with Mixture-of-Knowledge Retrieval Augmentation [45.679455112940175]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieved at inference time. We evaluated RAG systems using MassiveDS, a large-scale datastore with a mixture of knowledge, and identified critical limitations.
arXiv Detail & Related papers (2025-07-26T20:57:24Z)
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [65.23593936798662]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation. This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains [45.349645606978434]
Retrieval-augmented generation (RAG) enhances the question-answering abilities of large language models (LLMs). We propose SimRAG, a self-training approach that equips the LLM with joint capabilities of question answering and question generation for domain adaptation. Experiments on 11 datasets, spanning two backbone sizes and three domains, demonstrate that SimRAG outperforms baselines by 1.2%--8.6%.
arXiv Detail & Related papers (2024-10-23T15:24:16Z)
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions. Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT [0.0]
ARCoT (Adaptable Retrieval-based Chain of Thought) is a framework designed to enhance the domain-specific accuracy of Large Language Models (LLMs). Our model outperformed both standard LLMs and the reported average human performance, demonstrating improvements of up to 68%.
arXiv Detail & Related papers (2024-05-17T18:31:38Z)
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts [74.40198929049959]
Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks. Generalist LMMs often suffer from performance degradation when tuned over a large collection of tasks. We propose Omni-SMoLA, an architecture that uses the Soft MoE approach to mix many multimodal low-rank experts.
arXiv Detail & Related papers (2023-12-01T23:04:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.