Related papers: When Domain Pretraining Interferes with Instruction Alignment: An Empirical Study of Adapter Merging in Medical LLMs

URL: http://arxiv.org/abs/2601.18350v3
Date: Tue, 03 Feb 2026 02:46:48 GMT
Title: When Domain Pretraining Interferes with Instruction Alignment: An Empirical Study of Adapter Merging in Medical LLMs
Authors: Junyi Zou
Abstract summary: Large language models can exhibit surprising adapter interference when combining domain adaptation and instruction alignment. We study a two-stage LoRA pipeline for medical LLMs, where domain-oriented pre-training (PT) and supervised fine-tuning (SFT) are trained separately and later merged.
Score: 0.6345523830122167
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models can exhibit surprising adapter interference when combining domain adaptation and instruction alignment in safety-critical settings. We study a two-stage LoRA pipeline for medical LLMs, where domain-oriented pre-training (PT) and supervised fine-tuning (SFT) are trained separately and later merged through weighted adapter merging. We observe that introducing PT signal can systematically alter model behavior and produce reasoning-style outputs, even when evaluation templates explicitly attempt to suppress such behavior. This interference leads to a divergence between surface metrics and reasoning or alignment behavior: BLEU/ROUGE scores drop significantly, while multiple-choice accuracy improves. We further show that small pipeline mistakes can easily misattribute SFT-only behavior to merged models, and provide a lightweight merge-verification routine to ensure correctness and reproducibility. Our findings highlight an interaction between knowledge injection and instruction alignment in adapter-based fine-tuning, with important implications for safety-critical model deployment.
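The abstract names weighted adapter merging and a merge-verification routine without giving their details. The sketch below is only an illustration under stated assumptions: it assumes the merge is a linear weighted sum of the dense updates implied by each LoRA adapter, and that verification means confirming the merged update actually differs from the SFT-only update. All function names, weights, and matrices are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(b: Matrix, a: Matrix, scale: float) -> Matrix:
    """Dense weight update scale * (B @ A) implied by one LoRA adapter."""
    return [[scale * v for v in row] for row in matmul(b, a)]


def merge_deltas(pt: Matrix, sft: Matrix, w_pt: float, w_sft: float) -> Matrix:
    """Assumed merge rule: element-wise weighted sum of the two updates."""
    return [[w_pt * p + w_sft * s for p, s in zip(rp, rs)] for rp, rs in zip(pt, sft)]


def max_abs_diff(x: Matrix, y: Matrix) -> float:
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))


def merge_differs_from_sft(merged: Matrix, sft: Matrix, tol: float = 1e-8) -> bool:
    """Hypothetical check: True only if the merged update is not just the SFT update."""
    return max_abs_diff(merged, sft) > tol


# Toy rank-1 adapters for a 2x2 weight matrix (made-up numbers).
delta_pt = lora_delta([[1.0], [0.0]], [[0.5, 0.5]], scale=1.0)
delta_sft = lora_delta([[0.0], [1.0]], [[1.0, -1.0]], scale=1.0)

merged = merge_deltas(delta_pt, delta_sft, w_pt=0.3, w_sft=1.0)
sft_only = merge_deltas(delta_pt, delta_sft, w_pt=0.0, w_sft=1.0)

print(merge_differs_from_sft(merged, delta_sft))    # PT signal is present
print(merge_differs_from_sft(sft_only, delta_sft))  # silently SFT-only
```

A check of this kind is cheap to run before evaluation. It targets the failure mode the abstract describes, where a pipeline mistake leaves the PT weight at zero (or loads only one adapter) and SFT-only behavior is then attributed to the merged model.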

Related papers

Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision [11.159231524113764]
Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs). In this paper, we propose the Guided Verifier framework to address these structural limitations. We develop a specialized data synthesis pipeline targeting multimodal hallucinations, constructing the CoRe dataset of process-level negatives and Correct-guide Reasoning trajectories to train the guided verifier.
arXiv Detail & Related papers (2026-02-04T07:38:42Z)
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics [81.80010043113445]
Local weight fine-tuning, LoRA-based adaptation, and activation-based interventions are studied in isolation. We present a unified view that frames these interventions as dynamic weight updates induced by a control signal. Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility.
arXiv Detail & Related papers (2026-02-02T17:04:36Z)
On the Paradoxical Interference between Instruction-Following and Task Solving [50.75960598434753]
Instruction following aims to align Large Language Models (LLMs) with human intent by specifying explicit constraints on how tasks should be performed. We reveal a counterintuitive phenomenon: instruction following can paradoxically interfere with LLMs' task-solving capability. We propose a metric, SUSTAINSCORE, to quantify the interference of instruction following with task solving.
arXiv Detail & Related papers (2026-01-29T17:48:56Z)
MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios. We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support [3.165122193962168]
Large language models (LLMs) have rapidly advanced in clinical decision-making. Yet the deployment of proprietary systems is hindered by privacy concerns and reliance on cloud-based infrastructure.
arXiv Detail & Related papers (2025-12-18T22:29:45Z)
Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning [6.778254993886297]
We introduce Fleming-R1, a model designed for verifiable medical reasoning through three complementary innovations. First, our Reasoning-Oriented Data Strategy (RODS) combines curated medical QA datasets with knowledge-graph-guided synthesis. Second, we employ Chain-of-Thought (CoT) cold start to distill high-quality reasoning trajectories from teacher models. Third, we implement a two-stage Reinforcement Learning from Verifiable Rewards framework.
arXiv Detail & Related papers (2025-09-18T13:35:14Z)
IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection [11.178131621535261]
IAD-R1, a universal post-training framework, substantially enhances anomaly detection capabilities. IAD-R1 achieves significant improvements across 7 Vision-Language Models (VLMs). IAD-R1 surpasses commercial models including GPT-4.1 and Claude-Sonnet-4 in zero-shot settings.
arXiv Detail & Related papers (2025-08-07T09:34:45Z)
Learning from Heterogeneous Structural MRI via Collaborative Domain Adaptation for Late-Life Depression Assessment [24.340328016766183]
We propose a Collaborative Domain Adaptation framework for LLD detection using T1-weighted MRIs. The framework consists of three stages: supervised training on labeled source data, self-supervised target feature adaptation, and collaborative training on unlabeled target data. Experiments conducted on multi-site T1-weighted MRI data demonstrate that the framework consistently outperforms state-of-the-art unsupervised domain adaptation methods.
arXiv Detail & Related papers (2025-07-30T01:38:32Z)
GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS adjusts hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale. It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z)
Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling [56.26834106704781]
Factual incorrectness in generated content is one of the primary concerns in ubiquitous deployment of large language models (LLMs). We provide evidence supporting the presence of LLMs' internal compass that dictates the correctness of factual recall at the time of generation. Scaling experiments across model sizes and training dynamics highlight that self-awareness emerges rapidly during training and peaks in intermediate layers.
arXiv Detail & Related papers (2025-05-27T16:24:02Z)
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs [8.085475675888045]
Activation steering provides an alternative for inference-time control. We introduce a novel approach using a lightweight, trainable controller network integrated during inference.
arXiv Detail & Related papers (2025-05-22T01:48:38Z)
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging [11.708743111945727]
Large Language Models (LLMs) have demonstrated impressive capabilities, but their high computational costs pose challenges for customization. Model merging offers a cost-effective alternative, yet existing methods suffer from interference among parameters, leading to performance degradation. We propose Optimal Brain Iterative Merging, a novel method designed to mitigate both intra-model and inter-model interference.
arXiv Detail & Related papers (2025-02-17T09:07:49Z)
DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation [2.708515419272247]
We propose a novel framework called Dual-Branch Evidential Deep Learning (DuEDL). Our method significantly enhances the reliability and generalization ability of the model without sacrificing accuracy, outperforming state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-23T11:23:57Z)
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance [56.184255657175335]
We develop InferAligner, a novel inference-time alignment method that utilizes cross-model guidance for harmlessness alignment. Experimental results show that our method can be very effectively applied to domain-specific models in finance, medicine, and mathematics. It significantly diminishes the Attack Success Rate (ASR) of both harmful instructions and jailbreak attacks, while maintaining almost unchanged performance in downstream tasks.
arXiv Detail & Related papers (2024-01-20T10:41:03Z)
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation [90.93999543169296]
GPT-4V acts as the most advanced publicly accessible multimodal foundation model. This study rigorously evaluates GPT-4V's adaptability and generalization capabilities in dynamic environments.
arXiv Detail & Related papers (2023-12-12T16:48:07Z)
Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection [13.801572236048601]
In this paper, we propose a novel AD framework, FOcus-the-Discrepancy (FOD), which can simultaneously spot the patch-wise, intra- and inter-discrepancies of anomalies.
arXiv Detail & Related papers (2023-08-06T01:30:26Z)
Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback. We learn to tune a generalization of online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks. We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
Instrumental Variable Learning for Chest X-ray Classification [52.68170685918908]
We propose an interpretable instrumental variable (IV) learning framework to eliminate the spurious association and obtain accurate causal representation. Our approach's performance is demonstrated using the MIMIC-CXR, NIH ChestX-ray 14, and CheXpert datasets.
arXiv Detail & Related papers (2023-05-20T03:12:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.