Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning
- URL: http://arxiv.org/abs/2602.11215v1
- Date: Wed, 11 Feb 2026 09:04:13 GMT
- Title: Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning
- Authors: Lintao Wang, Zhuqiang Lu, Yilin Zhu, Kun Hu, Zhenfei Yin, Shixiang Tang, Zhiyong Wang, Wanli Ouyang, Xinzhu Ma,
- Abstract summary: We present the first systematic study of multi-disciplinary fine-tuning of large language models (LLMs). We analyze the learning patterns of full fine-tuning, LoRA, LoRA-MoE, and LoRA compositions. Our study shows that multi-disciplinary learning is substantially more variable than single-discipline training.
- Score: 74.77190207677863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) have achieved strong performance through fine-tuning within individual scientific domains, their learning dynamics in multi-disciplinary contexts remain poorly understood, despite the promise of improved generalization and broader applicability through cross-domain knowledge synergy. In this work, we present the first systematic study of multi-disciplinary LLM fine-tuning, constructing a five-discipline corpus and analyzing the learning patterns of full fine-tuning, LoRA, LoRA-MoE, and LoRA compositions. In particular, our study shows that multi-disciplinary learning is substantially more variable than single-discipline training and distills four consistent empirical laws: (1) Balance-then-Diversity: low-resource disciplines degrade performance unless mitigated via diversity-aware upsampling; (2) Merge-then-Align: restoring instruction-following ability is critical for cross-discipline synergy; (3) Optimize-then-Scale: parameter scaling offers limited gains without prior design optimization; and (4) Share-then-Specialize: asymmetric LoRA-MoE yields robust gains with minimal trainable parameters via a shared low-rank projection. Together, these laws form a practical recipe for principled multi-discipline fine-tuning and provide actionable guidance for developing generalizable scientific LLMs.
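The Share-then-Specialize law credits an asymmetric LoRA-MoE with robust gains at a low trainable-parameter cost. The abstract does not spell out the architecture, so what follows is only a minimal NumPy sketch under stated assumptions: one low-rank down-projection A shared by all experts, a separate up-projection B_e per expert, and a softmax router that mixes the experts per token. Which side is shared in the paper, and every name and hyperparameter here (AsymmetricLoRAMoE, rank, num_experts, alpha), are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class AsymmetricLoRAMoE:
    """Hypothetical adapter: frozen W0 + one shared down-projection A + per-expert up-projections B_e."""

    def __init__(self, d_in, d_out, rank=8, num_experts=4, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base weight (a random stand-in for a pretrained matrix).
        self.W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Trainable (assumed layout): a single A shared by all experts ...
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        # ... one B per expert, zero-initialized so the adapter starts as a no-op ...
        self.B = np.zeros((num_experts, d_out, rank))
        # ... and a linear router producing one gate logit per expert.
        self.router = rng.standard_normal((num_experts, d_in)) * 0.01
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (n_tokens, d_in) -> (n_tokens, d_out)
        base = x @ self.W0.T
        h = x @ self.A.T                    # (n, rank): computed once, reused by every expert
        gates = softmax(x @ self.router.T)  # (n, E): per-token mixture weights summing to 1
        # delta[n] = sum_e gates[n, e] * (B_e @ h[n])
        delta = np.einsum("ne,eor,nr->no", gates, self.B, h)
        return base + self.scale * delta

    def trainable_params(self) -> int:
        return self.A.size + self.B.size + self.router.size


def symmetric_trainable_params(d_in, d_out, rank, num_experts):
    # Comparison point: every expert owns its own (A_e, B_e) pair, plus the same router.
    return num_experts * (rank * d_in + d_out * rank) + num_experts * d_in


layer = AsymmetricLoRAMoE(d_in=64, d_out=64, rank=8, num_experts=4)
x = np.random.default_rng(1).standard_normal((5, 64))
print(layer(x).shape)                            # (5, 64)
print(layer.trainable_params())                  # 2816
print(symmetric_trainable_params(64, 64, 8, 4))  # 4352
```

Under this layout, sharing A saves (E - 1) * rank * d_in parameters per adapted layer relative to per-expert A matrices (1536 in the toy configuration above), while each expert still specializes through its own B_e.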
Related papers
- ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation [54.071574153853994]
ProRAG is a process-supervised reinforcement learning framework designed to integrate learned step-level supervision into the online optimization loop. Our framework consists of four stages: (1) Supervised Policy Warmup to initialize the model with a structured reasoning format; (2) construction of an MCTS-based Process Reward Model (PRM) to quantify intermediate reasoning quality; (3) PRM-Guided Reasoning Refinement to align the policy with fine-grained process preferences; and (4) Process-Supervised Reinforcement Learning with a dual-granularity advantage mechanism.
(2026-01-29T16:04:59Z)
- RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models [13.343944091570386]
Large Language Models (LLMs) can propose rules in natural language, sidestepping the need for a predefined predicate space in traditional rule learning. We present RLIE, a unified framework that integrates LLMs with probabilistic modeling to learn a set of weighted rules. Applying rules directly with their learned weights yields superior performance, whereas prompting LLMs with the rules, weights, and logistic-model outputs surprisingly degrades accuracy.
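RLIE attaches a learned weight to each LLM-proposed rule through logistic regression and then applies the weighted rules directly. The sketch below covers only that weighting step, under simplifying assumptions: hand-written Python predicates stand in for the natural-language rules an LLM would propose and evaluate, each rule contributes a 0/1 feature, and weights are fit by plain gradient descent on an L2-regularized logistic loss. The function names, hyperparameters, and toy task are hypothetical, not RLIE's code.

```python
import numpy as np


def rule_features(examples, rules):
    # Binary design matrix: X[i, j] = 1.0 if rule j fires on example i.
    return np.array([[float(rule(ex)) for rule in rules] for ex in examples])


def fit_rule_weights(X, y, lr=0.5, epochs=2000, l2=1e-3):
    # Batch gradient descent on the mean logistic loss plus an L2 penalty on the rule weights.
    n, k = X.shape
    w, b = np.zeros(k), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        err = p - y
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def predict(X, w, b):
    # Apply the weighted rules directly: a positive logit means class 1.
    return (X @ w + b > 0).astype(int)


# Hypothetical toy task: label an integer 1 when it is at least 10.
examples = list(range(20))
labels = np.array([int(v >= 10) for v in examples])
rules = [
    lambda v: v >= 10,     # stand-in for "the value is at least ten" (informative)
    lambda v: v % 2 == 0,  # stand-in for "the value is even" (uninformative)
]
X = rule_features(examples, rules)
w, b = fit_rule_weights(X, labels)
print((predict(X, w, b) == labels).all())  # True
```

The fitted weight on the informative rule ends up large and positive while the uninformative rule's weight stays near zero, which is what makes the learned weights usable as a direct, interpretable classifier.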
(2025-10-22T15:50:04Z)
- VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning [69.44871115752055]
We propose an advanced multimodal reasoning model trained via a novel Progressive Curriculum Reinforcement Learning (PCuRL) framework. PCuRL systematically guides the model through tasks of gradually increasing difficulty, substantially improving its reasoning abilities across diverse multimodal contexts. The framework introduces two key innovations: (1) an online difficulty soft weighting mechanism, dynamically adjusting training difficulty across successive RL training stages; and (2) a dynamic length reward mechanism, which encourages the model to adaptively regulate its reasoning path length according to task complexity.
(2025-07-30T12:23:21Z)
- Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning [32.665418383317224]
We present a systematic investigation of multi-domain reasoning within the RLVR framework. We focus on three primary domains: mathematical reasoning, code generation, and logical puzzle solving. Our results offer significant insights into the dynamics governing domain interactions.
(2025-07-23T13:51:04Z)
- Understanding Large Language Models' Ability on Interdisciplinary Research [27.539601507270575]
Large Language Models (LLMs) are powerful tools and collaborators in scientific discovery. The lack of a dedicated benchmark that evaluates LLMs' ability to develop ideas in Interdisciplinary Research poses a critical barrier to fully understanding their strengths and limitations. We introduce IDRBench, a pioneering benchmark featuring an expert-annotated dataset and a suite of tasks tailored to evaluate LLMs' capabilities.
(2025-07-21T15:43:05Z)
- Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling [66.0871543682453]
We present Omni-Thinker, a unified reinforcement learning framework that scales large language models across diverse tasks. Our scheduler orders tasks according to accuracy backward transfer (BWT), reducing forgetting and improving multi-task performance.
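Omni-Thinker's scheduler relies on backward transfer (BWT). The summary does not restate the definition, so the standard continual-learning form is assumed here: with R[i][j] the accuracy on task j after training through task i and T tasks in total, BWT = (1 / (T - 1)) * sum over j < T - 1 of (R[T-1][j] - R[j][j]), where negative values indicate forgetting. The accuracy matrix below is made up, and ranking tasks by their individual drop is a hypothetical illustration, not the paper's documented scheduling rule.

```python
from typing import List, Sequence


def per_task_backward_transfer(R: Sequence[Sequence[float]]) -> List[float]:
    # R[i][j]: accuracy on task j measured after training stage i (square T x T, T >= 2).
    T = len(R)
    # Final accuracy on each earlier task minus its accuracy right after it was learned.
    return [R[T - 1][j] - R[j][j] for j in range(T - 1)]


def backward_transfer(R: Sequence[Sequence[float]]) -> float:
    # Mean over the first T-1 tasks: negative means forgetting, positive means later tasks helped.
    deltas = per_task_backward_transfer(R)
    return sum(deltas) / len(deltas)


# Hypothetical accuracies for three tasks trained in the order 0 -> 1 -> 2.
R = [
    [0.80, 0.10, 0.05],
    [0.70, 0.75, 0.20],
    [0.65, 0.70, 0.90],
]
print([round(d, 2) for d in per_task_backward_transfer(R)])  # [-0.15, -0.05]
print(round(backward_transfer(R), 4))                        # -0.1

# Hypothetical use: list earlier tasks from most to least forgotten.
drops = per_task_backward_transfer(R)
print(sorted(range(len(drops)), key=lambda j: drops[j]))     # [0, 1]
```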
(2025-07-20T01:50:16Z)
- Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation [106.17986469245302]
Large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking. Existing reasoning benchmarks either focus on domain-specific knowledge (crystallized intelligence) or lack interpretability. We propose DRE-Bench, a dynamic reasoning evaluation benchmark grounded in a hierarchical cognitive framework.
(2025-06-03T09:01:08Z)
- EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning [6.675088737484839]
We introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models with individual objectives to improve efficiency and flexibility. Our method is the first to aggregate the hidden states of individual models, incorporating contextual information from multiple objectives. We demonstrate the advantages of EMORL against existing baselines in experiments on the PAIR and Psych8k datasets.
(2025-05-05T11:30:46Z)
- Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach [15.960271016276447]
We present a systematic evaluation of mathematical reasoning abilities across eight leading Large Language Models (LLMs). Our analyses reveal several key findings: DeepSeek-R1 performs competitively with o1 across most domains and achieves the highest accuracy on the MMLU Formal Logic benchmark. We explore how architectural choices, training paradigms, and optimization strategies contribute to variation in reasoning performance.
(2025-03-13T17:23:45Z)
- A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which
(2025-03-08T05:41:42Z)
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains. Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities. We propose a two-stage training paradigm: (1) a small-scale format tuning stage to internalize the COAT reasoning format and (2) a large-scale self-improvement stage leveraging reinforcement learning.
(2025-02-04T17:26:58Z)
- Exploring Accuracy-Fairness Trade-off in Large Language Models [10.5817207739373]
We study the intricate challenge of harmonising accuracy and fairness in the enhancement of Large Language Models. Overemphasising optimisation of one metric invariably leads to a significant degradation of the other. Our investigation reveals that multi-objective evolutionary learning (MOEL) methodologies offer promising avenues for tackling this challenge.
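The accuracy-fairness study points to multi-objective evolutionary learning (MOEL), which searches for a set of trade-offs rather than a single optimum. The summary gives no algorithmic detail, so the sketch below shows only the Pareto-dominance filter that multi-objective methods of this kind build on, with both objectives treated as higher-is-better. The model names and scores are hypothetical, and the evolutionary search loop itself is not shown.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    # a dominates b if it is no worse on every objective and strictly better on at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Tuple[float, ...]]) -> List[str]:
    # Keep every candidate that no other candidate dominates (O(n^2) pairwise check).
    return [
        name
        for name, objs in candidates.items()
        if not any(
            dominates(other, objs)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical (accuracy, fairness) scores, both higher-is-better.
models = {
    "m1": (0.90, 0.60),
    "m2": (0.85, 0.75),
    "m3": (0.80, 0.70),  # dominated by m2 on both objectives
    "m4": (0.70, 0.90),
}
print(pareto_front(models))  # ['m1', 'm2', 'm4']
```

An evolutionary method would repeatedly mutate or recombine candidate models and keep such non-dominated sets, leaving the final choice along the accuracy-fairness frontier to the practitioner.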
(2024-11-21T04:40:35Z)
- MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing MTL capabilities. MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information and capture shared knowledge. This approach enables pre-trained models to jointly adapt to different target domains with a limited number of trainable parameters.
(2024-10-12T08:32:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.