Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
- URL: http://arxiv.org/abs/2502.06876v2
- Date: Thu, 13 Feb 2025 06:28:33 GMT
- Title: Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
- Authors: Jinluan Yang, Dingnan Jin, Anke Tang, Li Shen, Didi Zhu, Zhengyu Chen, Daixin Wang, Qing Cui, Zhiqiang Zhang, Jun Zhou, Fei Wu, Kun Kuang
- Abstract summary: This paper establishes the first comprehensive benchmark for model merging in large language models (LLMs). Our analysis reveals three pivotal insights: (i) previously overlooked collaborative/conflicting relationships among 3H dimensions, (ii) the consistent superiority of model merging over data mixture approaches in balancing alignment trade-offs, and (iii) the critical role of parameter-level conflict resolution through redundant component pruning and outlier mitigation. We propose R-TSVM, a Reweighting-enhanced Task Singular Vector Merging method that incorporates outlier-aware parameter weighting and sparsity-adaptive rank selection strategies adapted to the heavy-tailed parameter distribution and sparsity of LLMs.
- Score: 35.53877806259048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI, with existing methods like data mixture strategies facing limitations including reliance on expert knowledge and conflicting optimization signals. While model merging offers a promising alternative by integrating specialized models, its potential for 3H optimization remains underexplored. This paper establishes the first comprehensive benchmark for model merging in 3H-aligned LLMs, systematically evaluating 15 methods (12 training-free merging and 3 data mixture techniques) across 10 datasets associated with 5 annotation dimensions, 2 LLM families, and 2 training paradigms. Our analysis reveals three pivotal insights: (i) previously overlooked collaborative/conflicting relationships among 3H dimensions, (ii) the consistent superiority of model merging over data mixture approaches in balancing alignment trade-offs, and (iii) the critical role of parameter-level conflict resolution through redundant component pruning and outlier mitigation. Building on these findings, we propose R-TSVM, a Reweighting-enhanced Task Singular Vector Merging method that incorporates outlier-aware parameter weighting and sparsity-adaptive rank selection strategies adapted to the heavy-tailed parameter distribution and sparsity for LLMs, further improving LLM alignment across multiple evaluations. We release our trained models for further exploration.
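A minimal, illustrative sketch of the task-singular-vector merging idea described above, assuming PyTorch and 2-D weight matrices; the function name, the clamp-based outlier mitigation, and the energy-based rank cutoff are stand-ins for R-TSVM's actual reweighting and rank-selection rules, not the paper's code:

    import torch

    def merge_with_task_svd(base: torch.Tensor,
                            finetuned: list[torch.Tensor],
                            energy: float = 0.9,
                            clip_sigma: float = 3.0) -> torch.Tensor:
        """Merge 2-D weight matrices via low-rank reconstructions of task vectors."""
        merged = base.clone()
        for w in finetuned:
            delta = w - base  # task vector: fine-tuned minus pre-trained weights
            # Crude outlier mitigation: clamp entries beyond clip_sigma std devs.
            s = delta.std().item()
            delta = delta.clamp(-clip_sigma * s, clip_sigma * s)
            # Keep just enough singular directions to cover `energy` of the
            # spectrum, a simple proxy for sparsity-adaptive rank selection.
            U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
            k = int(torch.searchsorted(S.cumsum(0) / S.sum(), energy)) + 1
            merged += (U[:, :k] * S[:k]) @ Vh[:k] / len(finetuned)  # averaged update
        return merged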
Related papers
- Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models [7.61977883644433]
We propose PRRC to evaluate data quality across Professionalism, Readability, Reasoning, and Cleanliness.
We introduce Meta-rater, a multi-dimensional data selection method that integrates these dimensions with existing quality metrics through learned optimal weightings.
Experiments demonstrate that Meta-rater doubles convergence speed for 1.3B parameter models and improves downstream task performance by 3.23 points, with scalable benefits observed in 3.3B models trained on 100B tokens.
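A toy sketch of the weighted-combination step, assuming NumPy; the dimension names come from the summary, while the weights here are hand-set placeholders for what Meta-rater would learn:

    import numpy as np

    DIMENSIONS = ["professionalism", "readability", "reasoning", "cleanliness"]

    def select_documents(scores: np.ndarray, weights: np.ndarray, k: int) -> np.ndarray:
        """scores: (n_docs, n_dims) per-dimension ratings. Returns top-k indices."""
        composite = scores @ weights       # weighted sum across quality dimensions
        return np.argsort(-composite)[:k]  # highest composite quality first

    rng = np.random.default_rng(0)
    scores = rng.random((5, len(DIMENSIONS)))   # 5 toy documents
    weights = np.array([0.3, 0.2, 0.35, 0.15])  # would be learned, not hand-set
    print(select_documents(scores, weights, k=2))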
arXiv Detail & Related papers (2025-04-19T06:12:33Z)
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks.
By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
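A hypothetical sketch of the subset-based reward idea; the `predict` callable and the sampling scheme are assumptions for illustration, not RMM's implementation:

    import random

    def subset_reward(predict, eval_pairs, fraction=0.01):
        """Cheap reward for a candidate merge: accuracy on a small random slice."""
        n = max(1, int(len(eval_pairs) * fraction))
        sample = random.sample(eval_pairs, n)
        return sum(1 for x, y in sample if predict(x) == y) / n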
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
- LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach [0.0]
LEWIS (Layer Wise Sparsity) is a guided model-merging framework.
It guides existing merging methods by preserving essential layer-wise task-specific knowledge.
Experiments demonstrate the effectiveness of LEWIS, with performance improvements for code instruction-following and math-solving models.
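A rough sketch of layer-wise task-vector sparsification, assuming PyTorch; the per-layer keep ratio is a placeholder for the guidance LEWIS actually computes:

    import torch

    def sparsify_layer(delta: torch.Tensor, keep_ratio: float) -> torch.Tensor:
        """Zero out all but the highest-magnitude entries of one layer's task vector."""
        k = max(1, int(delta.numel() * keep_ratio))
        # Threshold at the k-th largest magnitude, i.e. the (numel-k+1)-th smallest.
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        return torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))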
arXiv Detail & Related papers (2025-03-05T20:09:59Z)
- Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation [15.47711837051754]
We propose Mixup Model Merge, an innovative approach inspired by the Mixup data augmentation technique.
M$^3$ is a simple yet effective model merging method that significantly enhances the performance of the merged model.
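A minimal sketch of the randomized-interpolation idea, assuming PyTorch state dicts; sampling the ratio from a Beta distribution is borrowed from Mixup and may differ from M$^3$'s exact choice:

    import torch

    def mixup_merge(state_a: dict, state_b: dict, alpha: float = 1.0) -> dict:
        """Merge two same-architecture checkpoints at a randomly drawn ratio."""
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        return {k: lam * state_a[k] + (1.0 - lam) * state_b[k] for k in state_a}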
arXiv Detail & Related papers (2025-02-21T13:01:26Z)
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs).
RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness.
RSD delivers significant efficiency gains over decoding with the target model alone, while achieving significantly better accuracy than parallel decoding methods on average.
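A schematic loop for the reward-guided accept/fallback control flow; the callable signatures and the fixed threshold rule are illustrative assumptions, not RSD's actual algorithm:

    def rsd_generate(draft_step, target_step, reward, prompt, max_steps=32, tau=0.5):
        """draft_step/target_step map text -> next chunk; reward scores a candidate."""
        text = prompt
        for _ in range(max_steps):
            chunk = draft_step(text)       # cheap draft proposal
            if reward(text, chunk) < tau:  # low reward: regenerate with target model
                chunk = target_step(text)
            text += chunk
        return text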
arXiv Detail & Related papers (2025-01-31T17:19:57Z)
- InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion [35.56060538535215]
This paper explores strategies to integrate multiple domain-specialized models into an efficient pivot model.
We propose two fusion strategies to combine the strengths of multiple LLMs.
We achieve accuracy improvements of 9.27%, 8.80%, and 8.89% on the GSM8K, MATH, and HumanEval tasks, respectively.
arXiv Detail & Related papers (2025-01-06T06:29:55Z)
- Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic guideline for scaling Large Language Models.
We benchmark existing scaling techniques, especially selective merging, and variants of mixture.
We then formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo.
Our methodology involves the clustering of mergeable models, optimal merging strategy selection, and the integration of clusters.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
- Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on a multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
arXiv Detail & Related papers (2024-08-25T16:43:47Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
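A conceptual sketch of building a low-rank expert from a fine-tuned weight delta via truncated SVD, assuming PyTorch; the fixed rank and the way the expert is applied are simplifications, not SMILE's actual routing design:

    import torch

    def lowrank_expert(w_finetuned: torch.Tensor, w_base: torch.Tensor, rank: int):
        """Compress the weight delta of one linear layer into a rank-`rank` expert."""
        U, S, Vh = torch.linalg.svd(w_finetuned - w_base, full_matrices=False)
        A = U[:, :rank] * S[:rank]  # (out_dim, rank)
        B = Vh[:rank]               # (rank, in_dim)
        return A, B

    def apply_with_expert(x, w_base, A, B):
        """Base layer output plus the low-rank expert's correction."""
        return x @ w_base.T + (x @ B.T) @ A.T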
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Progressively Label Enhancement for Large Language Model Alignment [42.01694160556464]
Large Language Model (LLM) alignment aims to prevent models from producing content that misaligns with human expectations.
We propose PLE, a framework that dynamically adjusts the model's training process based on the evolving quality of the generated data.
arXiv Detail & Related papers (2024-08-05T16:21:17Z)
- Model Merging and Safety Alignment: One Bad Model Spoils the Bunch [70.614652904151]
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model.
Current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models.
We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment.
arXiv Detail & Related papers (2024-06-20T17:59:58Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
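For intuition only, the simplest instance of parameter-space fusion is a weighted average of same-architecture checkpoints; the paper's method is more sophisticated than this uniform-averaging stand-in:

    def average_merge(state_dicts: list[dict], coeffs: list[float] | None = None) -> dict:
        """Weighted average of checkpoints (dicts of tensors) sharing one architecture."""
        coeffs = coeffs or [1.0 / len(state_dicts)] * len(state_dicts)
        return {key: sum(c * sd[key] for c, sd in zip(coeffs, state_dicts))
                for key in state_dicts[0]}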
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- RoCourseNet: Distributionally Robust Training of a Prediction Aware Recourse Model [29.057300578765663]
RoCourseNet is a training framework that jointly optimizes predictions and recourses that are robust to future data shifts.
We show that RoCourseNet consistently achieves more than 96% robust validity and outperforms state-of-the-art baselines by at least 10% in generating robust explanations.
arXiv Detail & Related papers (2022-06-01T18:18:18Z)