Related papers: A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models

URL: http://arxiv.org/abs/2410.13841v1
Date: Thu, 17 Oct 2024 17:56:53 GMT
Title: A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
Authors: Qiaoyu Tang, Le Yu, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun
Abstract summary: Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks. We introduce extensions to existing techniques like DARE and BitDelta to enhance the applicability and effectiveness of delta parameter editing in post-trained models.
Score: 45.82689769685688
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks, whose effects are fully reflected by delta parameters (i.e., the disparity between post-trained and pre-trained parameters). While numerous studies have explored delta parameter properties via operations like pruning, quantization, low-rank approximation, and extrapolation, a unified framework for systematically examining these characteristics has been lacking. In this paper, we propose a novel perspective based on Riemann sum approximation of the loss function to elucidate delta parameter editing operations. Our analysis categorizes existing methods into three classes based on their post-editing performance: competitive, decreased, and improved, explaining how they are expressed by the Riemann sum approximation term and how they alter the model performance. Extensive experiments on both visual and language models, including ViT, LLaMA 3, Qwen 2, and Mistral, corroborate our theoretical findings. Furthermore, we introduce extensions to existing techniques like DARE and BitDelta, highlighting their limitations in leveraging the properties of delta parameters and reorganizing them into general expressions to enhance the applicability and effectiveness of delta parameter editing in post-trained models.
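As a rough illustration of the terms used in the abstract, the following sketch computes delta parameters and applies two simplified edits: a DARE-style random drop with rescaling and a BitDelta-style sign quantization with a single scale. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`delta`, `dare_style`, `bitdelta_style`, `apply_delta`) are hypothetical, parameters are flat Python lists rather than model tensors, and the BitDelta-style scale is simply fixed to the mean absolute delta.

```python
import random


def delta(post, pre):
    """Delta parameters: post-trained minus pre-trained, element-wise."""
    return [a - b for a, b in zip(post, pre)]


def dare_style(delta_params, drop_rate, rng):
    """DARE-style edit (simplified): drop each delta with probability
    drop_rate and rescale the survivors by 1 / (1 - drop_rate)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else d * keep_scale
            for d in delta_params]


def bitdelta_style(delta_params):
    """BitDelta-style edit (simplified): keep only the sign of each delta
    and share one scale, here the mean absolute delta (an assumption)."""
    scale = sum(abs(d) for d in delta_params) / len(delta_params)
    return [scale if d > 0 else -scale for d in delta_params]


def apply_delta(pre, edited_delta):
    """Rebuild edited parameters: pre-trained plus edited delta."""
    return [p + d for p, d in zip(pre, edited_delta)]


if __name__ == "__main__":
    pre = [0.0, 1.0, -1.0, 2.0]        # toy pre-trained parameters
    post = [0.5, 0.5, -1.25, 2.25]     # toy post-trained parameters
    d = delta(post, pre)
    rng = random.Random(0)             # fixed seed for repeatability
    print("delta:         ", d)
    print("DARE-style:    ", apply_delta(pre, dare_style(d, 0.5, rng)))
    print("BitDelta-style:", apply_delta(pre, bitdelta_style(d)))
```

In both edits only the delta is modified; the pre-trained parameters stay fixed and the edited model is recovered by adding the edited delta back.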

Related papers

Weight-Parameterization in Continuous Time Deep Neural Networks for Surrogate Modeling [1.629803445577911]
Continuous-time deep learning models, such as neural ordinary differential equations (ODEs), offer a promising framework for surrogate modeling of complex physical systems. A central challenge in training these models lies in learning expressive yet stable time-varying weights, particularly under computational constraints. This work investigates weight parameterization strategies that constrain the temporal evolution of weights to a low-dimensional subspace spanned by basis functions.
arXiv Detail & Related papers (2025-07-29T17:49:43Z)
Advantageous Parameter Expansion Training Makes Better Large Language Models [50.82647159657912]
A subset of parameters, termed advantageous parameters, plays a crucial role in determining model performance. We propose Advantageous Parameter EXpansion Training (APEX), a method that progressively expands advantageous parameters into the space of disadvantageous ones. APEX achieves the same perplexity level as conventional training with just 33% of the training data, and yields significant improvements on downstream tasks.
arXiv Detail & Related papers (2025-05-30T06:06:23Z)
Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior [25.975757048963413]
Post-hoc interpretability methods typically attribute a model's behavior to its components, data, or training trajectory in isolation. We present ExPLAIND, a unified framework that integrates all three perspectives.
arXiv Detail & Related papers (2025-05-26T14:53:11Z)
Image Segmentation via Variational Model Based Tailored UNet: A Deep Variational Framework [6.146992603795658]
We propose Variational Model Based Tailored UNet (VM_TUNet) for image segmentation. VM_TUNet combines the interpretability and edge-preserving properties of variational methods with the adaptive feature learning of neural networks. We show that VM_TUNet achieves superior segmentation performance compared to existing approaches.
arXiv Detail & Related papers (2025-05-09T05:50:22Z)
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation [10.48108719012248]
We focus on Knowledge Distillation (KD), where a compact student model is trained to mimic a larger teacher model. In contrast to much of the previous work, we scale up the parameters of the student model during training.
arXiv Detail & Related papers (2024-11-10T12:40:59Z)
Sparse Orthogonal Parameters Tuning for Continual Learning [34.462967722928724]
Continual learning methods based on pre-trained models (PTM), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention. We propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning).
arXiv Detail & Related papers (2024-11-05T05:19:09Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work. Our empirical investigation includes tens of thousands of models trained with all combinations of threes. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
Self-supervised Pretraining for Partial Differential Equations [0.0]
We describe a novel approach to building a neural PDE solver leveraging recent advances in transformer based neural network architectures. Our model can provide solutions for different values of PDE parameters without any need for retraining the network.
arXiv Detail & Related papers (2024-07-03T16:39:32Z)
Understanding Parameter Sharing in Transformers [53.75988363281843]
Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model depth. We show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity. Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
arXiv Detail & Related papers (2023-06-15T10:48:59Z)
Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models [90.24999406296867]
In contrast with standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning.
arXiv Detail & Related papers (2022-03-14T07:56:32Z)
Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de facto learning paradigm in NLP. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
arXiv Detail & Related papers (2021-10-08T20:22:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.