Tangent Model Composition for Ensembling and Continual Fine-tuning
- URL: http://arxiv.org/abs/2307.08114v2
- Date: Sat, 30 Sep 2023 02:37:27 GMT
- Title: Tangent Model Composition for Ensembling and Continual Fine-tuning
- Authors: Tian Yu Liu and Stefano Soatto
- Abstract summary: Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point.
TMC improves accuracy by 4.2% compared to ensembling non-linearly fine-tuned models.
- Score: 69.92177580782929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tangent Model Composition (TMC) is a method to combine component models
independently fine-tuned around a pre-trained point. Component models are
tangent vectors to the pre-trained model that can be added, scaled, or
subtracted to support incremental learning, ensembling, or unlearning.
Component models are composed at inference time via scalar combination,
reducing the cost of ensembling to that of a single model. TMC improves
accuracy by 4.2% compared to ensembling non-linearly fine-tuned models at a
2.5x to 10x reduction of inference cost, growing linearly with the number of
component models. Each component model can be forgotten at zero cost, with no
residual effect on the resulting inference. When used for continual
fine-tuning, TMC is not constrained by sequential bias and can be executed in
parallel on federated data. TMC outperforms recently published continual
fine-tuning methods almost uniformly on each setting -- task-incremental,
class-incremental, and data-incremental -- on a total of 13 experiments across
3 benchmark datasets, despite not using any replay buffer. TMC is designed for
composing models that are local to a pre-trained embedding, but could be
extended to more general settings. The code is available at:
https://github.com/tianyu139/tangent-model-composition
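
The composition property described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single tanh layer, the function names `tangent_forward` and `compose`, and the random perturbations standing in for fine-tuned components are all assumptions made for illustration. Because a tangent (first-order) model is linear in its weight offset, averaging the outputs of several tangent models gives the same result as one forward pass with the averaged offset, and dropping a component only requires recomposing the remaining offsets.

```python
import numpy as np


def tangent_forward(w0, delta, x):
    """First-order Taylor expansion of f(x; w) = tanh(w @ x) around w0.

    Hypothetical toy model: the Jacobian-vector product is written out
    analytically for this single tanh layer.
    """
    base = np.tanh(w0 @ x)                # output of the pre-trained point
    jvp = (1.0 - base ** 2) * (delta @ x)  # directional derivative along delta
    return base + jvp


def compose(deltas, weights):
    """Scalar combination of tangent vectors (weight offsets)."""
    return sum(a * d for a, d in zip(weights, deltas))


rng = np.random.default_rng(0)
w0 = rng.normal(size=(3, 5))  # stand-in for pre-trained weights
# Stand-ins for independently fine-tuned components (small random offsets).
deltas = [0.01 * rng.normal(size=(3, 5)) for _ in range(4)]
x = rng.normal(size=5)

# Ensembling: four forward passes averaged vs. one pass on the composed offset.
ensemble_out = np.mean([tangent_forward(w0, d, x) for d in deltas], axis=0)
composed_out = tangent_forward(w0, compose(deltas, [0.25] * 4), x)
ensembling_matches = np.allclose(ensemble_out, composed_out)

# Forgetting: drop component 0 and recompose the remaining three.
remaining = deltas[1:]
forgot_out = tangent_forward(w0, compose(remaining, [1.0 / 3] * 3), x)
reference_out = np.mean([tangent_forward(w0, d, x) for d in remaining], axis=0)
forgetting_matches = np.allclose(forgot_out, reference_out)
```

In this toy setting the composed model needs one forward pass regardless of how many components are combined, which is the source of the inference saving the abstract refers to. The equality holds only for the linearized model, not for non-linearly fine-tuned weights.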
 
      
Related papers
- Curvature Tuning: Provable Training-free Model Steering From a Single Parameter [13.412573082645096]
 We show how a single parameter can be used to modulate the curvature of a model's decision boundary.
This makes Curvature Tuning (CT) both more efficient and more interpretable than conventional fine-tuning methods.
We empirically validate its effectiveness in improving generalization and robustness of pretrained models.
 arXiv  Detail & Related papers  (2025-02-11T18:59:57Z)
- Stable Consistency Tuning: Understanding and Improving Consistency Models [40.2712218203989]
 Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising.
Consistency models, a new generative family, achieve competitive performance with significantly faster sampling.
We propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) learning.
 arXiv  Detail & Related papers  (2024-10-24T17:55:52Z)
- Truncated Consistency Models [57.50243901368328]
 Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
 arXiv  Detail & Related papers  (2024-10-18T22:38:08Z)
- Decouple-Then-Merge: Towards Better Training for Diffusion Models [45.89372687373466]
 Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption.
This work proposes a Decouple-then-Merge (DeMe) framework, which begins with a pretrained model and finetunes separate models tailored to specific timesteps.
 arXiv  Detail & Related papers  (2024-10-09T08:19:25Z)
- ModelMix: A New Model-Mixup Strategy to Minimize Vicinal Risk across Tasks for Few-scribble based Cardiac Segmentation [32.19827368497988]
 We introduce a new approach to few-scribble supervised segmentation based on model parameters, termed ModelMix.
ModelMix constructs virtual models using convex combinations of convolutional parameters from separate encoders.
We then regularize the model set to minimize vicinal risk across tasks in both unsupervised and scribble-supervised ways.
 arXiv  Detail & Related papers  (2024-06-19T05:58:11Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
 We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
 arXiv  Detail & Related papers  (2024-05-23T05:25:45Z)
- MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
 MatFormer is a nested Transformer architecture designed to offer elasticity in a variety of deployment constraints.
We show that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
 arXiv  Detail & Related papers  (2023-10-11T17:57:14Z)
- Efficient GPT Model Pre-training using Tensor Train Matrix Representation [65.96485282393361]
 Large-scale transformer models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch.
To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix (TTM) structure.
The resulting GPT-based model stores up to 40% fewer parameters, with perplexity comparable to the original model.
 arXiv  Detail & Related papers  (2023-06-05T08:38:25Z)
- Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
 Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
 arXiv  Detail & Related papers  (2023-03-13T17:59:02Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
 Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most current training schemes, the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
 arXiv  Detail & Related papers  (2020-06-12T14:49:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     