Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
- URL: http://arxiv.org/abs/2406.09679v1
- Date: Fri, 14 Jun 2024 03:04:05 GMT
- Title: Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
- Authors: Yuhang Zhou, Zihua Zhao, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang
- Abstract summary: We leverage Mixture of Low-rank Adapters (MoLA) to mitigate conflicts in heterogeneous data training.
We introduce two variants of MoLA, namely, MoLA-Grad and MoLA-Router, to respectively handle the target-aware and target-agnostic scenarios.
The latter uses a novel Task-wise Decorrelation (TwD) loss to guide the router toward oriented weight combinations of adapters for homogeneous tasks.
- Score: 36.09178055533487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training a unified model to take multiple targets into account is a trend towards artificial general intelligence. However, how to efficiently mitigate the training conflicts among heterogeneous data collected from different domains or tasks remains under-explored. In this study, we explore leveraging Mixture of Low-rank Adapters (MoLA) to mitigate conflicts in heterogeneous data training, which requires jointly training the multiple low-rank adapters and their shared backbone. Specifically, we introduce two variants of MoLA, namely, MoLA-Grad and MoLA-Router, to respectively handle the target-aware and target-agnostic scenarios during inference. The former uses task identifiers to assign personalized low-rank adapters to each task, disentangling task-specific knowledge into their adapters and thereby mitigating heterogeneity conflicts. The latter uses a novel Task-wise Decorrelation (TwD) loss to guide the router to learn oriented weight combinations of adapters for homogeneous tasks, achieving similar effects. We conduct comprehensive experiments to verify the superiority of MoLA over previous state-of-the-art methods and present an in-depth analysis of its working mechanism. Source code is available at: https://github.com/MediaBrain-SJTU/MoLA
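To make the two inference modes concrete, the following is a minimal PyTorch sketch of a mixture-of-low-rank-adapters linear layer, written from the abstract alone: a shared backbone weight plus K low-rank adapters, either selected by a task identifier (MoLA-Grad-style) or softly mixed by a learned router (MoLA-Router-style). All names (`MoLALinear`, `task_wise_decorrelation`) and the exact form of the TwD-style regularizer are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' code): a linear layer with a
# shared backbone weight plus K low-rank adapters, usable in a target-aware
# mode (task id picks one adapter) or a target-agnostic mode (router mixes them).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoLALinear(nn.Module):
    def __init__(self, d_in, d_out, num_adapters=4, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)  # shared backbone, trained jointly with the adapters
        # K pairs of low-rank factors: delta_W_k = B_k @ A_k (LoRA-style init)
        self.A = nn.Parameter(torch.randn(num_adapters, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_adapters, d_out, rank))
        self.router = nn.Linear(d_in, num_adapters)  # used only when no task id is given

    def forward(self, x, task_id=None):
        # low-rank updates of every adapter for the whole batch: (K, batch, d_out)
        lowrank = torch.einsum("kor,kri,bi->kbo", self.B, self.A, x)
        if task_id is not None:
            # target-aware (MoLA-Grad-style): one personalized adapter per task
            return self.base(x) + lowrank[task_id], None
        # target-agnostic (MoLA-Router-style): soft mixture over adapters
        weights = F.softmax(self.router(x), dim=-1)          # (batch, K)
        mixed = torch.einsum("bk,kbo->bo", weights, lowrank)
        return self.base(x) + mixed, weights


def task_wise_decorrelation(weights, task_ids):
    """Hypothetical TwD-style regularizer: make routing weights of samples from
    the same task similar and routings of different tasks dissimilar."""
    centroids = torch.stack(
        [weights[task_ids == t].mean(dim=0) for t in task_ids.unique()]
    )
    centroids = F.normalize(centroids, dim=-1)               # (num_tasks, K)
    sim = centroids @ centroids.t()                          # cosine similarity of task routings
    off_diag = sim - torch.diag(torch.diag(sim))
    return off_diag.abs().mean()                             # penalize cross-task correlation
```

In a training loop one would presumably add this regularizer, scaled by a small coefficient, to the task losses so that at inference time, without task identifiers, the router still sends samples of homogeneous tasks to similar adapter combinations. Adapter ranks, the number of adapters, and where such layers are placed in the backbone are hyperparameters specified by the paper itself.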
Related papers
- MoTE: Mixture of Task-specific Experts for Pre-Trained ModelBased Class-incremental Learning [39.892628170627496]
Class-incremental learning (CIL) requires deep learning models to continuously acquire new knowledge from streaming data. Prompt-based approaches suffer from prompt overwriting, while adapter-based methods face challenges such as dimensional misalignment between tasks. We propose a mixture of task-specific experts (MoTE) framework that effectively mitigates the miscalibration caused by inconsistent output dimensions.
arXiv Detail & Related papers (2025-05-21T03:06:10Z) - Generative Trajectory Stitching through Diffusion Composition [29.997765496994457]
CompDiffuser is a novel generative approach that can solve new tasks by learning to compositionally stitch together shorter trajectory chunks from previously seen tasks.
We conduct experiments on benchmark tasks of various difficulties, covering different environment sizes, agent state dimensions, trajectory types, and training data quality, and show that CompDiffuser significantly outperforms existing methods.
arXiv Detail & Related papers (2025-03-07T05:22:52Z) - Ensembles of Low-Rank Expert Adapters [9.599957499802446]
We propose the Ensembles of Low-Rank Expert Adapters (ELREA) framework to improve the model's capability to handle diverse tasks.
ELREA clusters the training instructions based on their gradient directions, representing different areas of expertise.
During inference, ELREA combines predictions from the most relevant expert adapters based on the input data's gradient similarity to the training clusters.
arXiv Detail & Related papers (2025-01-31T18:07:21Z) - ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy [12.150065431702055]
We propose a multi-modal continual learning scheme that consists of experience-based learning and novel knowledge expansion.
Our method is well suited to continual learning: it expands the distribution of representations upstream while also minimizing the negative impact of forgetting previous tasks.
arXiv Detail & Related papers (2024-10-14T13:29:42Z) - Adaptive Adapter Routing for Long-Tailed Class-Incremental Learning [55.384428765798496]
New data exhibits a long-tailed distribution, as with e-commerce platform reviews.
This necessitates continual model learning on imbalanced data without forgetting.
We introduce AdaPtive Adapter RouTing (APART) as an exemplar-free solution for LTCIL.
arXiv Detail & Related papers (2024-09-11T17:52:00Z) - Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models [0.0]
This work proposes a three-phase technique to adjust a base model for a classification task.
We adapt the model's signal to the data distribution by performing further training with a Denoising Autoencoder (DAE).
In addition, we introduce a new data augmentation approach for Supervised Contrastive Learning to correct for unbalanced datasets.
arXiv Detail & Related papers (2024-05-23T11:08:35Z) - Federated Learning for Misbehaviour Detection with Variational Autoencoders and Gaussian Mixture Models [0.2999888908665658]
Federated Learning (FL) has become an attractive approach to collaboratively train Machine Learning (ML) models.
This work proposes a novel unsupervised FL approach for the identification of potential misbehavior in vehicular environments.
We leverage the computing capabilities of public cloud services for model aggregation purposes.
arXiv Detail & Related papers (2024-05-16T08:49:50Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory [64.11870454160614]
We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM)
ADA-CM has two operating modes. In the first, it can be adapted without learning any new parameters, following a training-free paradigm.
Our proposed method achieves competitive results with state-of-the-art on the HICO-DET and V-COCO datasets with much less training time.
arXiv Detail & Related papers (2023-09-07T13:10:06Z) - MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework.
It works as both a decentralized policy and a centralized controller.
Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z) - Adaptive Parameterization of Deep Learning Models for Federated Learning [85.82002651944254]
Federated Learning offers a way to train deep neural networks in a distributed fashion.
It incurs a communication overhead as the model parameters or gradients need to be exchanged regularly during training.
In this paper, we propose to utilise parallel Adapters for Federated Learning.
arXiv Detail & Related papers (2023-02-06T17:30:33Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - Lifelong Learning Without a Task Oracle [13.331659934508764]
Supervised deep neural networks are known to undergo a sharp decline in the accuracy of older tasks when new tasks are learned.
We propose and compare several candidate task-assigning mappers which require very little memory overhead.
Best-performing variants only impose an average cost of 1.7% parameter memory increase.
arXiv Detail & Related papers (2020-11-09T21:30:31Z) - Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks [31.073558420480964]
We propose a method to restore the balance in imbalanced image data by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalescing of capsule-GAN is effective at recognizing highly overlapping classes with much fewer parameters compared with the convolutional-GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.