ConvLoRA and AdaBN based Domain Adaptation via Self-Training
- URL: http://arxiv.org/abs/2402.04964v1
- Date: Wed, 7 Feb 2024 15:43:50 GMT
- Title: ConvLoRA and AdaBN based Domain Adaptation via Self-Training
- Authors: Sidra Aleem, Julia Dietlmeier, Eric Arazo, Suzanne Little
- Abstract summary: We propose Convolutional Low-Rank Adaptation (ConvLoRA) for multi-target domain adaptation.
ConvLoRA freezes the pre-trained model weights, adds trainable low-rank decomposition matrices to convolutional layers, and backpropagates the gradient through these matrices, greatly reducing the number of trainable parameters.
Our method has fewer trainable parameters and performs better than or on par with large, independently fine-tuned networks.
- Score: 4.006331916849688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing domain adaptation (DA) methods often involve pre-training on the
source domain and fine-tuning on the target domain. For multi-target domain
adaptation, maintaining a dedicated/separate fine-tuned network for each
target domain, each retaining all the pre-trained model parameters, is
prohibitively expensive. To address this limitation, we propose Convolutional
Low-Rank Adaptation (ConvLoRA). ConvLoRA freezes the pre-trained model
weights, adds trainable low-rank decomposition matrices to convolutional
layers, and backpropagates the gradient through these matrices, thus greatly
reducing the number of trainable parameters. To further boost adaptation, we
use Adaptive Batch Normalization (AdaBN), which computes target-specific
running statistics, together with ConvLoRA. Our method has fewer trainable
parameters (less than 0.9% of the total base model) and performs better than
or on par with large, independently fine-tuned networks when tested on
segmentation of the Calgary-Campinas dataset of brain MRI images. Our
approach is simple yet effective and can be applied to any deep learning
architecture that uses convolutional and batch normalization layers. Code is
available at: https://github.com/aleemsidra/ConvLoRA.
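To make the two components in the abstract concrete, below is a minimal PyTorch sketch of (a) a LoRA-style low-rank update attached to a frozen convolutional layer and (b) an AdaBN-style re-estimation of BatchNorm running statistics on target-domain data. This is an illustrative sketch, not the authors' implementation (see the linked repository); the names ConvLoRA2d and adabn_update, the rank r, and the scaling factor are assumptions made here for clarity.

```python
# Minimal sketch of ConvLoRA-style and AdaBN-style adaptation (assumptions:
# class/function names, rank r, and scaling are illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLoRA2d(nn.Module):
    """Frozen Conv2d plus a trainable low-rank update to its kernel."""

    def __init__(self, conv: nn.Conv2d, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():
            p.requires_grad_(False)                  # freeze pre-trained weights
        out_ch, in_ch, kh, kw = conv.weight.shape
        # Low-rank factors: delta_W = B @ A, reshaped to the kernel shape.
        self.lora_A = nn.Parameter(torch.randn(r, in_ch * kh * kw) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_ch, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        delta_w = (self.lora_B @ self.lora_A).view_as(self.conv.weight)
        weight = self.conv.weight + self.scaling * delta_w
        return F.conv2d(x, weight, self.conv.bias,
                        stride=self.conv.stride, padding=self.conv.padding,
                        dilation=self.conv.dilation, groups=self.conv.groups)


@torch.no_grad()
def adabn_update(model: nn.Module, target_loader, device="cpu"):
    """AdaBN-style step: re-estimate BatchNorm running statistics on target
    data while leaving every learnable parameter untouched."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None        # cumulative average over all target batches
    model.train()                    # train mode so BN updates its running stats
    for x, *_ in target_loader:      # assumes the loader yields (image, ...) tuples
        model(x.to(device))
    model.eval()
```

Replacing each nn.Conv2d in a pre-trained backbone with ConvLoRA2d(conv) would leave only the low-rank factors (plus, optionally, the BatchNorm affine parameters) trainable, which is consistent with the under-0.9% trainable-parameter figure quoted in the abstract; adabn_update would then be run once per target domain.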
Related papers
- On the Implicit Relation Between Low-Rank Adaptation and Differential Privacy [5.359060261460183]
Low-rank task adaptation of language models has been proposed, e.g., LoRA and FLoRA.
We look at low-rank adaptation from the lens of data privacy.
Unlike other existing fine-tuning algorithms, low-rank adaptation implicitly provides privacy w.r.t. the fine-tuning data.
arXiv Detail & Related papers (2024-09-26T04:56:49Z)
- SARA: Singular-Value Based Adaptive Low-Rank Adaption [4.135688713311511]
LoRA, as a parameter-efficient fine-tuning (PEFT) method, is widely used because it adds no inference overhead.
In this work, we first analyze the relationship between the performance of different layers and their ranks using SVD.
Based on this, we design the Singular-Value Based Adaptive Low-Rank Adaption (SARA).
arXiv Detail & Related papers (2024-08-06T16:39:42Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable via our training procedure, including gradient descent and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models [34.1111413429869]
BiLoRA is an overfitting-alleviating fine-tuning approach based on bi-level optimization (BLO), tested on ten datasets covering natural language understanding and generation tasks.
arXiv Detail & Related papers (2024-03-19T14:11:20Z)
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
- TADA: Efficient Task-Agnostic Domain Adaptation for Transformers [3.9379577980832843]
In this work, we introduce TADA, a novel task-agnostic domain adaptation method.
Within TADA, we retrain embeddings to learn domain-aware input representations and tokenizers for the transformer encoder.
We conduct experiments with meta-embeddings and newly introduced meta-tokenizers, resulting in one model per task in multi-domain use cases.
arXiv Detail & Related papers (2023-05-22T04:53:59Z)
- Adapting the Mean Teacher for keypoint-based lung registration under geometric domain shifts [75.51482952586773]
Deep neural networks generally require plenty of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47% while even matching the accuracy of models trained on target data.
arXiv Detail & Related papers (2022-07-01T12:16:42Z)
- Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction [12.86325214182021]
Dynamic model pruning is a recent direction that allows for the inference of a different sub-network for each input sample during deployment.
Current dynamic methods rely on learning a continuous channel gating through regularization by inducing sparsity loss.
We show experiments on several neural architectures, such as VGG, ResNet, and MobileNet on CIFAR and ImageNet.
arXiv Detail & Related papers (2021-10-15T17:39:53Z)
- LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times compared to full fine-tuning.
A minimal linear-layer sketch of this idea is given after this list.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
- Supervised Domain Adaptation using Graph Embedding [86.3361797111839]
Domain adaptation methods assume that distributions between the two domains are shifted and attempt to realign them.
We propose a generic framework based on graph embedding.
We show that the proposed approach leads to a powerful Domain Adaptation framework.
arXiv Detail & Related papers (2020-03-09T12:25:13Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
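For contrast with the convolutional variant sketched above, the LoRA entry in the list describes the original formulation for Transformer linear layers: the pre-trained weight is frozen and a parallel low-rank branch B(Ax) is trained. Below is a minimal sketch under the same caveats (the class name LoRALinear, rank, and scaling are illustrative assumptions, not the paper's code).

```python
# Minimal sketch of the original LoRA idea for a linear layer.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a parallel trainable low-rank branch B(Ax)."""

    def __init__(self, linear: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)                 # freeze W and b
        self.A = nn.Parameter(torch.randn(r, linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(linear.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # W x + (alpha / r) * B (A x): only A and B receive gradients.
        return self.linear(x) + self.scaling * (x @ self.A.t() @ self.B.t())
```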
This list is automatically generated from the titles and abstracts of the papers in this site.