ConvLoRA and AdaBN based Domain Adaptation via Self-Training
- URL: http://arxiv.org/abs/2402.04964v1
- Date: Wed, 7 Feb 2024 15:43:50 GMT
- Title: ConvLoRA and AdaBN based Domain Adaptation via Self-Training
- Authors: Sidra Aleem, Julia Dietlmeier, Eric Arazo, Suzanne Little
- Abstract summary: We propose Convolutional Low-Rank Adaptation (ConvLoRA) for multi-target domain adaptation.
ConvLoRA freezes pre-trained model weights, adds trainable low-rank decomposition matrices to convolutional layers, and backpropagates the gradient through these matrices.
Our method has fewer trainable parameters and performs better than or on par with large, independently fine-tuned networks.
- Score: 4.006331916849688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing domain adaptation (DA) methods often involve pre-training on the
source domain and fine-tuning on the target domain. For multi-target domain adaptation,
maintaining a dedicated, separately fine-tuned network for each target domain, each
retaining all the pre-trained model parameters, is prohibitively expensive. To address
this limitation, we propose Convolutional Low-Rank Adaptation (ConvLoRA). ConvLoRA
freezes the pre-trained model weights, adds trainable low-rank decomposition matrices
to convolutional layers, and backpropagates the gradient through these matrices, thus
greatly reducing the number of trainable parameters. To further boost adaptation, we
use Adaptive Batch Normalization (AdaBN), which computes target-specific running
statistics, alongside ConvLoRA. Our method has fewer trainable parameters (less than
0.9% of the total base model) and performs better than or on par with large,
independently fine-tuned networks when tested on segmentation of the Calgary-Campinas
dataset of brain MRI images. Our approach is simple yet effective and can be applied to
any deep learning architecture that uses convolutional and batch normalization layers.
Code is available at: https://github.com/aleemsidra/ConvLoRA.
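For a concrete picture before opening the repository, the following is a minimal PyTorch sketch of the two ingredients described in the abstract, not the authors' released implementation: the names ConvLoRA2d and adapt_batchnorm, the rank/alpha hyperparameters, and the channel-wise factorization (a k x k convolution down to `rank` channels followed by a 1 x 1 convolution back up) are illustrative assumptions, and the paper's exact decomposition may differ.

```python
import torch
import torch.nn as nn


class ConvLoRA2d(nn.Module):
    """Hypothetical ConvLoRA-style wrapper: the pre-trained convolution is frozen and a
    trainable low-rank update, factored into two small convolutions, is added to it."""

    def __init__(self, conv: nn.Conv2d, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():
            p.requires_grad_(False)                      # freeze pre-trained weights
        # Low-rank factors: in_channels -> rank (k x k conv) -> out_channels (1 x 1 conv).
        self.lora_A = nn.Conv2d(conv.in_channels, rank, kernel_size=conv.kernel_size,
                                stride=conv.stride, padding=conv.padding, bias=False)
        self.lora_B = nn.Conv2d(rank, conv.out_channels, kernel_size=1, bias=False)
        nn.init.kaiming_uniform_(self.lora_A.weight)
        nn.init.zeros_(self.lora_B.weight)               # the low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.conv(x) + self.scale * self.lora_B(self.lora_A(x))


@torch.no_grad()
def adapt_batchnorm(model: nn.Module, target_loader, device="cpu"):
    """AdaBN-style step: re-estimate BatchNorm running statistics on target-domain data."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()                      # discard source-domain statistics
    model.train()                                        # BN updates running stats in train mode
    for x, _ in target_loader:
        model(x.to(device))
    return model
```

In this sketch only the lora_A/lora_B factors (and, if desired, the BatchNorm affine parameters) would receive gradient updates, which is what keeps the trainable-parameter count at a small fraction of the base model.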
Related papers
- GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts [58.95913531746308]
We consider the problem of test-time domain generalization, where a model is trained on several source domains and adapted to target domains never seen during training.
We propose to generate multiple layer parameters on the fly during inference using a lightweight meta-learned transformer, which we call GeneralizeFormer.
arXiv Detail & Related papers (2025-02-15T10:10:49Z)
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation [13.585425242072173]
The most commonly used fine-tuning method is to update the pre-trained weights via low-rank adaptation (LoRA).
We propose to improve LoRA by initializing the new weights in a data-driven manner, computing a singular value decomposition (SVD) on minibatches of activations.
We call our new method Explained Variance Adaptation (EVA).
arXiv Detail & Related papers (2024-10-09T17:59:06Z)
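A rough sketch of the data-driven initialization described in the EVA entry above might look as follows; this illustrates the general idea (top singular directions of an activation minibatch used to seed LoRA's A matrix), not the exact EVA procedure, and the function name and shapes are assumptions.

```python
import torch


def svd_init_lora_A(activations: torch.Tensor, rank: int) -> torch.Tensor:
    """Illustrative EVA-style initialization: seed LoRA's A matrix with the top
    right-singular vectors of a minibatch of layer inputs, so the adapter starts
    in the directions that explain the most activation variance.

    activations: (batch, in_features) inputs collected for one adapted layer."""
    x = activations - activations.mean(dim=0, keepdim=True)
    _, _, Vh = torch.linalg.svd(x, full_matrices=False)   # economy SVD of the minibatch
    return Vh[:rank].clone()                               # (rank, in_features); B stays at zero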
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method.
We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation.
Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
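As a rough illustration of what a CP-factored adapter can look like, the sketch below shares one set of rank-r factors across a stack of layers instead of keeping an independent (A, B) pair per layer; the class name and the particular choice of tensor modes (layer, output dimension, input dimension) are assumptions, and LoRTA's actual tensorization may differ.

```python
import torch
import torch.nn as nn


class CPAdapter(nn.Module):
    """Illustrative CP (Candecomp/Parafac) parameterization of the adaptation tensor:
    delta_W[l, o, i] = sum_r layer_factor[l, r] * out_factor[o, r] * in_factor[i, r]."""

    def __init__(self, num_layers: int, out_features: int, in_features: int, rank: int = 8):
        super().__init__()
        self.layer_factor = nn.Parameter(torch.randn(num_layers, rank) * 0.02)
        self.out_factor = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.in_factor = nn.Parameter(torch.zeros(in_features, rank))   # updates start at zero

    def delta_weight(self, layer_idx: int) -> torch.Tensor:
        """Materialize the low-rank update for one layer as an (out, in) matrix."""
        scaled = self.out_factor * self.layer_factor[layer_idx]         # (out, rank)
        return scaled @ self.in_factor.T                                # (out, in)
```

Because the three factor matrices are shared across layers, the parameter count grows as (num_layers + out_features + in_features) * rank rather than num_layers * (out_features + in_features) * rank, which is where the extra compactness over per-layer LoRA would come from in this sketch.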
- On the Implicit Relation Between Low-Rank Adaptation and Differential Privacy [5.359060261460183]
Low-rank task adaptation of language models has been proposed, e.g., LoRA and FLoRA.
We look at low-rank adaptation through the lens of data privacy.
Unlike other existing fine-tuning algorithms, low-rank adaptation implicitly provides privacy w.r.t. the fine-tuning data.
arXiv Detail & Related papers (2024-09-26T04:56:49Z)
- SARA: Singular-Value Based Adaptive Low-Rank Adaption [4.135688713311511]
LoRA, as a parameter-efficient fine-tuning (PEFT) method, is widely used because it adds no inference overhead.
In this work, we first analyze the relationship between the performance of different layers and their ranks using SVD.
Based on this, we design the Singular-Value Based Adaptive Low-Rank Adaption (SARA).
arXiv Detail & Related papers (2024-08-06T16:39:42Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including gradient-based optimization and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models [34.1111413429869]
BiLoRA is an overfitting-alleviating fine-tuning approach based on bi-level optimization (BLO).
It is tested on ten datasets covering natural language understanding and generation tasks.
arXiv Detail & Related papers (2024-03-19T14:11:20Z)
- Adapting the Mean Teacher for keypoint-based lung registration under geometric domain shifts [75.51482952586773]
Deep neural networks generally require large amounts of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47% while even matching the accuracy of models trained on target data.
arXiv Detail & Related papers (2022-07-01T12:16:42Z)
- LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
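The main paper above adapts this recipe to convolutional layers; for a linear layer the standard formulation reduces to roughly the following minimal sketch (the wrapper name and initialization constants are illustrative, not the reference implementation).

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Minimal LoRA wrapper for a linear layer: W is frozen and the weight update is
    parameterized as B @ A with rank r much smaller than min(out_features, in_features)."""

    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)                            # freeze pre-trained weights
        self.A = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(linear.out_features, rank))  # delta W = B @ A starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.linear(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```

Only A and B are trainable, and after training the product B @ A can be merged back into the frozen weight, so no extra inference latency is incurred.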
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.