Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation
- URL: http://arxiv.org/abs/2411.19297v1
- Date: Thu, 28 Nov 2024 18:09:49 GMT
- Title: Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation
- Authors: Son Thai Ly, Hien V. Nguyen,
- Abstract summary: We introduce FreqFit, a novel Frequency Fine-tuning module between ViT blocks to enhance model adaptability.
FreqFit consistently improves performance over the original PEFT methods with performance gains ranging from 1% to 16%.
- Score: 0.3683202928838613
- License:
- Abstract: Adapting vision transformer foundation models through parameter-efficient fine-tuning (PEFT) methods has become increasingly popular. These methods optimize a limited subset of parameters, enabling efficient adaptation without the need to fine-tune the entire model while still achieving competitive performance. However, traditional PEFT methods may limit the model's capacity to capture complex patterns, especially those associated with high-frequency spectra. This limitation becomes particularly problematic as existing research indicates that high-frequency features are crucial for distinguishing subtle image structures. To address this issue, we introduce FreqFit, a novel Frequency Fine-tuning module between ViT blocks to enhance model adaptability. FreqFit is simple yet surprisingly effective, and can be integrated with all existing PEFT methods to boost their performance. By manipulating features in the frequency domain, our approach allows models to capture subtle patterns more effectively. Extensive experiments on 24 datasets, using both supervised and self-supervised foundational models with various state-of-the-art PEFT methods, reveal that FreqFit consistently improves performance over the original PEFT methods with performance gains ranging from 1% to 16%. For instance, FreqFit-LoRA surpasses the performances of state-of-the-art baselines on CIFAR100 by more than 10% even without applying regularization or strong augmentation. For reproducibility purposes, the source code is available at https://github.com/tsly123/FreqFiT.
Related papers
- FreqMixFormerV2: Lightweight Frequency-aware Mixed Transformer for Human Skeleton Action Recognition [9.963966059349731]
FreqMixForemrV2 is built upon the Frequency-aware Mixed Transformer (FreqMixFormer) for identifying subtle and discriminative actions.
The proposed model achieves a superior balance between efficiency and accuracy, outperforming state-of-the-art methods with only 60% of the parameters.
arXiv Detail & Related papers (2024-12-29T23:52:40Z) - Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning [41.19870454097444]
We propose a novel approach that models network weights by leveraging a combination of physical priors.
We use three foundational equations -- heat diffusion, wave propagation, and Poisson's steady-state equation -- each contributing distinctive modeling properties.
MoPPA improves PEFT accuracy by up to 2.1% on VTAB-1K image classification with a comparable number of trainable parameters.
arXiv Detail & Related papers (2024-12-03T19:00:34Z) - Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - DiffiT: Diffusion Vision Transformers for Image Generation [88.08529836125399]
Vision Transformer (ViT) has demonstrated strong modeling capabilities and scalability, especially for recognition tasks.
We study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT)
DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency.
arXiv Detail & Related papers (2023-12-04T18:57:01Z) - Federated Fine-Tuning of Foundation Models via Probabilistic Masking [11.192113661738764]
Foundation Models (FMs) have revolutionized machine learning with their adaptability and high performance across tasks.
Their integration into Federated Learning (FL) is challenging due to substantial communication overhead from their extensive parameterization.
We present DeltaMask, a novel method that efficiently fine-tunes FMs in FL at an ultra-low, well below 1 bpp.
arXiv Detail & Related papers (2023-11-29T01:10:39Z) - Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution [32.29219284419944]
Cross-refinement adaptive feature modulation transformer (CRAFT)
We introduce a frequency-guided post-training quantization (PTQ) method aimed at enhancing CRAFT's efficiency.
Our experimental findings showcase CRAFT's superiority over current state-of-the-art methods.
arXiv Detail & Related papers (2023-08-09T15:38:36Z) - E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - Scaling & Shifting Your Features: A New Baseline for Efficient Model
Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing)
We propose a new parameter-efficient finetuning method termed as SSF, representing that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z) - FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.