CAD: Memory Efficient Convolutional Adapter for Segment Anything
- URL: http://arxiv.org/abs/2409.15889v1
- Date: Tue, 24 Sep 2024 09:02:23 GMT
- Title: CAD: Memory Efficient Convolutional Adapter for Segment Anything
- Authors: Joohyeok Kim, Joonhyeon Song, Seohwan Yun, Seongho Yoon, Sangmin Lee,
- Abstract summary: The foundation model for image segmentation, Segment Anything (SAM), has been actively researched in various fields.
Adapter-based fine-tuning approaches have reported parameter efficiency and significant performance improvements.
This paper proposes a memory-efficient parallel convolutional adapter architecture.
- Score: 3.760646312664378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The foundation model for image segmentation, Segment Anything (SAM), has been actively researched in various fields since its proposal. Various approaches have been proposed to adapt SAM to specific domains, with one notable approach involving the addition and training of lightweight adapter modules. While adapter-based fine-tuning approaches have reported parameter efficiency and significant performance improvements, they face an often-overlooked issue: the excessive consumption of GPU memory relative to the number of trainable parameters. Addressing this issue, this paper proposes a memory-efficient parallel convolutional adapter architecture. This architecture connects in parallel with SAM's image encoder, eliminating the need to store activations and gradients of the image encoder during model training. Our proposed architecture demonstrated competitive experimental results while using less than half the GPU memory of SAM Adapter, indicating its value as an alternative to simple decoder fine-tuning when hardware limitations preclude adapter-based learning. Our code implementation is available on our GitHub.
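The memory argument above can be illustrated with a minimal PyTorch sketch. This is an illustrative toy, not the paper's actual code: the encoder is a stand-in for SAM's ViT image encoder, and the adapter shape and fusion rule are assumptions. The key idea is that the frozen encoder runs under `torch.no_grad()`, so autograd stores no encoder activations, and training memory scales with the small parallel adapter only.

```python
import torch
import torch.nn as nn

class ParallelConvAdapter(nn.Module):
    """Toy convolutional adapter that runs in parallel with the encoder."""
    def __init__(self, in_ch=3, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=4, padding=1),  # 64 -> 16
            nn.GELU(),
            nn.Conv2d(dim, dim, 3, stride=4, padding=1),    # 16 -> 4
        )

    def forward(self, x):
        return self.net(x)

# Stand-in for SAM's image encoder, fully frozen.
encoder = nn.Sequential(nn.Conv2d(3, 32, 16, stride=16), nn.GELU())  # 64 -> 4
for p in encoder.parameters():
    p.requires_grad = False

adapter = ParallelConvAdapter()

x = torch.randn(1, 3, 64, 64)
with torch.no_grad():            # no encoder activations kept for backprop
    feats = encoder(x)           # shape (1, 32, 4, 4)
fused = feats + adapter(x)       # parallel branch added to encoder features

fused.sum().backward()           # gradients flow only through the adapter
```

After `backward()`, only the adapter's parameters hold gradients; the encoder contributes neither stored activations nor gradient buffers, which is the source of the claimed memory savings.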
Related papers
- Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection [22.748835458594744]
We propose a new framework for adapter reuse that moves beyond retrieval. We represent each task by a latent prototype vector and aim to approximate the target task prototype as a sparse linear combination of retrieved reference prototypes. The resulting combination weights are then used to blend the corresponding LoRA adapters, yielding a composite adapter tailored to the target task.
arXiv Detail & Related papers (2024-10-13T16:28:38Z)
- AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval [14.009257997448634]
This work investigates the problem of instance-level image retrieval re-ranking with the constraint of memory efficiency.
The proposed model uses a transformer-based architecture designed to estimate image-to-image similarity.
Results on standard benchmarks demonstrate the superiority of our approach over both hand-crafted and learned models.
arXiv Detail & Related papers (2024-08-06T16:29:51Z)
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient due to its high computational cost.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model [90.26396410706857]
This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks.
CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters.
CAT-SAM variants consistently achieve superior target segmentation performance, even under the very challenging one-shot adaptation setup.
arXiv Detail & Related papers (2024-02-06T02:00:18Z)
- RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything [117.02741621686677]
This work explores a novel real-time segmentation setting called real-time multi-purpose segmentation.
It contains three fundamental sub-tasks: interactive segmentation, panoptic segmentation, and video instance segmentation.
We present a novel dynamic convolution-based method, Real-Time Multi-Purpose SAM (RMP-SAM).
It contains an efficient encoder and an efficient decoupled adapter to perform prompt-driven decoding.
arXiv Detail & Related papers (2024-01-18T18:59:30Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation [72.27914940012423]
We investigate efficient tuning for referring image segmentation.
We propose a novel adapter called Bridger to facilitate cross-modal information exchange.
We also design a lightweight decoder for image segmentation.
arXiv Detail & Related papers (2023-07-21T12:46:15Z)
- Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation [8.16038976420041]
We present a meta-learned Multiscale Memory Comparator (MMC) for few-shot video segmentation.
Unlike previous work, we instead preserve the detailed feature maps during cross-scale information exchange.
Our approach outperforms the baseline and yields state-of-the-art performance.
arXiv Detail & Related papers (2023-07-15T14:21:58Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
Adapter-ALBERT is an efficient model optimized for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Towards Efficient Visual Adaption via Structural Re-parameterization [76.57083043547296]
We propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter.
RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost of ViT-B/16 on VTAB-1k.
arXiv Detail & Related papers (2023-02-16T06:14:15Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.