HRSAM: Efficient Interactive Segmentation in High-Resolution Images
- URL: http://arxiv.org/abs/2407.02109v2
- Date: Sat, 23 Nov 2024 01:44:00 GMT
- Title: HRSAM: Efficient Interactive Segmentation in High-Resolution Images
- Authors: You Huang, Wenbin Lai, Jiayi Ji, Liujuan Cao, Shengchuan Zhang, Rongrong Ji
- Abstract summary: Segment Anything Model (SAM) has advanced interactive segmentation but is limited by the high computational cost on high-resolution images.
We focus on visual length extrapolation and propose a lightweight model named HRSAM.
The extrapolation enables HRSAM trained on low resolutions to generalize to high resolutions.
- Score: 59.537068118473066
- License:
- Abstract: The Segment Anything Model (SAM) has advanced interactive segmentation but is limited by the high computational cost on high-resolution images. This requires downsampling to meet GPU constraints, sacrificing the fine-grained details needed for high-precision interactive segmentation. To address SAM's limitations, we focus on visual length extrapolation and propose a lightweight model named HRSAM. The extrapolation enables HRSAM trained on low resolutions to generalize to high resolutions. We begin by finding the link between the extrapolation and attention scores, which leads us to base HRSAM on Swin attention. We then introduce the Flexible Local Attention (FLA) framework, using CUDA-optimized Efficient Memory Attention to accelerate HRSAM. Within FLA, we implement Flash Swin attention, achieving over a 35% speedup compared to traditional Swin attention, and propose a KV-only padding mechanism to enhance extrapolation. We also develop the Cycle-scan module that uses State Space models to efficiently expand HRSAM's receptive field. We further develop the HRSAM++ within FLA by adding an anchor map, providing multi-scale data augmentation for the extrapolation and a larger receptive field at slight computational cost. Experiments show that, under standard training, HRSAMs surpass the previous SOTA with only 38% of the latency. With SAM-distillation, the extrapolation enables HRSAMs to outperform the teacher model at lower latency. Further finetuning achieves performance significantly exceeding the previous SOTA.
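To make the cost argument in the abstract concrete, the following minimal sketch counts query-key score entries for one attention layer under global attention versus non-overlapping window (Swin-style) attention. It is an illustration only, not the authors' implementation: the function name `attention_pairs`, the patch size of 16, and the window size of 16 tokens are assumptions chosen for the example, and shifted windows, KV-only padding, and the Cycle-scan module are not modelled.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def attention_pairs(height: int, width: int, patch: int = 16, window=None) -> int:
    """Count query-key score entries for a single attention layer.

    patch  : side length of one image patch in pixels (assumed value).
    window : side length of a local window in tokens; None means global attention.
    """
    rows, cols = height // patch, width // patch
    tokens = rows * cols
    if window is None:
        # Global attention: every token attends to every token.
        return tokens * tokens
    # Window attention: tokens attend only within their own window.
    # Partial windows at the border are counted as full (padded) windows.
    n_windows = ceil_div(rows, window) * ceil_div(cols, window)
    tokens_per_window = window * window
    return n_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    for side in (1024, 2048, 4096):
        g = attention_pairs(side, side)
        w = attention_pairs(side, side, window=16)
        print(f"{side}x{side}: global={g:,}  windowed={w:,}")
```

Under these assumptions, doubling the image side multiplies the global count by 16 but the windowed count by only 4, which is one way to see why local attention is attractive at high resolution.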
Related papers
- Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z)
- A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation [5.011091042850546]
Adapting foundation models for medical image analysis requires finetuning them on a considerable amount of data.
Collecting task-specific medical data for such finetuning at a central location raises many privacy concerns.
Although Federated learning (FL) provides an effective means for training on private decentralized data, communication costs in federating large foundation models can quickly become a significant bottleneck.
arXiv Detail & Related papers (2024-07-31T16:48:06Z)
- Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling to accelerate Sharpness-Aware Minimization (AUSAM).
AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency.
As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z)
- Momentum-SAM: Sharpness Aware Minimization without Computational Overhead [0.6577148087211809]
We propose Momentum-SAM, which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands.
We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization.
arXiv Detail & Related papers (2024-01-22T15:19:18Z)
- TinySAM: Pushing the Envelope for Efficient Segment Anything Model [76.21007576954035]
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining the strong zero-shot performance.
We first propose a full-stage knowledge distillation method with hard prompt sampling and hard mask weighting strategy to distill a lightweight student model.
We also adapt the post-training quantization to the promptable segmentation task and further reduce the computational cost.
arXiv Detail & Related papers (2023-12-21T12:26:11Z)
- ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z)
- AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-aware minimization (SAM) has been extensively explored because it generalizes better when training deep neural networks.
Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with SGD, AMSGrad, and SAMsGrad.
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
- Channelized Axial Attention for Semantic Segmentation [70.14921019774793]
We propose the Channelized Axial Attention (CAA) to seamlessly integrate channel attention and axial attention with reduced computational complexity.
Our CAA not only requires much less computation resources compared with other dual attention models such as DANet, but also outperforms the state-of-the-art ResNet-101-based segmentation models on all tested datasets.
arXiv Detail & Related papers (2021-01-19T03:08:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.