SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything
- URL: http://arxiv.org/abs/2408.11535v2
- Date: Thu, 22 Aug 2024 08:25:39 GMT
- Title: SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything
- Authors: Chongkai Yu, Anqi Li, Xiaochao Qu, Luoqi Liu, Ting Liu
- Abstract summary: We propose a two-stage refinement framework that fully integrates images and prompts globally and locally.
The first-stage GlobalDiff Refiner is a lightweight early fusion network that combines the whole image and prompts.
The second-stage PatchDiff Refiner locates the object detail window according to the mask and prompts, then refines the local details of the object.
- Score: 14.937761564543239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of the Segment Anything Model (SAM) marks a significant milestone for interactive segmentation using generalist models. As a late fusion model, SAM extracts image embeddings once and merges them with prompts in later interactions. This strategy limits the model's ability to extract detailed information from the prompted target zone. Current specialist models use an early fusion strategy that encodes the combination of image and prompts to target the prompted objects, but repeating complex computations on the image results in high latency. The key to these issues is efficiently synergizing the images and prompts. We propose SAM-REF, a two-stage refinement framework that fully integrates images and prompts globally and locally while maintaining the accuracy of early fusion and the efficiency of late fusion. The first-stage GlobalDiff Refiner is a lightweight early fusion network that combines the whole image and prompts, focusing on capturing detailed information for the entire object. The second-stage PatchDiff Refiner locates the object detail window according to the mask and prompts, then refines the local details of the object. Experiments demonstrate the high effectiveness and efficiency of our method in tackling complex cases with multiple interactions. Our SAM-REF model outperforms the current state-of-the-art method on most segmentation-quality metrics without compromising efficiency.
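The two-stage design described in the abstract can be sketched as a simple refinement pipeline chained after SAM's coarse prediction. The following is a minimal, hypothetical PyTorch sketch: the class names echo the GlobalDiff Refiner and PatchDiff Refiner from the abstract, but the module internals, tensor shapes, the prompt rasterization, and the window-location heuristic (locate_window) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a SAM-REF-style two-stage refinement pipeline.
# Module internals, shapes, and the crop heuristic are illustrative
# assumptions, not the authors' released implementation.
import torch
import torch.nn as nn


class GlobalDiffRefiner(nn.Module):
    """Lightweight early fusion: whole image + prompt map + coarse mask -> refined mask."""

    def __init__(self, in_ch: int = 5, width: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 1),
        )

    def forward(self, image, prompt_map, coarse_mask):
        x = torch.cat([image, prompt_map, coarse_mask], dim=1)  # early fusion of image and prompts
        return torch.sigmoid(self.net(x) + coarse_mask)         # residual refinement of the coarse mask


class PatchDiffRefiner(nn.Module):
    """Refines local details inside a window located from the mask and prompts."""

    def __init__(self, in_ch: int = 5, width: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 1),
        )

    @staticmethod
    def locate_window(mask, prompt_map, pad: int = 16):
        """Padded bounding box around the mask/prompt foreground (illustrative heuristic, batch size 1)."""
        fg = (mask[0, 0] > 0.5) | (prompt_map[0, 0] > 0.5)
        if not fg.any():
            return 0, mask.shape[-2], 0, mask.shape[-1]
        ys, xs = torch.where(fg)
        y0 = max(int(ys.min()) - pad, 0)
        y1 = min(int(ys.max()) + pad, mask.shape[-2])
        x0 = max(int(xs.min()) - pad, 0)
        x1 = min(int(xs.max()) + pad, mask.shape[-1])
        return y0, y1, x0, x1

    def forward(self, image, prompt_map, mask):
        y0, y1, x0, x1 = self.locate_window(mask, prompt_map)
        crop = torch.cat(
            [image[..., y0:y1, x0:x1], prompt_map[..., y0:y1, x0:x1], mask[..., y0:y1, x0:x1]],
            dim=1,
        )
        refined = mask.clone()
        refined[..., y0:y1, x0:x1] = torch.sigmoid(self.net(crop) + mask[..., y0:y1, x0:x1])
        return refined


# Usage with dummy tensors: refine a coarse SAM mask with both stages.
image = torch.rand(1, 3, 256, 256)         # RGB image
prompt_map = torch.zeros(1, 1, 256, 256)   # rasterized click/box prompts
prompt_map[..., 124:132, 124:132] = 1.0
coarse_mask = torch.zeros(1, 1, 256, 256)  # coarse mask from SAM's late-fusion decoder
coarse_mask[..., 100:160, 100:160] = 1.0

mask = GlobalDiffRefiner()(image, prompt_map, coarse_mask)  # stage 1: global early fusion
mask = PatchDiffRefiner()(image, prompt_map, mask)          # stage 2: local detail window
print(mask.shape)  # torch.Size([1, 1, 256, 256])
```

The split mirrors the trade-off the abstract describes: the first stage fuses the whole image with the prompts once per interaction through a lightweight early fusion network, while the second stage restricts the more detailed refinement to a window derived from the current mask and prompts.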
Related papers
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z) - FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation [37.74045675588487]
We introduce SAM into multimodal image segmentation for the first time.
We propose a novel framework that combines Latent Space Token Generation (LSTG) and Fusion Mask Prompting (FMP) modules to enhance SAM's multimodal fusion and segmentation capabilities.
arXiv Detail & Related papers (2024-08-26T02:20:55Z) - FocSAM: Delving Deeply into Focused Objects in Segmenting Anything [58.042354516491024]
The Segment Anything Model (SAM) marks a notable milestone in segmentation models.
We propose FocSAM with a pipeline redesigned on two pivotal aspects.
First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object.
Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from a few initial clicks.
arXiv Detail & Related papers (2024-05-29T02:34:13Z) - MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method extracts richer marine information, ranging from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z) - TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven Image Fusion Network [2.7387720378113554]
We introduce a target and semantic awareness-driven fusion network called TSJNet.
It comprises fusion, detection, and segmentation subnetworks arranged in a series structure.
It can generate visually pleasing fused results, achieving average gains of 2.84% in object detection mAP@0.5 and 7.47% in segmentation mIoU.
arXiv Detail & Related papers (2024-02-02T08:37:38Z) - Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation [66.15246197473897]
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a Multi-interactive Feature learning architecture for image fusion and segmentation.
arXiv Detail & Related papers (2023-08-04T01:03:58Z) - Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
arXiv Detail & Related papers (2023-07-10T17:59:40Z) - A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z)