RAP-SAM: Towards Real-Time All-Purpose Segment Anything
- URL: http://arxiv.org/abs/2401.10228v1
- Date: Thu, 18 Jan 2024 18:59:30 GMT
- Title: RAP-SAM: Towards Real-Time All-Purpose Segment Anything
- Authors: Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang,
Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan
Yang
- Abstract summary: The Segment Anything Model (SAM) is a remarkable model that achieves generalized segmentation.
Current real-time segmentation methods mainly serve a single purpose, such as semantic segmentation of driving scenes.
This work explores a new real-time segmentation setting, named all-purpose segmentation in real time, to transfer VFMs to real-time deployment.
- Score: 120.17175256421622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced by the transformer architecture, vision foundation models
(VFMs) have achieved remarkable progress in performance and generalization
ability. The Segment Anything Model (SAM) is one such model, capable of
generalized segmentation. However, most VFMs cannot run in real time, which
makes it difficult to transfer them into products. Meanwhile, current
real-time segmentation methods mainly serve a single purpose, such as semantic
segmentation of driving scenes. We argue that real applications need diverse
outputs. This work therefore explores a new real-time segmentation setting,
named all-purpose segmentation in real time, to transfer VFMs to real-time
deployment. It covers three tasks: interactive segmentation, panoptic
segmentation, and video segmentation. We aim to achieve all of these tasks in
real time with a single model. We first benchmark several strong baselines and
then present Real-Time All-Purpose SAM (RAP-SAM), which pairs an efficient
encoder with an efficient decoupled decoder that performs prompt-driven
decoding. We further explore different training strategies and tuning methods
to boost co-training performance. Our code and model are available at
https://github.com/xushilin1/RAP-SAM/.
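
The abstract's central design is an efficient encoder paired with an efficient decoupled decoder that performs prompt-driven decoding across interactive, panoptic, and video segmentation. The sketch below is a minimal, illustrative reconstruction of that idea rather than the authors' implementation; all module sizes, layer choices, and the point-prompt encoding are assumptions.

```python
# A minimal sketch (not the authors' implementation) of the idea described in
# the abstract: a shared lightweight encoder produces pixel features, while
# object queries (panoptic/video) and visual-prompt queries (interactive) are
# refined by separate, decoupled decoder branches and turned into masks via a
# dot product with the pixel features. All sizes and names are assumptions.
import torch
import torch.nn as nn


class QueryDecoder(nn.Module):
    """One decoder branch: cross-attend a set of queries to pixel features."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, queries, pixel_feats):
        # queries: (B, Q, C); pixel_feats: (B, HW, C)
        attn_out, _ = self.cross_attn(queries, pixel_feats, pixel_feats)
        queries = queries + attn_out
        return queries + self.ffn(queries)


class RealTimeAllPurposeSeg(nn.Module):
    def __init__(self, dim: int = 256, num_obj_queries: int = 100):
        super().__init__()
        # Lightweight convolutional stem standing in for an efficient backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=2, stride=2), nn.GELU(),
        )
        self.obj_queries = nn.Embedding(num_obj_queries, dim)
        self.prompt_proj = nn.Linear(2, dim)      # encode (x, y) point prompts
        self.obj_decoder = QueryDecoder(dim)      # panoptic / video branch
        self.prompt_decoder = QueryDecoder(dim)   # interactive branch (decoupled)

    def forward(self, image, point_prompts=None):
        feats = self.encoder(image)                      # (B, C, H/8, W/8)
        pixel_feats = feats.flatten(2).transpose(1, 2)   # (B, HW, C)
        B = image.shape[0]

        obj_q = self.obj_queries.weight.unsqueeze(0).expand(B, -1, -1)
        obj_q = self.obj_decoder(obj_q, pixel_feats)
        obj_masks = torch.einsum("bqc,bchw->bqhw", obj_q, feats)

        prompt_masks = None
        if point_prompts is not None:                    # (B, P, 2), normalized
            prompt_q = self.prompt_decoder(self.prompt_proj(point_prompts), pixel_feats)
            prompt_masks = torch.einsum("bqc,bchw->bqhw", prompt_q, feats)
        return obj_masks, prompt_masks


# Quick shape check with random inputs.
model = RealTimeAllPurposeSeg()
obj_masks, prompt_masks = model(torch.randn(2, 3, 256, 256), torch.rand(2, 3, 2))
print(obj_masks.shape, prompt_masks.shape)  # (2, 100, 32, 32) and (2, 3, 32, 32)
```

In this toy version the two branches share only the encoder; the actual model and training recipe are described in the paper and the repository linked above.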
Related papers
- SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We show that the Segment Anything Model 2 (SAM2) can serve as a strong encoder for U-shaped segmentation models.
We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z)
- SAM 2: Segment Anything in Images and Videos [63.44869623822368]
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos.
We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date.
Our model is a simple transformer architecture with streaming memory for real-time video processing.
arXiv Detail & Related papers (2024-08-01T17:00:08Z)
- Moving Object Segmentation: All You Need Is SAM (and Flow) [82.78026782967959]
We investigate two models for combining SAM with optical flow that harness the segmentation power of SAM with the ability of flow to discover and group moving objects.
In the first model, we adapt SAM to take optical flow, rather than RGB, as an input. In the second, SAM takes RGB as an input, and flow is used as a segmentation prompt (see the sketch after this list).
These surprisingly simple methods, without any further modifications, outperform all previous approaches by a considerable margin in both single and multi-object benchmarks.
arXiv Detail & Related papers (2024-04-18T17:59:53Z)
- OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead.
arXiv Detail & Related papers (2024-01-18T18:59:34Z)
- TinySAM: Pushing the Envelope for Efficient Segment Anything Model [76.21007576954035]
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining the strong zero-shot performance.
We first propose a full-stage knowledge distillation method with hard prompt sampling and hard mask weighting strategy to distill a lightweight student model.
We also adapt the post-training quantization to the promptable segmentation task and further reduce the computational cost.
arXiv Detail & Related papers (2023-12-21T12:26:11Z)
- You Only Look at Once for Real-time and Generic Multi-Task [20.61477620156465]
A-YOLOM is an adaptive, real-time, and lightweight multi-task model.
We develop an end-to-end multi-task model with a unified and streamlined segmentation structure.
We achieve competitive results on the BDD100k dataset.
arXiv Detail & Related papers (2023-10-02T21:09:43Z)
- Fast Segment Anything [46.130784421779865]
The recently proposed Segment Anything Model (SAM) has had a significant impact on many computer vision tasks.
However, its huge computation cost prevents wider application in industry scenarios.
We propose a faster alternative method for this fundamental task with comparable performance.
arXiv Detail & Related papers (2023-06-21T10:08:29Z)
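
As referenced in the "Moving Object Segmentation: All You Need Is SAM (and Flow)" entry above, one of the two setups feeds RGB to SAM and uses optical flow only as a segmentation prompt. The sketch below is a hedged illustration of how such a prompt could be derived, not the paper's method: it selects the strongest-motion pixels as point prompts, and `sam_predictor` is a hypothetical placeholder for any point-promptable segmenter.

```python
# An assumption-laden sketch (not the paper's code) of the flow-as-prompt
# setup: the RGB frame goes to the promptable segmenter, while optical flow
# only supplies the prompt, here as the strongest-motion pixel locations.
import numpy as np


def flow_to_point_prompts(flow: np.ndarray, num_points: int = 3) -> np.ndarray:
    """Pick the (x, y) locations with the largest flow magnitude as prompts.

    flow: (H, W, 2) array of per-pixel displacement.
    Returns a (num_points, 2) array of pixel coordinates, ordered (x, y).
    """
    magnitude = np.linalg.norm(flow, axis=-1)                # (H, W) motion strength
    flat_idx = np.argsort(magnitude.ravel())[-num_points:]   # strongest pixels
    ys, xs = np.unravel_index(flat_idx, magnitude.shape)
    return np.stack([xs, ys], axis=1).astype(np.float32)


# Usage (sam_predictor is hypothetical; any point-promptable segmenter fits):
#   points = flow_to_point_prompts(flow)
#   masks = sam_predictor.predict(point_coords=points,
#                                 point_labels=np.ones(len(points)))
```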