SAM 2: Segment Anything in Images and Videos
- URL: http://arxiv.org/abs/2408.00714v2
- Date: Mon, 28 Oct 2024 16:37:57 GMT
- Title: SAM 2: Segment Anything in Images and Videos
- Authors: Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer
- Abstract summary: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos.
We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date.
Our model is a simple transformer architecture with streaming memory for real-time video processing.
- Score: 63.44869623822368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 trained on our data provides strong performance across a wide range of tasks. In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM). We believe that our data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks. We are releasing our main model, dataset, as well as code for model training and our demo.
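As an illustration of the promptable video workflow described in the abstract, below is a minimal sketch using the released sam2 package (https://github.com/facebookresearch/sam2); the config name, checkpoint path, video path, and click coordinates are illustrative assumptions and may differ between releases.

```python
# Minimal sketch of promptable video segmentation with the released SAM 2
# package. Checkpoint/config names and prompts below are assumptions.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml",                  # model config (assumed name)
    "./checkpoints/sam2_hiera_large.pt",  # downloaded checkpoint (assumed path)
)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # init_state sets up the streaming-memory state over the video frames
    state = predictor.init_state(video_path="./video_frames_dir")

    # one positive click on frame 0 prompts the object to track
    frame_idx, obj_ids, masks = predictor.add_new_points_or_box(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[320, 240]], dtype=np.float32),  # (x, y) click
        labels=np.array([1], dtype=np.int32),             # 1 = positive click
    )

    # propagate the prompt through the video via the streaming memory
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks_per_object = (mask_logits > 0.0).cpu().numpy()  # binarize logits
```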
Related papers
- Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track [28.52754012142431]
Segment Anything Model 2 (SAM 2) is a foundation model towards solving promptable visual segmentation in images and videos.
SAM 2 builds a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date.
Without fine-tuning on the training set, SAM 2 achieved 75.79 J&F on the test set and ranked 4th in the VOS Track of the 6th LSVOS Challenge.
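For context, J&F is the standard VOS score: the mean of region similarity (J, the Jaccard index) and a boundary F-measure (F). A hedged sketch of a simplified variant, where boundary matching uses a fixed pixel tolerance:

```python
# Simplified J&F as used in VOS benchmarks (e.g. DAVIS/LSVOS). The boundary
# term here is an approximation with a fixed dilation tolerance.
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def region_j(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index (IoU) between binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

def boundary_f(pred: np.ndarray, gt: np.ndarray, tol: int = 2) -> float:
    """F-measure between mask boundaries, matched within `tol` pixels."""
    def boundary(m):
        return np.logical_and(m, ~binary_erosion(m))
    bp, bg = boundary(pred), boundary(gt)
    if bp.sum() == 0 and bg.sum() == 0:
        return 1.0
    # a boundary pixel counts as matched if it lies near the other boundary
    prec = np.logical_and(bp, binary_dilation(bg, iterations=tol)).sum() / max(bp.sum(), 1)
    rec = np.logical_and(bg, binary_dilation(bp, iterations=tol)).sum() / max(bg.sum(), 1)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def j_and_f(pred: np.ndarray, gt: np.ndarray) -> float:
    return 0.5 * (region_j(pred, gt) + boundary_f(pred, gt))
```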
arXiv Detail & Related papers (2024-08-19T16:13:14Z)
- From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model [0.5639904484784127]
The Segment Anything Model (SAM) was introduced to the computer vision community by Meta in April 2023.
SAM excels in zero-shot performance, segmenting unseen objects without additional training, enabled by a large dataset of over one billion image masks.
SAM 2 expands this functionality to video, leveraging memory from preceding and subsequent frames to generate accurate segmentation across entire videos.
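A conceptual sketch of this frame-memory idea follows; the module, shapes, and memory window are illustrative assumptions, not SAM 2's actual components:

```python
# Conceptual sketch: each processed frame contributes features to a memory
# bank, and the current frame's prediction attends to that memory.
import torch
import torch.nn as nn

class StreamingMemorySketch(nn.Module):
    def __init__(self, dim: int = 256, max_mem: int = 7):
        super().__init__()
        self.max_mem = max_mem
        self.memory: list[torch.Tensor] = []   # past-frame features (FIFO)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mask_head = nn.Linear(dim, 1)     # stand-in for a mask decoder

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (1, num_tokens, dim) features of the current frame
        if self.memory:
            mem = torch.cat(self.memory, dim=1)                # (1, mem_tokens, dim)
            frame_feats, _ = self.attn(frame_feats, mem, mem)  # memory attention
        # append the current frame to the bank, keeping a fixed window
        self.memory.append(frame_feats.detach())
        self.memory = self.memory[-self.max_mem:]
        return self.mask_head(frame_feats)                     # per-token mask logits
```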
arXiv Detail & Related papers (2024-08-12T17:17:35Z)
- OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks while significantly reducing computational and parameter overhead.
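As a rough illustration of the task-specific-queries design, a minimal sketch of one shared decoder with per-task learned queries; dimensions, task names, and layer counts are assumptions, not the OMG-Seg implementation:

```python
# Sketch: one transformer decoder serves multiple segmentation tasks, each
# task owning its own bank of learned queries.
import torch
import torch.nn as nn

class TaskQueryDecoderSketch(nn.Module):
    def __init__(self, dim: int = 256, queries_per_task: int = 100,
                 tasks: tuple[str, ...] = ("semantic", "instance", "video")):
        super().__init__()
        # one learned query bank per task, one shared decoder for all tasks
        self.task_queries = nn.ParameterDict({
            t: nn.Parameter(torch.randn(queries_per_task, dim)) for t in tasks
        })
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)

    def forward(self, image_feats: torch.Tensor, task: str) -> torch.Tensor:
        # image_feats: (B, num_tokens, dim) from a shared backbone/encoder
        q = self.task_queries[task].unsqueeze(0).expand(image_feats.size(0), -1, -1)
        return self.decoder(q, image_feats)  # (B, queries_per_task, dim)
```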
arXiv Detail & Related papers (2024-01-18T18:59:34Z)
- RAP-SAM: Towards Real-Time All-Purpose Segment Anything [120.17175256421622]
Segment Anything Model (SAM) is a remarkable model that achieves generalized segmentation.
Current real-time segmentation methods typically serve a single purpose, such as semantic segmentation of driving scenes.
This work explores a new setting, all-purpose segmentation in real time, to transfer vision foundation models (VFMs) to real-time deployment.
arXiv Detail & Related papers (2024-01-18T18:59:30Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
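The coarse segment-level matching can be pictured as an optimal assignment over segment embeddings; a minimal sketch under that assumption (the paper additionally refines matches at the pixel level):

```python
# Sketch of segment-level association across frames: embed each segment,
# compute pairwise similarity, and solve an optimal assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_segments(prev_emb: np.ndarray, curr_emb: np.ndarray):
    """prev_emb: (M, D), curr_emb: (N, D) L2-normalized segment embeddings."""
    sim = prev_emb @ curr_emb.T               # (M, N) cosine similarities
    rows, cols = linear_sum_assignment(-sim)  # maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))
```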
arXiv Detail & Related papers (2021-06-17T13:06:24Z)
- Monocular Instance Motion Segmentation for Autonomous Driving: KITTI InstanceMotSeg Dataset and Multi-task Baseline [5.000331633798637]
Moving object segmentation is a crucial task for autonomous vehicles as it can be used to segment objects in a class-agnostic manner.
Although pixel-wise motion segmentation has been studied in the autonomous driving literature, it has rarely been addressed at the instance level.
We create a new InstanceMotSeg dataset comprising 12.9K samples, improving upon our KITTIMoSeg dataset.
arXiv Detail & Related papers (2020-08-16T21:47:09Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame rates and a risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
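One way to read the fast-target-model idea: fit a tiny discriminative model on frozen first-frame features with a few gradient steps instead of fine-tuning the whole network. A hedged sketch, with shapes, step count, and learning rate as illustrative assumptions:

```python
# Sketch: a light test-time target model in place of first-frame fine-tuning.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit_target_model(feats: torch.Tensor, first_mask: torch.Tensor,
                     steps: int = 50, lr: float = 1e-1) -> nn.Module:
    """feats: (1, C, H, W) frozen backbone features of frame 0;
    first_mask: (1, 1, H, W) ground-truth mask given at test time."""
    target_model = nn.Conv2d(feats.shape[1], 1, kernel_size=3, padding=1)
    opt = torch.optim.Adam(target_model.parameters(), lr=lr)
    for _ in range(steps):  # a few fast steps instead of full fine-tuning
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(target_model(feats), first_mask)
        loss.backward()
        opt.step()
    return target_model  # apply to later frames' features for coarse masks
```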
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
- Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey [0.0]
We survey the evolution of both semantic and instance segmentation work based on CNNs.
We also give a glimpse of some state-of-the-art panoptic segmentation models.
arXiv Detail & Related papers (2020-01-13T06:07:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.