DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping
- URL: http://arxiv.org/abs/2008.07012v2
- Date: Sat, 3 Apr 2021 06:25:46 GMT
- Title: DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping
- Authors: Yanchao Yang, Brian Lai and Stefano Soatto
- Abstract summary: We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
- Score: 72.84991726271024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe an unsupervised method to detect and segment portions of images
of live scenes that, at some point in time, are seen moving as a coherent
whole, which we refer to as objects. Our method first partitions the motion
field by minimizing the mutual information between segments. Then, it uses the
segments to learn object models that can be used for detection in a static
image. Static and dynamic models are represented by deep neural networks
trained jointly in a bootstrapping strategy, which enables extrapolation to
previously unseen objects. While the training process requires motion, the
resulting object segmentation network can be used on either static images or
videos at inference time. As the volume of seen videos grows, more and more
objects are seen moving, priming their detection, which then serves as a
regularizer for new objects, turning our method into unsupervised continual
learning to segment objects. Our models are compared to the state of the art in
both video object segmentation and salient object detection. In the six
benchmark datasets tested, our models compare favorably even to those using
pixel-level supervision, despite requiring no manual annotation.
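The dynamic-static bootstrapping described above can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: `dynamic_net` and `static_net` are hypothetical single-channel segmentation networks, and the mutual-information term is replaced by a crude proxy that separates per-segment flow statistics.

```python
import torch
import torch.nn.functional as F

def mi_proxy_loss(mask, flow, eps=1e-6):
    """Crude proxy for 'minimize mutual information between segments'.
    mask: (B, 1, H, W) soft foreground probability; flow: (B, 2, H, W).
    Segments whose mean flows coincide share information, so we reward
    a large gap between the two segments' flow statistics."""
    fg, bg = mask, 1.0 - mask
    fg_mean = (flow * fg).sum(dim=(2, 3)) / (fg.sum(dim=(2, 3)) + eps)
    bg_mean = (flow * bg).sum(dim=(2, 3)) / (bg.sum(dim=(2, 3)) + eps)
    return torch.exp(-F.mse_loss(fg_mean, bg_mean))  # in (0, 1], small when the means differ

def bootstrap_step(dynamic_net, static_net, image, flow, optimizer):
    """One joint update: the dynamic model partitions the motion field,
    the static model learns that partition from appearance alone, and
    the static prediction in turn regularizes the dynamic one."""
    dyn_mask = torch.sigmoid(dynamic_net(flow))   # segmentation from motion
    sta_mask = torch.sigmoid(static_net(image))   # segmentation from appearance
    loss = (mi_proxy_loss(dyn_mask, flow)
            + F.binary_cross_entropy(sta_mask, dyn_mask.detach())   # motion teaches appearance
            + F.binary_cross_entropy(dyn_mask, sta_mask.detach()))  # appearance regularizes motion
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

Note that only `static_net` is needed at inference time, which is why the trained segmenter runs on single images as well as videos.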
Related papers
- Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models [28.304047711166056]
Large-scale pre-trained models have shown promising advances in detecting and segmenting objects in 2D static images in the wild.
This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
In this paper, we re-purpose an open-vocabulary detector, segmenter, and dense optical flow estimator into a model that tracks and segments objects of any category in 2D videos.
arXiv Detail & Related papers (2023-10-10T20:25:30Z)
- Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns [92.80981308407098]
We propose a new approach to learn to segment multiple image objects without manual supervision.
The method can extract objects from still images but uses videos for supervision.
We show state-of-the-art unsupervised object segmentation performance on simulated and real-world benchmarks.
arXiv Detail & Related papers (2022-10-21T17:57:05Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion [92.80981308407098]
We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
Specifically, we supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns (see the sketch after this list).
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
arXiv Detail & Related papers (2022-05-16T17:55:34Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos [59.12750806239545]
We show that adjacent frames of a video are different views of the same scene related by moving components, so the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively.
arXiv Detail & Related papers (2021-11-11T18:59:11Z)
- NudgeSeg: Zero-Shot Object Segmentation by Repeated Physical Interaction [8.712677353734627]
We present the first framework to segment unknown objects in a cluttered scene by repeatedly 'nudging' the objects.
We show an impressive average detection rate of over 86% on zero-shot objects.
arXiv Detail & Related papers (2021-09-22T05:17:09Z)
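As referenced in the Guess What Moves entry above, here is a hedged sketch of one way to score "regions likely to contain simple motion patterns": fit a very simple (here piecewise-constant) flow model inside each predicted region and measure how well it reconstructs the observed flow. The tensor shapes and the constant-flow choice are illustrative assumptions, not that paper's exact parametric motion model.

```python
import torch

def simple_motion_residual(masks, flow, eps=1e-6):
    """masks: (B, K, H, W) soft regions (assumed to sum to 1 over K,
    e.g. a softmax output); flow: (B, 2, H, W) optical flow.
    Returns the error of the best piecewise-constant flow per region."""
    m = masks.unsqueeze(2)                                   # (B, K, 1, H, W)
    f = flow.unsqueeze(1)                                    # (B, 1, 2, H, W)
    area = m.sum(dim=(3, 4), keepdim=True) + eps             # (B, K, 1, 1, 1)
    region_mean = (m * f).sum(dim=(3, 4), keepdim=True) / area  # per-region mean flow
    recon = (m * region_mean).sum(dim=1)                     # (B, 2, H, W) piecewise-constant flow
    return ((recon - flow) ** 2).mean()
```

Training a segmentation network to minimize this residual favors masks whose interiors move coherently, which is the anticipating-motion supervision signal in spirit.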
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.