Guess What Moves: Unsupervised Video and Image Segmentation by
Anticipating Motion
- URL: http://arxiv.org/abs/2205.07844v1
- Date: Mon, 16 May 2022 17:55:34 GMT
- Title: Guess What Moves: Unsupervised Video and Image Segmentation by
Anticipating Motion
- Authors: Subhabrata Choudhury, Laurynas Karazija, Iro Laina, Andrea Vedaldi,
Christian Rupprecht
- Abstract summary: We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns.
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
- Score: 92.80981308407098
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motion, measured via optical flow, provides a powerful cue to discover and
learn objects in images and videos. However, compared to using appearance, it
has some blind spots, such as the fact that objects become invisible if they do
not move. In this work, we propose an approach that combines the strengths of
motion-based and appearance-based segmentation. We propose to supervise an
image segmentation network, tasking it with predicting regions that are likely
to contain simple motion patterns, and thus likely to correspond to objects. We
apply this network in two modes. In the unsupervised video segmentation mode,
the network is trained on a collection of unlabelled videos, using the learning
process itself as an algorithm to segment these videos. In the unsupervised
image segmentation mode, the network is trained using videos and applied to
segment independent still images. With this, we obtain strong empirical results
in unsupervised video and image segmentation, significantly outperforming the
state of the art on benchmarks such as DAVIS, sometimes with a $5\%$ IoU gap.
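The training signal described in the abstract lends itself to a short sketch: a network predicts soft masks, each mask "guesses" a simple motion pattern, and the loss measures how well those per-mask guesses reconstruct the observed optical flow. Below is a minimal, hypothetical PyTorch sketch assuming a constant-flow-per-segment motion model; the paper's actual parametric motion models and loss details may differ.

```python
import torch
import torch.nn.functional as F

def guess_what_moves_loss(masks, flow):
    """Hypothetical sketch of the training signal: reconstruct the observed
    flow from per-segment motion "guesses" (here: each segment's mean flow).

    masks: (B, K, H, W) soft segment assignments (e.g. softmax over K).
    flow:  (B, 2, H, W) optical flow from an off-the-shelf estimator.
    """
    m = masks.unsqueeze(2)                                     # (B, K, 1, H, W)
    f = flow.unsqueeze(1)                                      # (B, 1, 2, H, W)
    area = m.sum(dim=(-1, -2), keepdim=True).clamp_min(1e-6)   # (B, K, 1, 1, 1)
    seg_flow = (m * f).sum(dim=(-1, -2), keepdim=True) / area  # per-segment mean
    recon = (m * seg_flow).sum(dim=1)                          # (B, 2, H, W)
    return F.mse_loss(recon, flow)
```

Under this reading, masks covering regions of coherent motion reconstruct the flow well, so minimizing the loss pushes the network toward object-like segments, even though the segmentation network itself only ever looks at a single image.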
Related papers
- Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual
Grouping [52.03068246508119]
We study learning object segmentation from unlabeled videos.
We first learn an image segmenter in a loop that approximates optical flow as constant per-segment flow plus a small within-segment residual flow.
Our model surpasses the state-of-the-art by absolute gains of 7/9/5% on DAVIS16 / STv2 / FBMS59 respectively.
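The relaxed common-fate objective admits a similar sketch: reconstruct the flow as a constant per-segment component plus a small predicted residual. The residual input and the weighting below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def relaxed_common_fate_loss(masks, flow, residual, w=0.1):
    """Hypothetical sketch: flow ~ constant per-segment flow + small residual.

    masks:    (B, K, H, W) soft segments.
    flow:     (B, 2, H, W) observed optical flow.
    residual: (B, 2, H, W) predicted within-segment residual flow (assumed
              to come from an extra head of the same network).
    """
    m = masks.unsqueeze(2)                                   # (B, K, 1, H, W)
    f = flow.unsqueeze(1)                                    # (B, 1, 2, H, W)
    area = m.sum(dim=(-1, -2), keepdim=True).clamp_min(1e-6)
    const = (m * f).sum(dim=(-1, -2), keepdim=True) / area   # per-segment flow
    recon = (m * const).sum(dim=1) + residual                # (B, 2, H, W)
    # Fit the observed flow, but keep the residual small so that segments
    # still approximately share a common fate.
    return F.mse_loss(recon, flow) + w * residual.abs().mean()
```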
arXiv Detail & Related papers (2023-04-17T07:18:21Z)
- Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns [92.80981308407098]
We propose a new approach to learn to segment multiple image objects without manual supervision.
The method can extract objects from still images, but uses videos for supervision.
We show state-of-the-art unsupervised object segmentation performance on simulated and real-world benchmarks.
arXiv Detail & Related papers (2022-10-21T17:57:05Z)
- Box Supervised Video Segmentation Proposal Network [3.384080569028146]
We propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties.
The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9%.
We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
arXiv Detail & Related papers (2022-02-14T20:38:28Z)
- Learning To Segment Dominant Object Motion From Watching Videos [72.57852930273256]
We propose a simple framework for dominant moving object segmentation that neither requires annotated training data nor relies on saliency priors or pre-trained optical flow maps.
Inspired by a layered image representation, we introduce a technique to group pixel regions according to their affine parametric motion.
This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference.
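Grouping pixels by affine parametric motion can be illustrated with a weighted least-squares fit of an affine model u(x) = A x + t to the flow inside a soft mask. This is a hypothetical sketch of the motion model only, not the paper's network or training procedure.

```python
import torch

def fit_affine_motion(flow, mask, eps=1e-6):
    """Weighted least-squares fit of affine motion u(x) = P @ [x, y, 1] to
    the flow inside a soft mask. flow: (2, H, W); mask: (H, W) in [0, 1].
    Returns P with shape (2, 3)."""
    H, W = mask.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    X = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    U = flow.permute(1, 2, 0).reshape(-1, 2)
    w = mask.reshape(-1, 1)
    # Normal equations of the mask-weighted least-squares problem.
    A = X.T @ (w * X) + eps * torch.eye(3)
    B = X.T @ (w * U)
    return torch.linalg.solve(A, B).T
```

Pixels whose flow is poorly explained by a segment's affine fit can then be reassigned to other segments, which is the intuition behind grouping regions by parametric motion.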
arXiv Detail & Related papers (2021-11-28T14:51:00Z)
- The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos [59.12750806239545]
We show that a video contains different views of the same scene related by moving components, and that the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively.
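A minimal sketch of view synthesis from segment flow, under assumed shapes: composite a dense flow field from one flow vector per segment and backward-warp the second frame to reconstruct the first; the photometric error of this reconstruction would serve as the training signal.

```python
import torch
import torch.nn.functional as F

def reconstruct_frame1(frame2, masks, seg_flow):
    """Backward-warp frame2 with a composite "segment flow" to reconstruct
    frame1. Hypothetical shapes: frame2 (B, 3, H, W); masks (B, K, H, W),
    the soft segmentation of frame1; seg_flow (B, K, 2), one (dx, dy)
    pixel-space flow vector per segment (frame1 -> frame2)."""
    B, K, H, W = masks.shape
    flow = torch.einsum("bkhw,bkc->bchw", masks, seg_flow)   # (B, 2, H, W)
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2)  # normalized grid
    # Convert pixel displacements to normalized grid offsets for grid_sample.
    offs = torch.stack([flow[:, 0] * 2 / max(W - 1, 1),
                        flow[:, 1] * 2 / max(H - 1, 1)], dim=-1)
    recon = F.grid_sample(frame2, base + offs, align_corners=True)
    return recon  # train with e.g. (recon - frame1).abs().mean()
```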
arXiv Detail & Related papers (2021-11-11T18:59:11Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
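One way to read the bootstrapping step is that the motion-derived segments act as pseudo-labels for a static, single-image segmenter. A hypothetical sketch, not DyStaB's exact objective:

```python
import torch
import torch.nn.functional as F

def static_bootstrap_loss(static_logits, motion_masks):
    """Train a static (single-image) segmenter on masks produced by the
    motion pathway. static_logits: (B, 2, H, W) foreground/background
    logits; motion_masks: (B, H, W) binary masks in {0, 1}."""
    # Motion masks are treated as fixed pseudo-labels (no gradient).
    return F.cross_entropy(static_logits, motion_masks.long().detach())
```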
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.