Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual
Grouping
- URL: http://arxiv.org/abs/2304.08025v1
- Date: Mon, 17 Apr 2023 07:18:21 GMT
- Title: Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual
Grouping
- Authors: Long Lian, Zhirong Wu, Stella X. Yu
- Abstract summary: We study learning object segmentation from unlabeled videos.
We first learn an image segmenter in the loop of approximating optical flow with a constant per-segment flow plus a small within-segment residual flow.
Our model surpasses the state-of-the-art by absolute gains of 7/9/5% on DAVIS16 / STv2 / FBMS59 respectively.
- Score: 52.03068246508119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study learning object segmentation from unlabeled videos. Humans can
easily segment moving objects without knowing what they are. The Gestalt law of
common fate, i.e., what moves at the same speed belongs together, has inspired
unsupervised object discovery based on motion segmentation. However, common
fate is not a reliable indicator of objectness: Parts of an articulated /
deformable object may not move at the same speed, whereas shadows / reflections
of an object always move with it but are not part of it.
Our insight is to bootstrap objectness by first learning image features from
relaxed common fate and then refining them based on visual appearance grouping
within the image itself and across images statistically. Specifically, we first
learn an image segmenter in the loop of approximating optical flow with a
constant per-segment flow plus a small within-segment residual flow, and then
refine it for more coherent appearance and statistical figure-ground
relevance.
On unsupervised video object segmentation, using only ResNet and
convolutional heads, our model surpasses the state-of-the-art by absolute gains
of 7/9/5% on DAVIS16 / STv2 / FBMS59 respectively, demonstrating the
effectiveness of our ideas. Our code is publicly available.
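To make the "relaxed common fate" idea concrete, below is a minimal PyTorch-style sketch of a per-segment flow-approximation loss, not the authors' released code: the function name relaxed_common_fate_loss, the tensors masks, flow, and residual_flow, and the weight lambda_residual are illustrative assumptions chosen to mirror the sentence above (constant segment flow plus a small within-segment residual).

```python
import torch

def relaxed_common_fate_loss(masks, flow, residual_flow=None, lambda_residual=0.01):
    """Approximate the observed flow by a constant per-segment flow plus a small
    within-segment residual, and penalize what the approximation misses.

    masks:         (B, K, H, W) soft segment masks (softmax over K channels).
    flow:          (B, 2, H, W) optical flow from an off-the-shelf estimator.
    residual_flow: optional (B, K, 2, H, W) per-segment residual flow predicted
                   by a small head; if None, this reduces to plain common fate.
    """
    m = masks.unsqueeze(2)                              # (B, K, 1, H, W)
    f = flow.unsqueeze(1)                               # (B, 1, 2, H, W)

    # Constant segment flow: mask-weighted mean flow inside each segment.
    area = m.sum(dim=(-2, -1)).clamp(min=1e-6)          # (B, K, 1)
    seg_flow = (m * f).sum(dim=(-2, -1)) / area         # (B, K, 2)
    approx = seg_flow[..., None, None]                  # (B, K, 2, 1, 1)

    if residual_flow is not None:
        # Relaxation: allow a small residual on top of the constant flow.
        approx = approx + residual_flow                 # (B, K, 2, H, W)

    # How well does the per-segment approximation explain the observed flow?
    recon = (m * approx).sum(dim=1)                     # (B, 2, H, W)
    loss = (flow - recon).abs().mean()

    if residual_flow is not None:
        # Keep the residual small so each segment still largely shares one motion.
        loss = loss + lambda_residual * residual_flow.abs().mean()

    return loss
```

In the paper's full pipeline this flow term trains the image segmenter, whose masks are then refined by appearance grouping and statistical figure-ground relevance; the sketch covers only the flow-approximation step.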
Related papers
- LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training [13.985488693082981]
We propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks.
We demonstrate the effectiveness of our approach, named LOCATE, on multiple standard video object segmentation, image saliency detection, and object segmentation benchmarks.
arXiv Detail & Related papers (2023-08-22T07:27:09Z)
- InstMove: Instance Motion for Object-centric Video Segmentation [70.16915119724757]
In this work, we study the instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video Segmentation.
In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
arXiv Detail & Related papers (2023-03-14T17:58:44Z)
- Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns [92.80981308407098]
We propose a new approach to learn to segment multiple image objects without manual supervision.
The method can extract objects from still images, but uses videos for supervision.
We show state-of-the-art unsupervised object segmentation performance on simulated and real-world benchmarks.
arXiv Detail & Related papers (2022-10-21T17:57:05Z)
- Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion [92.80981308407098]
We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns.
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
arXiv Detail & Related papers (2022-05-16T17:55:34Z)
- The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos [59.12750806239545]
We show that a video has different views of the same scene related by moving components, and the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively.
arXiv Detail & Related papers (2021-11-11T18:59:11Z)
- NudgeSeg: Zero-Shot Object Segmentation by Repeated Physical Interaction [8.712677353734627]
We present the first framework to segment unknown objects in a cluttered scene by repeatedly 'nudging' at the objects.
We show an impressive average detection rate of over 86% on zero-shot objects.
arXiv Detail & Related papers (2021-09-22T05:17:09Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)