The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos
- URL: http://arxiv.org/abs/2111.06394v1
- Date: Thu, 11 Nov 2021 18:59:11 GMT
- Title: The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos
- Authors: Runtao Liu, Zhirong Wu, Stella X. Yu, Stephen Lin
- Abstract summary: We show that a video has different views of the same scene related by moving components, and that the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively.
- Score: 59.12750806239545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can easily segment moving objects without knowing what they are. That
objectness could emerge from continuous visual observations motivates us to
model grouping and movement concurrently from unlabeled videos. Our premise is
that a video has different views of the same scene related by moving
components, and the right region segmentation and region flow would allow
mutual view synthesis which can be checked from the data itself without any
external supervision. Our model starts with two separate pathways: an
appearance pathway that outputs feature-based region segmentation for a single
image, and a motion pathway that outputs motion features for a pair of images.
It then binds them in a conjoint representation called segment flow that pools
flow offsets over each region and provides a gross characterization of moving
regions for the entire scene. By training the model to minimize view synthesis
errors based on segment flow, our appearance and motion pathways learn region
segmentation and flow estimation automatically without building them up from
low-level edges or optical flows respectively. Our model demonstrates the
surprising emergence of objectness in the appearance pathway, surpassing prior
works on zero-shot object segmentation from an image, moving object
segmentation from a video with unsupervised test-time adaptation, and semantic
image segmentation by supervised fine-tuning. Our work is the first truly
end-to-end zero-shot object segmentation from videos. It not only develops
generic objectness for segmentation and tracking, but also outperforms
prevalent image-based contrastive learning methods without augmentation
engineering.
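To make the mechanism concrete, here is a minimal sketch of the segment-flow binding and view-synthesis objective described in the abstract. It is a hedged illustration under assumed shapes and names, not the authors' code: soft region masks (the appearance pathway's output) pool per-pixel flow (the motion pathway's output) into one offset per region, and the pooled, piecewise-constant flow warps one frame toward the other, with the photometric error as the training signal.

```python
# Illustrative sketch only: the tensor shapes and the bilinear warping
# details below are assumptions, not the paper's released code.
import torch
import torch.nn.functional as F

def segment_flow(masks, flow):
    """Pool per-pixel flow into one offset per soft region, then paint each
    region's offset back onto its pixels (piecewise-constant segment flow).

    masks: (B, K, H, W) soft region assignments from the appearance pathway
    flow:  (B, 2, H, W) per-pixel (dx, dy) from the motion pathway
    """
    w = masks / (masks.sum(dim=(2, 3), keepdim=True) + 1e-6)
    offsets = torch.einsum('bkhw,bchw->bkc', w, flow)   # (B, K, 2) mean flow per region
    return torch.einsum('bkhw,bkc->bchw', masks, offsets)

def warp(frame, flow):
    """Backward-warp `frame` by sampling it at locations shifted by `flow`."""
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack((xs, ys)).float().to(frame.device)   # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                        # (B, 2, H, W)
    gx = 2 * coords[:, 0] / (W - 1) - 1                      # normalize to [-1, 1]
    gy = 2 * coords[:, 1] / (H - 1) - 1
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1), align_corners=True)

def view_synthesis_loss(frame1, frame2, masks, flow):
    """Reconstruct frame1 by warping frame2 with the segment flow; the
    photometric error supervises both pathways without any labels."""
    return (warp(frame2, segment_flow(masks, flow)) - frame1).abs().mean()
```

Because every pixel in a region shares one flow offset, minimizing this loss pressures the appearance pathway to group pixels that move together, which is where the objectness reportedly emerges.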
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models [28.304047711166056]
Large-scale pre-trained models have shown promising advances in detecting and segmenting objects in 2D static images in the wild.
This begs the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
In this paper, we re-purpose an open-vocabulary detector, segmenter, and dense optical flow estimator, into a model that tracks and segments objects of any category in 2D videos.
arXiv Detail & Related papers (2023-10-10T20:25:30Z)
- Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping [52.03068246508119]
We study learning object segmentation from unlabeled videos.
We first learn an image segmenter in a loop that approximates optical flow by a constant per-segment flow plus a small within-segment residual flow (a sketch of this objective follows below).
Our model surpasses the state of the art by absolute gains of 7/9/5% on DAVIS16/STv2/FBMS59, respectively.
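As a rough illustration of that relaxed common fate objective (assumed shapes and names, not the paper's implementation): flow from an off-the-shelf estimator is approximated by a piecewise-constant per-segment flow plus a small predicted residual, and the reconstruction error trains the segmenter.

```python
# Illustrative sketch of a relaxed-common-fate loss; the shapes, names, and
# residual weighting are assumptions, not the paper's actual code.
import torch

def relaxed_common_fate_loss(masks, flow, residual, residual_weight=0.1):
    """masks:    (B, K, H, W) soft segment assignments from the image segmenter
    flow:     (B, 2, H, W) optical flow from an off-the-shelf estimator
    residual: (B, 2, H, W) predicted small within-segment residual flow
    """
    w = masks / (masks.sum(dim=(2, 3), keepdim=True) + 1e-6)
    const = torch.einsum('bkhw,bchw->bkc', w, flow)        # mean flow per segment
    recon = torch.einsum('bkhw,bkc->bchw', masks, const)   # piecewise-constant flow
    fit = (flow - (recon + residual)).abs().mean()         # relaxed common fate fit
    reg = residual.abs().mean()                            # keep the residual small
    return fit + residual_weight * reg
```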
arXiv Detail & Related papers (2023-04-17T07:18:21Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion [92.80981308407098]
We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns.
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
arXiv Detail & Related papers (2022-05-16T17:55:34Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)