Learning To Segment Dominant Object Motion From Watching Videos
- URL: http://arxiv.org/abs/2111.14160v1
- Date: Sun, 28 Nov 2021 14:51:00 GMT
- Title: Learning To Segment Dominant Object Motion From Watching Videos
- Authors: Sahir Shrestha, Mohammad Ali Armin, Hongdong Li, Nick Barnes
- Abstract summary: We envision a simple framework for dominant moving object segmentation that neither requires annotated data to train nor relies on saliency priors or pre-trained optical flow maps.
Inspired by a layered image representation, we introduce a technique to group pixel regions according to their affine parametric motion.
This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference.
- Score: 72.57852930273256
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing deep-learning-based unsupervised video object segmentation methods
still rely on ground-truth segmentation masks to train. Unsupervised in this
context only means that no annotated frames are used during inference. As
obtaining ground-truth segmentation masks for real image scenes is a laborious
task, we envision a simple framework for dominant moving object segmentation
that neither requires annotated data to train nor relies on saliency priors or
pre-trained optical flow maps. Inspired by a layered image representation, we
introduce a technique to group pixel regions according to their affine
parametric motion. This enables our network to learn segmentation of the
dominant foreground object using only RGB image pairs as input for both
training and inference. We establish a baseline for this novel task using a new
MovingCars dataset and show competitive performance against recent methods that
require annotated masks to train.
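As a concrete illustration of grouping pixels by affine parametric motion, the sketch below fits a single dominant affine model to a displacement field and labels the pixels it fails to explain as the moving foreground. This is only a rough approximation of the idea, not the authors' method: the paper learns the grouping end-to-end from RGB image pairs without pre-computed flow, whereas this sketch assumes a dense displacement field is already available, and the function names are hypothetical.

```python
# Minimal sketch: mark the dominant moving object as the set of pixels whose
# motion is NOT explained by a single dominant affine motion model.
# Assumes a dense displacement field `flow` (H x W x 2) is given, e.g. from
# correspondences; the actual paper learns this grouping from RGB pairs.
import numpy as np


def fit_affine(points, targets):
    """Least-squares fit of 6 affine parameters mapping [x, y, 1] to target positions."""
    ones = np.ones((points.shape[0], 1))
    X = np.hstack([points, ones])                          # (N, 3)
    params, *_ = np.linalg.lstsq(X, targets, rcond=None)   # (3, 2)
    return params


def segment_dominant_motion(flow, iters=5, inlier_thresh=1.0):
    """Return a boolean mask marking pixels that deviate from the dominant affine motion."""
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)  # (H*W, 2)
    dst = pts + flow.reshape(-1, 2).astype(np.float64)

    inliers = np.ones(pts.shape[0], dtype=bool)
    for _ in range(iters):
        # Re-fit the affine model to the current inliers (IRLS-style refinement),
        # so the dominant (typically background) motion is estimated robustly.
        params = fit_affine(pts[inliers], dst[inliers])
        pred = np.hstack([pts, np.ones((pts.shape[0], 1))]) @ params
        residual = np.linalg.norm(pred - dst, axis=1)
        inliers = residual < inlier_thresh

    # Pixels the dominant model cannot explain form the foreground object mask.
    return (~inliers).reshape(h, w)
```

In a layered-image view, the same fit could be repeated on the outlier set to recover further affine motion layers.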
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion [92.80981308407098]
We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns.
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
arXiv Detail & Related papers (2022-05-16T17:55:34Z)
- CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation [22.89327564484357]
We propose a framework which accomplishes this goal via joint learning of representations and segmentation.
By iterating between these two components, we ground the contrastive updates in segmentation information, and simultaneously improve segmentation throughout pretraining.
arXiv Detail & Related papers (2022-03-17T14:20:05Z)
- GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation [16.900404701997502]
We propose a GAN-based approach that generates images conditioned on latent masks.
We show that such mask-conditioned image generation can be learned faithfully when conditioning the masks in a hierarchical manner.
It also lets us generate image-mask pairs for training a segmentation network, which outperforms the state-of-the-art unsupervised segmentation methods on established benchmarks.
arXiv Detail & Related papers (2021-12-02T07:57:56Z)
- The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos [59.12750806239545]
We show that a video has different views of the same scene related by moving components, and the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively (a toy sketch of this segment-flow view synthesis objective is given after this list).
arXiv Detail & Related papers (2021-11-11T18:59:11Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
- Footprints and Free Space from a Single Color Image [32.57664001590537]
We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input.
We learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data.
We find that a surprisingly low bar for spatial coverage of training scenes is required.
arXiv Detail & Related papers (2020-04-14T09:29:17Z)
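As referenced in the entry above, here is a toy sketch of a segment-flow view synthesis loss in the spirit of The Emergence of Objectness: soft segment masks and one translation per segment (a deliberately simplified motion model) are composed into a dense flow that warps the first frame, and the photometric error against the second frame supervises both pathways. The tensor layout, the per-segment translation model, and the function name are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of a segment-flow view synthesis loss (illustrative assumptions only).
import torch
import torch.nn.functional as F


def view_synthesis_loss(frame1, frame2, masks, segment_flow):
    """frame1, frame2: (B, 3, H, W) RGB frames.
    masks:        (B, K, H, W) soft segment assignments (softmax over K).
    segment_flow: (B, K, 2) one (dx, dy) pixel translation per segment.
    """
    B, K, H, W = masks.shape
    # Compose a dense flow field as the mask-weighted sum of per-segment motions.
    flow = torch.einsum('bkhw,bkc->bchw', masks, segment_flow)  # (B, 2, H, W)

    # Base sampling grid in normalized [-1, 1] coordinates (x first, then y).
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W),
                            indexing='ij')
    base = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2)

    # Convert pixel displacements to normalized offsets and build the warp grid.
    scale = torch.tensor([2.0 / max(W - 1, 1), 2.0 / max(H - 1, 1)])
    grid = base + flow.permute(0, 2, 3, 1) * scale

    # Sample frame1 at the displaced locations (backward warping) and compare to frame2.
    synthesized = F.grid_sample(frame1, grid, align_corners=True)
    return F.l1_loss(synthesized, frame2)
```

Minimizing this reconstruction error pushes the masks and per-segment motions to agree with how the scene actually moves between the two frames, without any mask annotations.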