Self-supervised Amodal Video Object Segmentation
- URL: http://arxiv.org/abs/2210.12733v1
- Date: Sun, 23 Oct 2022 14:09:35 GMT
- Title: Self-supervised Amodal Video Object Segmentation
- Authors: Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco
Locatello, David Wipf, Yanwei Fu, Zheng Zhang
- Abstract summary: Amodal perception requires inferring the full shape of an object that is partially occluded.
This paper develops a new framework of Self-supervised amodal Video object segmentation (SaVos).
- Score: 57.929357732733926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Amodal perception requires inferring the full shape of an object that is
partially occluded. This task is particularly challenging on two levels: (1) it
requires more information than is contained in the instantaneous retinal or
imaging-sensor input, and (2) it is difficult to obtain enough well-annotated
amodal labels for supervision. To this end, this paper develops a new framework of
Self-supervised amodal Video object segmentation (SaVos). Our method
efficiently leverages the visual information of video temporal sequences to
infer the amodal mask of objects. The key intuition is that the occluded part
of an object can be explained away if that part is visible in other frames,
possibly deformed as long as the deformation can be reasonably learned.
Accordingly, we derive a novel self-supervised learning paradigm that
efficiently utilizes the visible object parts as the supervision to guide the
training on videos. In addition to learning a type prior that completes masks for
known object types, SaVos also learns a spatiotemporal prior, which is likewise
useful for the amodal task and can generalize to unseen types. The proposed
framework achieves state-of-the-art performance on the synthetic amodal
segmentation benchmark FISHBOWL and the real-world benchmark KINS-Video-Car.
Further, it transfers well to novel distributions via test-time adaptation,
outperforming existing models even after the transfer to a new distribution.
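To make the training signal concrete, below is a minimal, hypothetical PyTorch sketch of the visible-part supervision described in the abstract; it is not the authors' released SaVos code, and the function names, tensor shapes, and the use of a dense flow field as the learned deformation are illustrative assumptions.

```python
# Hypothetical sketch of visible-part supervision; not the authors' SaVos code.
import torch
import torch.nn.functional as F

def warp_mask(mask: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a (B, 1, H, W) mask to another frame with a (B, 2, H, W) flow field."""
    _, _, h, w = mask.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=flow.device),
        torch.arange(w, dtype=torch.float32, device=flow.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # pixel coords, (B, 2, H, W)
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0                    # normalize x to [-1, 1]
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0                    # normalize y to [-1, 1]
    return F.grid_sample(mask, torch.stack((gx, gy), dim=-1), align_corners=True)

def visible_part_loss(amodal_logits_t, flow_t_to_k, modal_mask_k):
    """Warp the amodal mask predicted at frame t to frame k and supervise it
    only where the object is actually visible there; occluded pixels are
    left unconstrained, so they can be "explained away" by other frames."""
    warped = warp_mask(torch.sigmoid(amodal_logits_t), flow_t_to_k)
    warped = warped.clamp(1e-6, 1.0 - 1e-6)   # keep BCE inputs strictly in (0, 1)
    visible = modal_mask_k > 0.5              # pixels where the object is visible at frame k
    return F.binary_cross_entropy(warped[visible], modal_mask_k[visible])
```

In a full training loop this term would be accumulated over many frame pairs, with the flow either estimated by an off-the-shelf model or predicted jointly with the amodal mask.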
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation [47.39455910191075]
Video amodal segmentation is a challenging task in computer vision.
Recent studies have achieved promising performance by using motion flow to integrate information across frames under a self-supervised setting.
This paper presents a rethinking of previous works, in particular leveraging supervised signals with an object-centric representation.
arXiv Detail & Related papers (2023-09-23T04:12:02Z)
- LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training [13.985488693082981]
We propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks.
We demonstrate the effectiveness of our approach, named LOCATE, on multiple standard video object segmentation, image saliency detection, and object segmentation benchmarks.
arXiv Detail & Related papers (2023-08-22T07:27:09Z)
- Masked Motion Encoding for Self-Supervised Video Representation Learning [84.24773072241945]
We present Masked Motion Encoding (MME), a new pre-training paradigm that reconstructs both appearance and motion information to explore temporal clues.
Motivated by the fact that humans can recognize an action by tracking objects' position and shape changes, we propose to reconstruct a motion trajectory that represents these two kinds of change in the masked regions.
Pre-trained with our MME paradigm, the model is able to anticipate long-term and fine-grained motion details.
arXiv Detail & Related papers (2022-10-12T11:19:55Z)
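As a rough illustration of this masked-motion pre-training idea, here is a hypothetical PyTorch sketch (not the released MME implementation); the token dimensions, the regression head, and the use of precomputed motion descriptors as targets are all assumptions.

```python
# Hypothetical sketch of masked-motion-style pre-training; not the released
# MME code. Dimensions and the motion target are assumptions (e.g., stacked
# frame differences as a cheap stand-in for tracked trajectories).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedMotionSketch(nn.Module):
    def __init__(self, token_dim: int = 768, target_dim: int = 256, depth: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(token_dim, target_dim)  # regresses the motion descriptor

    def forward(self, tokens: torch.Tensor, masked: torch.Tensor) -> torch.Tensor:
        """tokens: (B, N, D) patch embeddings (masked tokens assumed replaced
        by a learnable [MASK] embedding upstream); masked: (B, N) bool."""
        feats = self.encoder(tokens)
        return self.head(feats[masked])   # predict targets only at masked positions

def masked_motion_loss(model, tokens, masked, motion_targets):
    """motion_targets: (B, N, target_dim) descriptors of position/shape change."""
    pred = model(tokens, masked)
    return F.mse_loss(pred, motion_targets[masked])
```

A real implementation would derive the targets from tracked point trajectories rather than raw frame differences, so the model must anticipate position and shape changes rather than copy pixels.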
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Joint Inductive and Transductive Learning for Video Object Segmentation [107.32760625159301]
Semi-supervised video object segmentation is the task of segmenting the target object in a video sequence given only its mask in the first frame.
Most previous best-performing methods adopt matching-based transductive reasoning or online inductive learning.
We propose to integrate transductive and inductive learning into a unified framework to exploit the complementarity between them for accurate and robust video object segmentation.
arXiv Detail & Related papers (2021-08-08T16:25:48Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame rates and a risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.