Exploring the Semi-supervised Video Object Segmentation Problem from a
Cyclic Perspective
- URL: http://arxiv.org/abs/2111.01323v1
- Date: Tue, 2 Nov 2021 01:50:23 GMT
- Title: Exploring the Semi-supervised Video Object Segmentation Problem from a
Cyclic Perspective
- Authors: Yuxi Li, Ning Xu, Wenjie Yang, John See, Weiyao Lin
- Abstract summary: In this paper, we place the semi-supervised video object segmentation problem into a cyclic workflow.
We show that a cyclic mechanism incorporated into the standard sequential flow can produce more consistent representations for pixel-wise correspondence.
We also develop a cycle effective receptive field (cycle-ERF), based on the gradient correction process, to provide a new perspective for analyzing object-specific regions of interest.
- Score: 36.4057004419079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern video object segmentation (VOS) algorithms have achieved remarkably
high performance in a sequential processing order, yet most currently
prevailing pipelines still show obvious inadequacies such as accumulated
error, unknown robustness, or a lack of proper interpretation tools. In this
paper, we place the semi-supervised video object segmentation problem into a
cyclic workflow and find the defects above can be collectively addressed via
the inherent cyclic property of semi-supervised VOS systems. Firstly, a cyclic
mechanism incorporated into the standard sequential flow can produce more
consistent representations for pixel-wise correspondence. Relying on the
accurate reference mask in the starting frame, we show that the error
propagation problem can be mitigated. Next, a simple gradient correction
module, which naturally extends the offline cyclic pipeline to an online
manner, can highlight the high-frequency, detailed parts of the results to further
improve the segmentation quality while keeping the computation cost feasible.
Meanwhile, such correction can protect the network from severe performance
degradation caused by interference signals. Finally, we develop a cycle
effective receptive field (cycle-ERF) based on the gradient correction process to
provide a new perspective for analyzing object-specific regions of interest.
We conduct comprehensive comparisons and detailed analysis on the challenging
benchmarks DAVIS16, DAVIS17 and YouTube-VOS, demonstrating that the cyclic
mechanism is helpful to enhance segmentation quality, improve the robustness of
VOS systems, and further provide qualitative comparison and interpretation on
how different VOS algorithms work. The code of this project can be found at
https://github.com/lyxok1/STM-Training
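The gradient correction idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear `backward` operator (a fixed pixel permutation standing in for a learned backward propagation network), the squared-error cycle loss, the step size, and all function names are assumptions made purely for illustration. The sketch propagates the current-frame prediction back to the starting frame, compares it with the accurate reference mask, and uses the gradient of that cyclic loss to correct the prediction.

```python
import numpy as np


def cycle_loss(mask_t, ref_mask, backward):
    """Squared error between the back-propagated mask and the reference mask."""
    residual = backward @ mask_t - ref_mask
    return float(residual @ residual)


def gradient_correction(mask_t, ref_mask, backward, step=0.1, iters=10):
    """Refine mask_t by gradient descent on the cycle loss (toy version)."""
    corrected = mask_t.copy()
    for _ in range(iters):
        residual = backward @ corrected - ref_mask
        grad = 2.0 * backward.T @ residual  # d(loss)/d(mask_t)
        corrected = np.clip(corrected - step * grad, 0.0, 1.0)
    return corrected


rng = np.random.default_rng(0)
n_pixels = 16

# Reference mask of the starting frame (assumed accurate).
ref_mask = (rng.random(n_pixels) > 0.5).astype(float)

# Hypothetical backward propagation: a cyclic shift of pixels by one position.
backward = np.roll(np.eye(n_pixels), 1, axis=0)

# A noisy current-frame prediction around the ideal mask backward.T @ ref_mask.
ideal_mask_t = backward.T @ ref_mask
noisy_mask = np.clip(ideal_mask_t + 0.3 * rng.standard_normal(n_pixels), 0.0, 1.0)

corrected = gradient_correction(noisy_mask, ref_mask, backward)
print("cycle loss before:", cycle_loss(noisy_mask, ref_mask, backward))
print("cycle loss after: ", cycle_loss(corrected, ref_mask, backward))
```

In a real VOS system the backward step would be a forward pass of the segmentation network and the gradient would come from automatic differentiation; the permutation matrix here only keeps the example self-contained.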
Related papers
- Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency [9.115508086522887]
We introduce a weakly-supervised method called EigenVIS that achieves competitive accuracy compared to other VIS approaches.
This method is based on two key innovations: a Temporal Eigenvalue Loss (TEL) and a clip-level Quality Co-efficient (QCC).
The code is available on https://github.com/farnooshar/EigenVIS.
arXiv Detail & Related papers (2024-08-29T16:05:05Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
- Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation [76.68301884987348]
We propose a simple yet effective approach for self-supervised video object segmentation (VOS).
Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to establish robust-temporal segmentation correspondences in videos.
Our method demonstrates state-of-the-art performance across multiple unsupervised VOS benchmarks and excels in complex real-world multi-object video segmentation tasks.
arXiv Detail & Related papers (2023-11-29T18:47:17Z)
- SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation [24.884078497381633]
We introduce a Transformer-based approach to video object segmentation (VOS).
Our attention-based approach allows a model to learn to attend over a history of features from multiple frames.
Our method achieves competitive results on YouTube-VOS and DAVIS 2017 with improved scalability and robustness compared with the state of the art.
arXiv Detail & Related papers (2021-01-21T20:06:12Z)
- Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation [27.559093073097483]
Current approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate the segmentation mask for the current frame.
We exploit this observation by using temporal information to quickly identify frames with minimal change.
We propose a novel dynamic network that estimates change across frames and decides which path to choose: computing a full network or reusing the previous frame's features.
arXiv Detail & Related papers (2020-12-21T19:40:17Z)
- Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)
- Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation [37.3336313567187]
A cyclic mechanism is incorporated into the standard semi-supervised process to produce more robust representations.
We introduce a simple gradient correction module, which extends the offline pipeline to an online method.
Finally, we develop a cycle effective receptive field (cycle-ERF) based on gradient correction to provide a new perspective for analyzing object-specific regions of interest.
arXiv Detail & Related papers (2020-10-23T05:40:53Z)
- Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching [3.9053553775979086]
One-shot Video Object Segmentation (VOS) is the task of tracking an object of interest within a video sequence.
We study an RNN-based architecture and address some of these issues by proposing a hybrid sequence-to-sequence architecture named HS2S.
Our experiments show that augmenting the RNN with correspondence matching is a highly effective solution to reduce the drift problem.
arXiv Detail & Related papers (2020-10-10T19:00:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.