FlowVOS: Weakly-Supervised Visual Warping for Detail-Preserving and
Temporally Consistent Single-Shot Video Object Segmentation
- URL: http://arxiv.org/abs/2111.10621v1
- Date: Sat, 20 Nov 2021 16:17:10 GMT
- Title: FlowVOS: Weakly-Supervised Visual Warping for Detail-Preserving and
Temporally Consistent Single-Shot Video Object Segmentation
- Authors: Julia Gong, F. Christopher Holsinger, Serena Yeung
- Abstract summary: We introduce a new foreground-targeted visual warping approach that learns flow fields from VOS data.
We train a flow module to capture detailed motion between frames using two weakly-supervised losses.
Our approach produces segmentations with high detail and temporal consistency.
- Score: 4.3171602814387136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the task of semi-supervised video object segmentation (VOS). Our
approach mitigates shortcomings in previous VOS work by addressing detail
preservation and temporal consistency using visual warping. In contrast to
prior work that uses full optical flow, we introduce a new foreground-targeted
visual warping approach that learns flow fields from VOS data. We train a flow
module to capture detailed motion between frames using two weakly-supervised
losses. Our object-focused approach warps previous foreground object masks to
their positions in the target frame, enabling detailed mask refinement at fast
runtimes without extra flow supervision. It can also be integrated
directly into state-of-the-art segmentation networks. On the DAVIS17 and
YouTubeVOS benchmarks, we outperform state-of-the-art offline methods that do
not use extra data, as well as many online methods that use extra data.
Qualitatively, we also show our approach produces segmentations with high
detail and temporal consistency.
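A minimal PyTorch sketch of the central operation follows: backward-warping the previous frame (and its foreground mask) along a learned flow field. The abstract does not name the paper's two weakly-supervised losses, so the photometric and smoothness terms below are common stand-ins assumed for illustration, and all variable names and shapes are hypothetical.
```python
import torch
import torch.nn.functional as F

def warp(tensor, flow):
    """Backward-warp a (B, C, H, W) tensor along a (B, 2, H, W) flow field."""
    b, _, h, w = flow.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device),
        torch.arange(w, device=flow.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).float()        # (2, H, W) pixel coords
    coords = base.unsqueeze(0) + flow                  # displace by the flow
    # Normalize to [-1, 1] as grid_sample expects
    x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((x, y), dim=-1)                 # (B, H, W, 2)
    return F.grid_sample(tensor, grid, align_corners=True)

# Toy stand-ins so the sketch runs end to end; in the paper a learned flow
# module would predict `flow` from the frame pair.
B, H, W = 1, 64, 64
prev_frame = torch.rand(B, 3, H, W)
cur_frame = torch.rand(B, 3, H, W)
prev_mask = torch.rand(B, 1, H, W)
flow = torch.zeros(B, 2, H, W, requires_grad=True)

# Weakly-supervised training signals: no ground-truth flow is used.
warped_prev = warp(prev_frame, flow)                   # reconstruct cur_frame
photometric = (warped_prev - cur_frame).abs().mean()   # appearance consistency
smoothness = (flow[..., 1:] - flow[..., :-1]).abs().mean()  # flow regularity
loss = photometric + 0.1 * smoothness
loss.backward()  # gradients reach the flow predictor through grid_sample

# At inference, the same flow moves the previous mask into the target frame:
mask_proposal = warp(prev_mask, flow)
```
Because only the foreground mask needs to be warped, such an object-focused module can stay small and fast compared to estimating full-frame optical flow.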
Related papers
- Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation [76.68301884987348]
We propose a simple yet effective approach for self-supervised video object segmentation (VOS).
Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to establish robust spatio-temporal segmentation correspondences in videos.
Our method demonstrates state-of-the-art performance across multiple unsupervised VOS benchmarks and excels in complex real-world multi-object video segmentation tasks.
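To make the correspondence idea concrete, here is a generic sketch of propagating mask labels through feature affinities. It illustrates this family of methods rather than the paper's exact formulation; the random features standing in for DINO patch tokens and all names are assumptions.
```python
import torch
import torch.nn.functional as F

def propagate_labels(ref_feats, tgt_feats, ref_labels, temperature=0.07):
    """Soft label propagation through feature affinity.

    ref_feats:  (N, D) patch features of an already-labeled frame
    tgt_feats:  (M, D) patch features of the frame to segment
    ref_labels: (N, C) per-patch one-hot (or soft) mask labels
    Returns (M, C) propagated labels for the target frame.
    """
    ref = F.normalize(ref_feats, dim=-1)
    tgt = F.normalize(tgt_feats, dim=-1)
    affinity = tgt @ ref.t() / temperature   # (M, N) cosine similarities
    weights = affinity.softmax(dim=-1)       # each target patch attends to ref
    return weights @ ref_labels              # (M, C) soft labels

# Toy usage with random features standing in for frozen ViT patch tokens:
ref_feats, tgt_feats = torch.randn(196, 384), torch.randn(196, 384)
ref_labels = F.one_hot(torch.randint(0, 2, (196,)), 2).float()
tgt_labels = propagate_labels(ref_feats, tgt_feats, ref_labels)
```
In practice such methods typically restrict the affinity to a local spatial window or the top-k most similar reference patches to keep the propagation robust.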
arXiv Detail & Related papers (2023-11-29T18:47:17Z)
- Event-Free Moving Object Segmentation from Moving Ego Vehicle [88.33470650615162]
Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving.
Most segmentation methods leverage motion cues obtained from optical flow maps.
We propose to exploit event cameras for better video understanding, which provide rich motion cues without relying on optical flow.
arXiv Detail & Related papers (2023-04-28T23:43:10Z)
- Motion-inductive Self-supervised Object Discovery in Videos [99.35664705038728]
We propose a model that processes consecutive RGB frames and infers the optical flow between any pair of frames using a layered representation.
We demonstrate superior performance over previous state-of-the-art methods on three public video segmentation datasets.
arXiv Detail & Related papers (2022-10-01T08:38:28Z)
- Video Mask Transfiner for High-Quality Video Instance Segmentation [102.50936366583106]
Video Mask Transfiner (VMT) is capable of leveraging fine-grained high-resolution features thanks to a highly efficient video transformer structure.
Based on our VMT architecture, we design an automated annotation refinement approach by iterative training and self-correction.
We compare VMT with the most recent state-of-the-art methods on the HQ-YTVIS benchmark, as well as on Youtube-VIS, OVIS and BDD100K MOTS.
arXiv Detail & Related papers (2022-07-28T11:13:37Z)
- Weakly Supervised Video Salient Object Detection via Point Supervision [18.952253968878356]
We propose a strong baseline model based on point supervision.
To infer saliency maps with temporal information, we mine inter-frame complementary information from short-term and long-term perspectives.
We label two point-supervised datasets, P-DAVIS and P-DAVSOD, by relabeling the DAVIS and DAVSOD datasets.
arXiv Detail & Related papers (2022-07-15T03:31:15Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization, leading to state-of-the-art results in both the VOS and the more challenging tracking domains.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
- Revisiting Sequence-to-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory [4.343892430915579]
Video Object Segmentation (VOS) is an active research area in the visual domain.
Current approaches lose objects in longer sequences, especially when the object is small or briefly occluded.
We build upon a sequence-to-sequence approach that employs an encoder-decoder architecture together with a memory module for exploiting the sequential data.
arXiv Detail & Related papers (2020-04-25T15:38:09Z)
- Dual Temporal Memory Network for Efficient Video Object Segmentation [42.05305410986511]
One of the fundamental challenges in Video Object Segmentation (VOS) is how to make the best use of temporal information to boost performance.
We present an end-to-end network which stores short- and long-term video sequence information preceding the current frame as the temporal memories.
Our network consists of two temporal sub-networks including a short-term memory sub-network and a long-term memory sub-network.
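Reads from such memories are commonly implemented as attention. The sketch below shows one generic way to query a short-term and a long-term memory and fuse the results; it is a plausible reading of the description rather than the paper's architecture, and all names and shapes are assumptions.
```python
import torch

def memory_read(query, keys, values):
    """Scaled dot-product attention read from a (key, value) memory.

    query:        (B, P, D) features of the current frame
    keys, values: (B, T*P, D) features stacked from memory frames
    """
    scale = query.shape[-1] ** 0.5
    attn = torch.softmax(query @ keys.transpose(1, 2) / scale, dim=-1)
    return attn @ values                       # (B, P, D)

# Toy tensors: one memory built from a few recent frames, one from a
# sparsely sampled longer history.
B, P, D = 1, 256, 64
q = torch.randn(B, P, D)
k_short, v_short = torch.randn(B, 2 * P, D), torch.randn(B, 2 * P, D)
k_long, v_long = torch.randn(B, 8 * P, D), torch.randn(B, 8 * P, D)

short_read = memory_read(q, k_short, v_short)  # fine, recent motion cues
long_read = memory_read(q, k_long, v_long)     # appearance over the clip
fused = torch.cat([short_read, long_read], dim=-1)  # would feed a decoder
```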
arXiv Detail & Related papers (2020-03-13T06:07:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.