SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
- URL: http://arxiv.org/abs/2101.08833v1
- Date: Thu, 21 Jan 2021 20:06:12 GMT
- Title: SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
- Authors: Brendan Duke and Abdalla Ahmed and Christian Wolf and Parham Aarabi
and Graham W. Taylor
- Abstract summary: We introduce a Transformer-based approach to video object segmentation (VOS).
Our attention-based approach allows a model to learn to attend over a history of features from multiple frames.
Our method achieves competitive results on YouTube-VOS and DAVIS 2017 with improved scalability and robustness compared with the state of the art.
- Score: 24.884078497381633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we introduce a Transformer-based approach to video object
segmentation (VOS). To address compounding error and scalability issues of
prior work, we propose a scalable, end-to-end method for VOS called Sparse
Spatiotemporal Transformers (SST). SST extracts per-pixel representations for
each object in a video using sparse attention over spatiotemporal features. Our
attention-based formulation for VOS allows a model to learn to attend over a
history of multiple frames and provides suitable inductive bias for performing
correspondence-like computations necessary for solving motion segmentation. We
demonstrate the effectiveness of attention-based networks over recurrent networks in the
spatiotemporal domain. Our method achieves competitive results on YouTube-VOS
and DAVIS 2017 with improved scalability and robustness to occlusions compared
with the state of the art.
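The abstract stops at this level of detail, so the following PyTorch sketch only illustrates the general idea of sparse attention over a spatiotemporal feature volume, not the authors' SST architecture: each query pixel of the current frame attends to a strided (hence sparse) subset of positions in the feature maps of a history of past frames. The module name, tensor shapes, and stride-based sparsity pattern are assumptions made for illustration.

```python
# Minimal sketch (NOT the authors' code): sparse attention from each query
# pixel in the current frame over a strided subset of positions in a history
# of past-frame feature maps. Shapes and the stride pattern are assumptions.
import torch
import torch.nn as nn

class SparseSpatiotemporalAttention(nn.Module):
    def __init__(self, dim: int, stride: int = 4):
        super().__init__()
        self.to_q = nn.Conv2d(dim, dim, kernel_size=1)
        self.to_k = nn.Conv2d(dim, dim, kernel_size=1)
        self.to_v = nn.Conv2d(dim, dim, kernel_size=1)
        self.stride = stride
        self.scale = dim ** -0.5

    def forward(self, curr: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # curr:   (B, C, H, W)    features of the current frame
        # memory: (B, T, C, H, W) features of T past frames
        B, T, C, H, W = memory.shape
        q = self.to_q(curr).flatten(2).transpose(1, 2)           # (B, HW, C)
        mem = memory.flatten(0, 1)                                # (B*T, C, H, W)
        # Sparsity: keep only every `stride`-th spatial location of the memory.
        k = self.to_k(mem)[..., ::self.stride, ::self.stride]
        v = self.to_v(mem)[..., ::self.stride, ::self.stride]
        k = k.reshape(B, T, C, -1).permute(0, 1, 3, 2).reshape(B, -1, C)  # (B, T*S, C)
        v = v.reshape(B, T, C, -1).permute(0, 1, 3, 2).reshape(B, -1, C)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, HW, T*S)
        out = attn @ v                                                    # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)

# Example: attend from a 64x64 current-frame feature map over 4 past frames.
attn = SparseSpatiotemporalAttention(dim=256, stride=4)
out = attn(torch.randn(1, 256, 64, 64), torch.randn(1, 4, 256, 64, 64))
```

With stride s and T reference frames, each query attends to roughly T*H*W/s^2 keys instead of T*H*W, which is the sense in which this toy attention is sparse; the sparsity pattern used by SST itself may differ.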
Related papers
- DeVOS: Flow-Guided Deformable Transformer for Video Object Segmentation [0.4487265603408873]
We present DeVOS (Deformable VOS), an architecture for Video Object Segmentation that combines memory-based matching with motion-guided propagation.
Our method achieves top-rank performance on DAVIS 2017 val and test-dev (88.1%, 83.0%) and YouTube-VOS 2019 val (86.6%).
arXiv Detail & Related papers (2024-05-11T14:57:22Z)
- Temporally Consistent Referring Video Object Segmentation with Hybrid Memory [98.80249255577304]
We propose an end-to-end R-VOS paradigm that explicitly models temporal consistency alongside the referring segmentation.
Features of frames with automatically generated high-quality reference masks are propagated to segment remaining frames.
Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin.
arXiv Detail & Related papers (2024-03-28T13:32:49Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation [76.68301884987348]
We propose a simple yet effective approach for self-supervised video object segmentation (VOS).
Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to establish robust spatiotemporal segmentation correspondences in videos.
Our method demonstrates state-of-the-art performance across multiple unsupervised VOS benchmarks and excels in complex real-world multi-object video segmentation tasks.
arXiv Detail & Related papers (2023-11-29T18:47:17Z)
- Segmenting the motion components of a video: A long-term unsupervised model [5.801044612920816]
We want to provide a coherent and stable motion segmentation over the video sequence.
We propose a novel long-term spatiotemporal model operating in a totally unsupervised way.
We report experiments on four VOS benchmarks, demonstrating competitive quantitative results.
arXiv Detail & Related papers (2023-10-02T09:33:54Z)
- Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate video frames from only a limited number of adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z)
- Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video frame interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video frame interpolation, as sketched after this entry's summary.
In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
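The summary only names the idea of local attention; as a rough, hypothetical illustration (not the paper's implementation), the sketch below restricts self-attention to non-overlapping spatial windows so that cost grows with the window size rather than with the full frame, shown for a single frame for brevity. The function name, window size, and shapes are assumptions.

```python
# Illustrative sketch (not the paper's code): self-attention restricted to
# non-overlapping spatial windows, so cost scales with the window size rather
# than with the full H*W frame. Names, shapes, and window size are assumptions.
import torch

def window_attention(x: torch.Tensor, win: int = 8) -> torch.Tensor:
    # x: (B, C, H, W) feature map with H and W divisible by `win`
    B, C, H, W = x.shape
    # Partition the frame into (B * num_windows, win*win, C) token groups.
    t = x.reshape(B, C, H // win, win, W // win, win)
    t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, win * win, C)
    # Plain scaled dot-product attention inside each window.
    attn = torch.softmax(t @ t.transpose(1, 2) / C ** 0.5, dim=-1)
    t = attn @ t
    # Undo the window partition back to (B, C, H, W).
    t = t.reshape(B, H // win, W // win, win, win, C)
    return t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
```

For H*W tokens and window size w, attention cost drops from O((HW)^2) for global attention to O(HW * w^2) for windowed attention.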
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation [27.559093073097483]
Current approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame.
In many videos consecutive frames change very little; we exploit this observation by using temporal information to quickly identify frames with minimal change.
We propose a novel dynamic network that estimates change across frames and decides which path to take: computing the full network or reusing the previous frame's features (a hypothetical sketch of this gating follows the entry).
arXiv Detail & Related papers (2020-12-21T19:40:17Z)
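The reuse-gate entry above describes a gate that estimates inter-frame change and then either runs the full network or reuses cached features. The sketch below is a hypothetical illustration of that control flow only; the change estimator (a mean absolute frame difference), the threshold, and the module names are assumptions, not the paper's design.

```python
# Hypothetical sketch of a reuse gate (not the paper's architecture): a cheap
# change estimator decides whether to run the full feature extractor or to
# reuse the features cached from the previous frame.
import torch
import torch.nn as nn

class ReuseGateSegmenter(nn.Module):
    def __init__(self, backbone: nn.Module, head: nn.Module, threshold: float = 0.05):
        super().__init__()
        self.backbone = backbone      # expensive feature extractor (assumed given)
        self.head = head              # lightweight mask decoder (assumed given)
        self.threshold = threshold    # change score above which we recompute
        self.prev_frame = None
        self.prev_feat = None

    @torch.no_grad()
    def change_score(self, frame: torch.Tensor) -> float:
        # Crude proxy for inter-frame change: mean absolute pixel difference.
        if self.prev_frame is None:
            return float("inf")
        return (frame - self.prev_frame).abs().mean().item()

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        if self.change_score(frame) > self.threshold:
            feat = self.backbone(frame)   # full path: recompute features
        else:
            feat = self.prev_feat         # reuse path: cached features
        self.prev_frame, self.prev_feat = frame, feat
        return self.head(feat)
```

In use, such a module would be called once per frame of the video, so static stretches take the cheap reuse path while large scene changes trigger the full computation.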
This list is automatically generated from the titles and abstracts of the papers in this site.