A Transductive Approach for Video Object Segmentation
- URL: http://arxiv.org/abs/2004.07193v2
- Date: Thu, 16 Apr 2020 16:15:04 GMT
- Title: A Transductive Approach for Video Object Segmentation
- Authors: Yizhuo Zhang, Zhirong Wu, Houwen Peng, and Stephen Lin
- Abstract summary: Semi-supervised video object segmentation aims to separate a target object from a video sequence, given the mask in the first frame.
Most current prevailing methods utilize information from additional modules trained in other domains, such as optical flow and instance segmentation.
We propose a simple yet strong transductive method, in which additional modules, datasets, and dedicated architectural designs are not needed.
- Score: 55.83842083823267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised video object segmentation aims to separate a target object
from a video sequence, given the mask in the first frame. Most current
prevailing methods utilize information from additional modules trained in other
domains, such as optical flow and instance segmentation, and as a result they do
not compete with other methods on common ground. To address this issue, we
propose a simple yet strong transductive method, in which additional modules,
datasets, and dedicated architectural designs are not needed. Our method takes
a label propagation approach where pixel labels are passed forward based on
feature similarity in an embedding space. Different from other propagation
methods, ours diffuses temporal information in a holistic manner that takes
account of long-term object appearance. In addition, our method requires little
additional computational overhead and runs at a fast $\sim$37 fps. Our
single model with a vanilla ResNet50 backbone achieves an overall score of 72.3
on the DAVIS 2017 validation set and 63.1 on the test set. This simple yet high
performing and efficient method can serve as a solid baseline that facilitates
future research. Code and models are available at
\url{https://github.com/microsoft/transductive-vos.pytorch}.
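The propagation step described in the abstract reduces to attention over an embedding space: each pixel of the current frame inherits labels from reference pixels, pooled over many past frames, weighted by feature similarity, i.e. $\hat{y}_j = \sum_i \mathrm{softmax}_i(\langle f_j, f_i \rangle / \tau)\, y_i$. The PyTorch sketch below illustrates this mechanism only; it is not the authors' released implementation (see the repository above), and the function name, cosine normalization, and temperature $\tau$ are assumptions.

```python
import torch
import torch.nn.functional as F

def propagate_labels(ref_feats, ref_labels, query_feats, temperature=0.05):
    """Sketch of similarity-based label propagation (illustrative only).

    ref_feats:   (N, C) embeddings of reference pixels, pooled over past frames
    ref_labels:  (N, K) one-hot or soft labels of those reference pixels
    query_feats: (M, C) embeddings of current-frame pixels
    Returns:     (M, K) soft labels predicted for the current frame
    """
    # L2-normalize so the dot product is a cosine similarity.
    ref = F.normalize(ref_feats, dim=1)
    qry = F.normalize(query_feats, dim=1)

    # Pairwise similarity between every query pixel and every reference pixel.
    sim = qry @ ref.t() / temperature  # (M, N)

    # Softmax converts similarities into propagation weights; each query
    # label is then a weighted average of the reference labels.
    weights = sim.softmax(dim=1)       # (M, N)
    return weights @ ref_labels        # (M, K)

if __name__ == "__main__":
    # Toy check: 4 reference pixels, 2 classes, 3 query pixels, 8-dim features.
    ref_feats = torch.randn(4, 8)
    ref_labels = F.one_hot(torch.tensor([0, 0, 1, 1]), num_classes=2).float()
    query_feats = torch.randn(3, 8)
    print(propagate_labels(ref_feats, ref_labels, query_feats).shape)  # torch.Size([3, 2])
```

Pooling the reference set from frames at several temporal distances is what makes the diffusion holistic in the sense of the abstract, letting long-term object appearance influence the current prediction; a lower temperature sharpens the weights toward nearest-neighbor matching.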
Related papers
- SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding [56.079013202051094]
We present SegVG, a novel method that transfers box-level annotations into signals providing additional pixel-level supervision for Visual Grounding.
This approach allows us to iteratively exploit the annotation as signals for both box-level regression and pixel-level segmentation.
arXiv Detail & Related papers (2024-07-03T15:30:45Z) - Matching Anything by Segmenting Anything [109.2507425045143]
We propose MASA, a novel method for robust instance association learning.
MASA learns instance-level correspondence through exhaustive data transformations.
We show that MASA achieves even better performance than state-of-the-art methods trained with fully annotated in-domain video sequences.
arXiv Detail & Related papers (2024-06-06T16:20:07Z) - Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z) - Unified Perception: Efficient Depth-Aware Video Panoptic Segmentation
with Minimal Annotation Costs [2.7920304852537536]
We present a new approach titled Unified Perception that achieves state-of-the-art performance without requiring video-based training.
Our method employs a simple two-stage cascaded tracking algorithm that (re)uses object embeddings computed in an image-based network (a generic sketch of such embedding-based association appears after this list).
arXiv Detail & Related papers (2023-03-03T15:00:12Z) - Unsupervised Video Object Segmentation via Prototype Memory Network [5.612292166628669]
Unsupervised video object segmentation aims to segment a target object in the video without a ground truth mask in the initial frame.
This challenge requires extracting features for the most salient common objects within a video sequence.
We propose a novel prototype memory network architecture to solve this problem.
arXiv Detail & Related papers (2022-09-08T11:08:58Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - End-to-end video instance segmentation via spatial-temporal graph neural
networks [30.748756362692184]
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain.
Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step.
We propose a novel graph-neural-network (GNN) based method to handle the aforementioned limitation.
arXiv Detail & Related papers (2022-03-07T05:38:08Z) - Box Supervised Video Segmentation Proposal Network [3.384080569028146]
We propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties.
The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9%.
We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
arXiv Detail & Related papers (2022-02-14T20:38:28Z) - Prototypical Cross-Attention Networks for Multiple Object Tracking and
Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on the YouTube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
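Several entries above (MASA, Unified Perception, PCAN) track by associating per-object embeddings across frames. A minimal, generic sketch of such association follows; it is not taken from any of these papers, and the function name, cosine metric, and similarity threshold are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_embs, det_embs, sim_threshold=0.5):
    """Generic sketch: match current detections to existing tracks by
    cosine similarity of their embeddings via Hungarian assignment.

    track_embs: (P, C) embeddings of existing tracks
    det_embs:   (Q, C) embeddings of current-frame detections
    Returns:    list of (track_idx, det_idx) accepted matches
    """
    tracks = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    dets = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    sim = tracks @ dets.T                      # (P, Q) cosine similarities
    rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
    # Discard low-similarity pairs; unmatched detections would start new tracks.
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= sim_threshold]
```

The listed methods differ in how these embeddings are learned and how memory is maintained over time; the matching step itself is a common skeleton.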
This list is automatically generated from the titles and abstracts of the papers on this site.