PMVOS: Pixel-Level Matching-Based Video Object Segmentation
- URL: http://arxiv.org/abs/2009.08855v1
- Date: Fri, 18 Sep 2020 14:22:09 GMT
- Title: PMVOS: Pixel-Level Matching-Based Video Object Segmentation
- Authors: Suhwan Cho, Heansung Lee, Sungmin Woo, Sungjun Jang, Sangyoun Lee
- Abstract summary: Semi-supervised video object segmentation (VOS) aims to segment arbitrary target objects in video when the ground truth segmentation mask of the initial frame is provided.
Recent pixel-level matching (PM) has been widely used for feature matching because of its high performance.
We propose a novel method, PM-based video object segmentation (PMVOS), that constructs strong template features containing the information of all past frames.
- Score: 9.357153487612965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised video object segmentation (VOS) aims to segment arbitrary
target objects in video when the ground truth segmentation mask of the initial
frame is provided. Due to this limitation of using prior knowledge about the
target object, feature matching, which compares template features representing
the target object with input features, is an essential step. Recently,
pixel-level matching (PM), which matches every pixel in template features and
input features, has been widely used for feature matching because of its high
performance. However, despite its effectiveness, the information used to build
the template features is limited to the initial and previous frames. We address
this issue by proposing a novel method, PM-based video object segmentation
(PMVOS), which constructs strong template features containing the information of
all past frames. Furthermore, we apply self-attention to the similarity maps
generated from PM to capture global dependencies. On the DAVIS 2016 validation
set, we achieve new state-of-the-art performance among real-time methods (> 30
fps), with a J&F score of 85.6%. Performance on the DAVIS 2017 and YouTube-VOS
validation sets is also impressive, with J&F scores of 74.0% and 68.2%,
respectively.
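The pixel-level matching described in the abstract, in which every pixel of the template features is compared against every pixel of the input features, can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration of the general PM idea (cosine similarity between all template/query pixel pairs, reduced to a per-pixel score map), not the authors' implementation; the `topk` reduction is an assumption for the example.

```python
import numpy as np

def pixel_level_matching(template, query, topk=4):
    """Illustrative pixel-level matching (PM) sketch.

    template: (C, Ht, Wt) template feature map for the target object.
    query:    (C, H, W) feature map of the current frame.
    Returns a (H, W) similarity map: each query pixel is scored by the
    mean of its top-k cosine similarities over all template pixels.
    """
    C, Ht, Wt = template.shape
    _, H, W = query.shape
    t = template.reshape(C, -1)                  # (C, Ht*Wt)
    q = query.reshape(C, -1)                     # (C, H*W)
    # L2-normalize channels so the dot product is cosine similarity
    t = t / (np.linalg.norm(t, axis=0, keepdims=True) + 1e-8)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    sim = q.T @ t                                # dense (H*W, Ht*Wt) matching
    topk_vals = np.sort(sim, axis=1)[:, -topk:]  # best k matches per pixel
    return topk_vals.mean(axis=1).reshape(H, W)
```

The dense `(H*W, Ht*Wt)` similarity matrix is what makes PM expressive but memory-hungry, which is why PMVOS-style methods care about how the template features are built and how the resulting similarity maps are post-processed (e.g., with self-attention).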
Related papers
- Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection [12.417754433715903]
We present FAIM, a new VOD method that enhances temporal Feature Aggregation by leveraging Instance Mask features.
Using YOLOX as a base detector, FAIM achieves 87.9% mAP on the ImageNet VID dataset at 33 FPS on a single 2080Ti GPU.
arXiv Detail & Related papers (2024-12-06T10:12:10Z) - Multi-Granularity Video Object Segmentation [36.06127939037613]
We propose a large-scale, densely annotated multi-granularity video object segmentation (MUG-VOS) dataset.
We automatically collected a training set that assists in tracking both salient and non-salient objects, and we also curated a human-annotated test set for reliable evaluation.
In addition, we present memory-based mask propagation model (MMPM), trained and evaluated on MUG-VOS dataset.
arXiv Detail & Related papers (2024-12-02T13:17:41Z) - Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z) - Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation [52.11279360934703]
Current prevailing video object segmentation (VOS) methods usually perform dense matching between the current and reference frames after feature extraction.
We propose a unified VOS framework, coined as JointFormer, for joint modeling of the three elements of feature, correspondence, and a compressed memory.
arXiv Detail & Related papers (2023-08-25T17:30:08Z) - Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z) - Look Before You Match: Instance Understanding Matters in Video Object Segmentation [114.57723592870097]
In this paper, we argue that instance understanding matters in video object segmentation (VOS).
We present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.
We employ well-learned object queries from the IS branch to inject instance-specific information into the query key, with which instance-augmented matching is further performed.
arXiv Detail & Related papers (2022-12-13T18:59:59Z) - Region Aware Video Object Segmentation with Deep Motion Modeling [56.95836951559529]
Region Aware Video Object Segmentation (RAVOS) is a method that predicts regions of interest (ROIs) for efficient object segmentation and memory storage.
For efficient segmentation, object features are extracted according to the ROIs, and an object decoder is designed for object-level segmentation.
For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames.
arXiv Detail & Related papers (2022-07-21T01:44:40Z) - Towards Robust Video Object Segmentation with Adaptive Object Calibration [18.094698623128146]
Video object segmentation (VOS) aims at segmenting objects in all target frames of a video, given annotated object masks of reference frames.
We propose a new deep network, which can adaptively construct object representations and calibrate object masks to achieve stronger robustness.
Our model achieves the state-of-the-art performance among existing published works, and also exhibits superior robustness against perturbations.
arXiv Detail & Related papers (2022-07-02T17:51:29Z) - CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation [67.17625278621134]
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video.
Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects.
We propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information.
arXiv Detail & Related papers (2020-12-07T00:31:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.