Contextual Guided Segmentation Framework for Semi-supervised Video
Instance Segmentation
- URL: http://arxiv.org/abs/2106.03330v1
- Date: Mon, 7 Jun 2021 04:16:50 GMT
- Title: Contextual Guided Segmentation Framework for Semi-supervised Video
Instance Segmentation
- Authors: Trung-Nghia Le and Tam V. Nguyen and Minh-Triet Tran
- Abstract summary: We propose the Contextual Guided Segmentation (CGS) framework for video instance segmentation in three passes.
In the first pass, i.e., preview segmentation, we propose Instance Re-Identification Flow to estimate the main properties of each instance.
In the second pass, i.e., contextual segmentation, we introduce multiple contextual segmentation schemes.
Experiments conducted on the DAVIS Test-Challenge dataset demonstrate the effectiveness of our proposed framework.
- Score: 20.174393465900156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose the Contextual Guided Segmentation (CGS) framework for
video instance segmentation in three passes. In the first pass, i.e., preview
segmentation, we propose Instance Re-Identification Flow to estimate main
properties of each instance (i.e., human/non-human, rigid/deformable,
known/unknown category) by propagating its preview mask to other frames. In the
second pass, i.e., contextual segmentation, we introduce multiple contextual
segmentation schemes. For a human instance, we develop skeleton-guided
segmentation in a frame along with object flow to correct and refine the result
across frames. For a non-human instance, if the instance has a wide variation in
appearance and belongs to known categories (which can be inferred from the
initial mask), we adopt instance segmentation. If the non-human instance is
nearly rigid, we train FCNs on synthesized images from the first frame of a
video sequence. In the final pass, i.e., guided segmentation, we develop a
novel fine-grained segmentation method on non-rectangular regions of interest
(ROIs). The natural-shaped ROI is generated by applying guided attention from
the neighboring frames of the current one to reduce the ambiguity in the
segmentation of different overlapping instances. Forward mask propagation is
followed by backward mask propagation to further restore missing instance
fragments due to re-appeared instances, fast motion, occlusion, or heavy
deformation. Finally, instances in each frame are merged based on their depth
values, together with human and non-human object interaction and rare instance
priority. Experiments conducted on the DAVIS Test-Challenge dataset demonstrate
the effectiveness of our proposed framework. We consistently achieved 3rd place
in the DAVIS Challenges 2017-2019, with 75.4%, 72.4%, and 78.4% in terms of
global score, region similarity, and contour accuracy, respectively.
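As an illustration of the pass-2 decision rule described in the abstract, the sketch below maps the instance properties estimated in the preview pass (human/non-human, rigid/deformable, known/unknown category) to a contextual segmentation scheme. The names are hypothetical placeholders rather than the authors' released code, and the final fallback branch covers a case the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class InstanceProfile:
    # Properties estimated by the preview-segmentation pass (pass 1).
    is_human: bool
    is_rigid: bool
    known_category: bool


def pick_contextual_scheme(p: InstanceProfile) -> str:
    """Choose the pass-2 (contextual segmentation) scheme for one instance."""
    if p.is_human:
        # Human: skeleton-guided segmentation per frame, refined across frames by object flow.
        return "skeleton-guided segmentation + object flow refinement"
    if p.known_category and not p.is_rigid:
        # Non-human instance with wide appearance variation but a known category.
        return "category-level instance segmentation"
    if p.is_rigid:
        # Nearly rigid non-human instance: FCN trained on images synthesized from the first frame.
        return "FCN trained on synthesized first-frame images"
    # Not covered explicitly by the abstract (assumption): rely on pass-3 guided segmentation.
    return "guided segmentation only (pass 3)"


print(pick_contextual_scheme(InstanceProfile(is_human=False, is_rigid=True, known_category=False)))
```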
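The final merging step is described as depth-based (together with human/non-human interaction and rare-instance priority, which are not modeled here). A minimal NumPy sketch of the depth-based part alone, assuming smaller depth means closer to the camera: instances are painted from farthest to nearest so that a closer instance keeps the overlapping pixels.

```python
import numpy as np


def merge_instances_by_depth(masks, depths):
    """Merge per-instance binary masks of one frame into a single label map.

    masks  : list of HxW boolean arrays, one per instance
    depths : per-instance depth values (smaller = closer to the camera)
    returns: HxW int32 array with 0 = background and i + 1 = instance i
    """
    label_map = np.zeros(masks[0].shape, dtype=np.int32)
    # Paint far-to-near so that nearer instances overwrite overlapping pixels.
    for idx in np.argsort(depths)[::-1]:
        label_map[masks[idx]] = idx + 1
    return label_map


# Toy example: two overlapping instances; instance 0 is closer (depth 1.0 < 2.5)
# and therefore keeps the overlapping column.
m0 = np.zeros((4, 6), dtype=bool); m0[1:3, 1:4] = True
m1 = np.zeros((4, 6), dtype=bool); m1[1:3, 3:6] = True
print(merge_instances_by_depth([m0, m1], depths=[1.0, 2.5]))
```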
Related papers
- Consistent Video Instance Segmentation with Inter-Frame Recurrent
Attention [23.72098615213679]
Video instance segmentation aims at predicting object segmentation masks for each frame, as well as associating the instances across multiple frames.
Recent end-to-end video instance segmentation methods are capable of performing object segmentation and instance association together in a direct parallel sequence decoding/prediction framework.
We propose a consistent end-to-end video instance segmentation framework with Inter-Frame Recurrent Attention to model both the temporal instance consistency for adjacent frames and the global temporal context.
arXiv Detail & Related papers (2022-06-14T17:22:55Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - Human Instance Segmentation and Tracking via Data Association and
Single-stage Detector [17.46922710432633]
Human video instance segmentation plays an important role in computer understanding of human activities.
Most current VIS methods are based on the Mask-RCNN framework.
We develop a new method for human video instance segmentation based on a single-stage detector.
arXiv Detail & Related papers (2022-03-31T11:36:09Z) - SOLO: A Simple Framework for Instance Segmentation [84.00519148562606]
"instance categories" assigns categories to each pixel within an instance according to the instance's location.
"SOLO" is a simple, direct, and fast framework for instance segmentation with strong performance.
Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy.
arXiv Detail & Related papers (2021-06-30T09:56:54Z) - Prototypical Cross-Attention Networks for Multiple Object Tracking and
Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on the YouTube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z) - Video Instance Segmentation with a Propose-Reduce Paradigm [68.59137660342326]
Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos.
Prior methods usually obtain segmentation for a frame or clip first, and then merge the incomplete results by tracking or matching.
We propose a new paradigm, Propose-Reduce, to generate complete sequences for input videos in a single step.
arXiv Detail & Related papers (2021-03-25T10:58:36Z) - Instance and Panoptic Segmentation Using Conditional Convolutions [96.7275593916409]
We propose a simple yet effective framework for instance and panoptic segmentation, termed CondInst.
We show that CondInst can achieve improved accuracy and inference speed on both instance and panoptic segmentation tasks.
arXiv Detail & Related papers (2021-02-05T06:57:02Z) - Unifying Instance and Panoptic Segmentation with Dynamic Rank-1
Convolutions [109.2706837177222]
DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation.
As a byproduct, DR1Mask is 10% faster and 1 point higher in mAP than the previous state-of-the-art instance segmentation network BlendMask.
arXiv Detail & Related papers (2020-11-19T12:42:10Z) - Learning Panoptic Segmentation from Instance Contours [9.347742071428918]
Panoptic segmentation aims to provide an understanding of background (stuff) and instances of objects (things) at a pixel level.
It combines the separate tasks of semantic segmentation (pixel-level classification) and instance segmentation to build a single unified scene understanding task.
We present a fully convolutional neural network that learns instance segmentation from semantic segmentation and instance contours.
arXiv Detail & Related papers (2020-10-16T03:05:48Z)