MSN: Efficient Online Mask Selection Network for Video Instance Segmentation
- URL: http://arxiv.org/abs/2106.10452v1
- Date: Sat, 19 Jun 2021 08:33:29 GMT
- Title: MSN: Efficient Online Mask Selection Network for Video Instance Segmentation
- Authors: Vidit Goel, Jiachen Li, Shubhika Garg, Harsh Maheshwari, Humphrey Shi
- Abstract summary: We present a novel solution for Video Instance Segmentation (VIS) that automatically generates instance-level segmentation masks along with object classes and tracks objects in a video.
Our method improves the masks from the segmentation and propagation branches in an online manner using the Mask Selection Network (MSN).
Our method achieves 49.1 mAP on the 2021 YouTube-VIS Challenge and ranked third among more than 30 global teams.
- Score: 7.208483056781188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we present a novel solution for Video Instance
Segmentation (VIS): automatically generating instance-level segmentation
masks along with object classes and tracking them in a video. Our method
improves the masks from the segmentation and propagation branches in an online
manner using the Mask Selection Network (MSN), thereby limiting noise
accumulation during mask tracking. We propose an effective MSN design based on
a patch-based convolutional neural network. The network can distinguish very
subtle differences between masks and accurately choose the better of the
associated masks. Further, we exploit temporal consistency and process video
sequences in both forward and reverse order as a post-processing step to
recover lost objects. The proposed method can be used to adapt any video
object segmentation method to the task of VIS. Our method achieves 49.1 mAP
on the 2021 YouTube-VIS Challenge and ranked third among more than 30 global
teams. Our code will be available at
https://github.com/SHI-Labs/Mask-Selection-Networks.
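The per-frame selection step described in the abstract can be sketched as follows. This is an illustrative simplification, not the paper's implementation: `toy_score` stands in for the learned patch-based CNN classifier, and the box cropping and padding scheme are assumptions.

```python
import numpy as np

def crop_patch(mask, box, pad=8):
    """Crop a padded patch around the object's bounding box."""
    y0, x0, y1, x1 = box
    h, w = mask.shape
    return mask[max(0, y0 - pad):min(h, y1 + pad),
                max(0, x0 - pad):min(w, x1 + pad)]

def select_mask(seg_mask, prop_mask, box, score_fn):
    """Online mask selection: score the segmentation-branch and
    propagation-branch candidates on a local patch and keep the better
    mask, so tracking errors do not accumulate frame to frame."""
    s_seg = score_fn(crop_patch(seg_mask, box))
    s_prop = score_fn(crop_patch(prop_mask, box))
    return seg_mask if s_seg >= s_prop else prop_mask

# Toy stand-in for the trained patch CNN: prefers the fuller, less
# fragmented candidate. The real MSN is a learned binary classifier.
def toy_score(patch):
    return patch.mean()
```

In the real system the scoring network is trained to rank pairs of associated masks, so it can resolve differences far subtler than this mean-coverage heuristic.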
Related papers
- Siamese Masked Autoencoders [76.35448665609998]
We present Siamese Masked Autoencoders (SiamMAE) for learning visual correspondence from videos.
SiamMAE operates on pairs of randomly sampled video frames and asymmetrically masks them.
It outperforms state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks.
arXiv Detail & Related papers (2023-05-23T17:59:46Z)
- Mask-Free Video Instance Segmentation [102.50936366583106]
Video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets.
We propose MaskFreeVIS, achieving highly competitive VIS performance while using only bounding box annotations for the object state.
Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection.
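The one-to-many temporal matching summarized in the MaskFreeVIS entry might look roughly like this sketch; the L2 patch distance is an assumption, and the entry's confidence filtering and consistency objective are omitted.

```python
import numpy as np

def knn_patch_matches(query, candidates, k=3):
    """Match a query patch against candidate patches from the next
    frame and keep the K nearest by L2 distance, giving one-to-many
    matches across frames (a simplification of the described step)."""
    diffs = candidates.reshape(len(candidates), -1) - query.ravel()
    d = np.linalg.norm(diffs, axis=1)
    idx = np.argsort(d)[:k]
    return idx, d[idx]
```

Keeping the K best matches rather than a single one makes the supervision robust to ambiguous correspondences, which is the motivation the entry gives for the one-to-many design.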
arXiv Detail & Related papers (2023-03-28T11:48:07Z) - DynaMask: Dynamic Mask Selection for Instance Segmentation [21.50329070835023]
We develop a Mask Switch Module (MSM) with negligible computational cost to select the most suitable mask resolution for each instance.
The proposed method, namely DynaMask, brings consistent and noticeable performance improvements over other state-of-the-arts at a moderate computation overhead.
arXiv Detail & Related papers (2023-03-14T13:01:25Z) - MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers.
We propose a mask-piloted training approach, which feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones.
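The mask-noising idea in the MP-Former entry can be sketched as below. This is only an illustration of the training signal: random pixel flipping is an assumed noise model, and the paper's actual noising scheme and masked-attention integration may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_mask(gt_mask, flip_ratio=0.1):
    """Produce a noised copy of a ground-truth binary mask; the model
    is then trained to reconstruct the clean mask from this input
    (mask-piloted training, as summarized in the entry above)."""
    noisy = gt_mask.copy()
    flip = rng.random(gt_mask.shape) < flip_ratio
    noisy[flip] = 1 - noisy[flip]
    return noisy
```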
arXiv Detail & Related papers (2023-02-28T07:30:36Z)
- One-Shot Video Inpainting [5.7120338754738835]
We propose a unified pipeline for one-shot video inpainting (OSVI).
By jointly learning mask prediction and video completion in an end-to-end manner, the results can be optimal for the entire task.
Our method is more reliable because the predicted masks can be used as the network's internal guidance.
arXiv Detail & Related papers (2021-11-15T04:15:57Z)
- Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation [51.68840525174265]
Video instance segmentation aims to detect, segment, and track objects in a video.
Current approaches extend image-level segmentation algorithms to the temporal domain.
We propose a video instance segmentation method that alleviates the problem of missing detections.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.
arXiv Detail & Related papers (2020-12-10T07:57:44Z)
- Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting.
We propose a novel graph neural network (TG-Net) that captures local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-01-02T03:30:17Z)
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [103.74690082121079]
In this work, we achieve improved mask prediction by effectively combining instance-level information with lower-level, fine-grained semantic information.
Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.