SipMask: Spatial Information Preservation for Fast Image and Video
Instance Segmentation
- URL: http://arxiv.org/abs/2007.14772v1
- Date: Wed, 29 Jul 2020 12:21:00 GMT
- Authors: Jiale Cao, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan,
Yanwei Pang, Ling Shao
- Abstract summary: We propose a fast single-stage instance segmentation method called SipMask.
It preserves instance-specific spatial information by separating mask prediction of an instance to different sub-regions of a detected bounding-box.
In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings.
- Score: 149.242230059447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-stage instance segmentation approaches have recently gained popularity
due to their speed and simplicity, but are still lagging behind in accuracy,
compared to two-stage methods. We propose a fast single-stage instance
segmentation method, called SipMask, that preserves instance-specific spatial
information by separating mask prediction of an instance to different
sub-regions of a detected bounding-box. Our main contribution is a novel
light-weight spatial preservation (SP) module that generates a separate set of
spatial coefficients for each sub-region within a bounding-box, leading to
improved mask predictions. It also enables accurate delineation of spatially
adjacent instances. Further, we introduce a mask alignment weighting loss and a
feature alignment scheme to better correlate mask prediction with object
detection. On COCO test-dev, our SipMask outperforms the existing single-stage
methods. Compared to the state-of-the-art single-stage TensorMask, SipMask
obtains an absolute gain of 1.0% (mask AP), while providing a four-fold
speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an
absolute gain of 3.0% (mask AP) under similar settings, while operating at
comparable speed on a Titan Xp. We also evaluate our SipMask for real-time
video instance segmentation, achieving promising results on the YouTube-VIS
dataset. The source code is available at
https://github.com/JialeCao001/SipMask.
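The sub-region idea described in the abstract can be illustrated with a minimal sketch. It assumes a 2x2 grid of sub-regions inside the box, a set of shared basis maps, and a sigmoid over a per-sub-region linear combination of those maps. The names `assemble_mask`, `bases`, and `coeffs`, and these design details, are illustrative assumptions and are not taken from the SipMask source code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def assemble_mask(bases, coeffs, box, grid=2):
    """Combine shared basis maps with sub-region-specific coefficients.

    bases:  list of K maps, each an H x W nested list of floats.
    coeffs: coeffs[r][c] holds K weights for sub-region (row r, column c).
    box:    (x0, y0, x1, y1) pixel bounds, end-exclusive.
    Returns an H x W map of probabilities; pixels outside the box stay 0.
    """
    height, width = len(bases[0]), len(bases[0][0])
    x0, y0, x1, y1 = box
    mask = [[0.0] * width for _ in range(height)]
    for y in range(y0, y1):
        # Row index of the sub-region that contains this pixel.
        r = min((y - y0) * grid // (y1 - y0), grid - 1)
        for x in range(x0, x1):
            c = min((x - x0) * grid // (x1 - x0), grid - 1)
            # Each sub-region applies its own weights to the same bases.
            z = sum(w * b[y][x] for w, b in zip(coeffs[r][c], bases))
            mask[y][x] = sigmoid(z)
    return mask


# Toy input (hypothetical): one constant basis map on a 4x4 image.
bases = [[[1.0] * 4 for _ in range(4)]]
# Only the top-left sub-region gets a positive weight.
coeffs = [[[5.0], [-5.0]],
          [[-5.0], [-5.0]]]
probs = assemble_mask(bases, coeffs, box=(0, 0, 4, 4))
binary = [[1 if p > 0.5 else 0 for p in row] for row in probs]
for row in binary:
    print(row)
```

In this toy case a single set of coefficients for the whole box could only switch the constant basis on or off everywhere, whereas the per-sub-region weights let the top-left quadrant differ from the rest.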
Related papers
- Mask Propagation for Efficient Video Semantic Segmentation [63.09523058489429]
Video Semantic Segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence.
We propose an efficient mask propagation framework for VSS, called MPVSS.
Our framework reduces FLOPs by up to 4x compared to the per-frame Mask2Former baseline, with only up to 2% mIoU degradation on the Cityscapes validation set.
arXiv Detail & Related papers (2023-10-29T09:55:28Z)
- Real-time Instance Segmentation with Discriminative Orientation Maps [0.16311150636417257]
We propose a real-time instance segmentation framework termed OrienMask.
A mask head is added to predict some discriminative orientation maps.
All instances that match with the same anchor size share a common orientation map.
arXiv Detail & Related papers (2021-06-23T07:27:35Z)
- Mask Encoding for Single Shot Instance Segmentation [97.99956029224622]
We propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst).
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact and fixed-dimensional representation vector.
We show that a much simpler and more flexible one-stage instance segmentation method can also achieve competitive performance.
arXiv Detail & Related papers (2020-03-26T02:51:17Z)
- SOLOv2: Dynamic and Fast Instance Segmentation [102.15325936477362]
We build a simple, direct, and fast instance segmentation framework with strong performance.
We take one step further by dynamically learning the mask head of the object segmenter.
We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy.
arXiv Detail & Related papers (2020-03-23T09:44:21Z)
- PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance requires a heavy computing burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [103.74690082121079]
In this work, we achieve improved mask prediction by effectively combining instance-level information with lower-level, fine-grained semantic information.
Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer.
arXiv Detail & Related papers (2020-01-02T03:30:17Z)