DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
- URL: http://arxiv.org/abs/2011.09876v3
- Date: Tue, 27 Apr 2021 13:53:02 GMT
- Title: DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
- Authors: Xing Shen, Jirui Yang, Chunbo Wei, Bing Deng, Jianqiang Huang, Xiansheng Hua, Xiaoliang Cheng, Kewei Liang
- Abstract summary: We propose a new mask representation that applies the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector.
Our method, termed DCT-Mask, can be easily integrated into most pixel-based instance segmentation methods.
- Score: 50.70679435176346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary grid mask representation is broadly used in instance segmentation. A
representative instantiation is Mask R-CNN which predicts masks on a $28\times
28$ binary grid. Generally, a low-resolution grid is not sufficient to capture
the details, while a high-resolution grid dramatically increases the training
complexity. In this paper, we propose a new mask representation by applying the
discrete cosine transform (DCT) to encode the high-resolution binary grid mask
into a compact vector. Our method, termed DCT-Mask, can be easily integrated
into most pixel-based instance segmentation methods. Without any bells and
whistles, DCT-Mask yields significant gains on different frameworks, backbones,
datasets, and training schedules. It requires no pre-processing or
pre-training and does almost no harm to the running speed. In particular, for
higher-quality annotations and more complex backbones, our method achieves
greater improvements. Moreover, we analyze the performance of our method from the
perspective of the quality of mask representation. The main reason why DCT-Mask
works well is that it obtains a high-quality mask representation with low
complexity. Code is available at https://github.com/aliyun/DCT-Mask.git.
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate that our strategy outperforms random masking on downstream tasks.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- DFormer: Diffusion-guided Transformer for Universal Image Segmentation [86.73405604947459]
The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model.
At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly generated masks.
Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D by 3.6% on the MS COCO val2017 set.
arXiv Detail & Related papers (2023-06-06T06:33:32Z)
- MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer [158.06850125920923]
Diffusion probabilistic models (DPMs) often lack the contextual reasoning ability to learn relations among object parts in an image.
We propose a mask latent modeling scheme to explicitly enhance the DPMs' ability to learn contextual relations among object semantic parts in an image.
Experimental results show that MDTv2 achieves superior image synthesis performance, e.g., a new SOTA FID score of 1.58 on the ImageNet dataset, and learns more than 10x faster than the previous SOTA, DiT.
arXiv Detail & Related papers (2023-03-25T07:47:21Z)
- DynaMask: Dynamic Mask Selection for Instance Segmentation [21.50329070835023]
We develop a Mask Switch Module (MSM) with negligible computational cost to select the most suitable mask resolution for each instance.
The proposed method, namely DynaMask, brings consistent and noticeable performance improvements over other state-of-the-art methods at moderate computational overhead.
arXiv Detail & Related papers (2023-03-14T13:01:25Z)
- MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers.
We propose a mask-piloted training approach, which feeds noised ground-truth masks into masked attention and trains the model to reconstruct the original ones.
arXiv Detail & Related papers (2023-03-13T17:57:59Z)
- Mask Transfiner for High-Quality Instance Segmentation [95.74244714914052]
We present Mask Transfiner for high-quality and efficient instance segmentation.
Our approach only processes detected error-prone tree nodes and self-corrects their errors in parallel.
Our code and trained models will be available at http://vis.xyz/pub/transfiner.
arXiv Detail & Related papers (2021-11-26T18:58:22Z)
- Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection [11.390163890611246]
Mask R-CNN is widely adopted as a strong baseline for arbitrary-shaped scene text detection and spotting.
Multiple instances may exist in one proposal, which makes it difficult for the mask head to distinguish different instances and degrades performance.
We propose instance-aware mask learning, in which the mask head learns to predict the shape of the whole instance rather than classify each pixel as text or non-text.
arXiv Detail & Related papers (2021-09-08T04:32:29Z)
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [103.74690082121079]
In this work, we achieve improved mask prediction by effectively combining instance-level information with lower-level, fine-grained semantic information.
Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer.
arXiv Detail & Related papers (2020-01-02T03:30:17Z)