DFormer: Diffusion-guided Transformer for Universal Image Segmentation
- URL: http://arxiv.org/abs/2306.03437v2
- Date: Thu, 8 Jun 2023 03:22:59 GMT
- Title: DFormer: Diffusion-guided Transformer for Universal Image Segmentation
- Authors: Hefeng Wang, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Shahbaz
Khan, Yanwei Pang
- Abstract summary: The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model.
At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly generated masks.
Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val2017 set.
- Score: 86.73405604947459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces an approach, named DFormer, for universal image
segmentation. The proposed DFormer views the universal image segmentation task
as a denoising process using a diffusion model. DFormer first adds various
levels of Gaussian noise to ground-truth masks, and then learns a model to
predict the denoised masks from the corrupted ones. Specifically, we take deep
pixel-level features along with the noisy masks as inputs to generate mask
features and attention masks, employing a diffusion-based decoder to perform
mask prediction gradually. At inference, our DFormer directly predicts the
masks and corresponding categories from a set of randomly generated masks.
Extensive experiments reveal the merits of our proposed contributions on
different image segmentation tasks: panoptic segmentation, instance
segmentation, and semantic segmentation. Our DFormer outperforms the recent
diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on
the MS COCO val2017 set. Further, DFormer achieves promising semantic
segmentation performance, outperforming the recent diffusion-based method by
2.2% on the ADE20K val set. Our source code and models will be publicly
available at https://github.com/cp3wan/DFormer
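The training process described in the abstract (corrupting ground-truth masks with varying levels of Gaussian noise, then learning to denoise them) can be sketched as a standard DDPM-style forward step. The following is a minimal NumPy illustration, not the paper's implementation: the linear beta schedule, step count, and [-1, 1] mask scaling are assumptions, and all function names are illustrative.

```python
import numpy as np

def noise_masks(gt_masks, t, num_steps=1000, rng=None):
    """Corrupt ground-truth binary masks at diffusion step t, following the
    standard DDPM forward process q(x_t | x_0). The schedule details here
    are illustrative assumptions, not DFormer's exact configuration."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Linear beta schedule; alpha_bar_t is the cumulative signal fraction.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)[t]
    # Scale {0, 1} masks to [-1, 1], as is common for diffusion on images.
    x0 = gt_masks.astype(np.float64) * 2.0 - 1.0
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps  # the model is trained to recover x0 (or eps) from xt

# Example: corrupt a batch of two 8x8 ground-truth masks at a middle step.
masks = (np.random.default_rng(1).random((2, 8, 8)) > 0.5).astype(np.float64)
xt, eps = noise_masks(masks, t=500)
```

Under this view, the "randomly-generated masks" used at inference correspond to starting near pure noise (t close to num_steps) and iteratively denoising toward clean masks.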
Related papers
- SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete
Diffusion Process [102.18226145874007]
We propose a model-agnostic solution called SegRefiner to enhance the quality of object masks produced by different segmentation models.
SegRefiner takes coarse masks as inputs and refines them using a discrete diffusion process.
It consistently improves both the segmentation metrics and boundary metrics across different types of coarse masks.
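SegRefiner's discrete diffusion operates on per-pixel states rather than continuous Gaussian noise. As a loose illustration only (the paper's actual transition kernel between "refined" and "coarse" pixel states is more specific), a discrete forward corruption of a binary mask might look like:

```python
import numpy as np

def corrupt_mask(mask, t, num_steps=6, rng=None):
    """Toy discrete forward process: at step t, each pixel is independently
    resampled to a random binary value with probability t / num_steps.
    An illustrative simplification, not SegRefiner's exact kernel."""
    if rng is None:
        rng = np.random.default_rng(0)
    resample = rng.random(mask.shape) < (t / num_steps)
    noise = rng.integers(0, 2, size=mask.shape)
    return np.where(resample, noise, mask)

# A refinement model would be trained to invert these steps, walking a
# coarse mask back toward the clean one a few steps at a time.
mask = (np.random.default_rng(1).random((8, 8)) > 0.5).astype(int)
coarse = corrupt_mask(mask, t=3)
```

Because the corruption is defined per pixel on discrete values, the learned reverse process can refine coarse masks produced by any upstream segmentation model, which is what makes the approach model-agnostic.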
arXiv Detail & Related papers (2023-12-19T18:53:47Z) - Mask Propagation for Efficient Video Semantic Segmentation [63.09523058489429]
Video semantic segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence.
We propose an efficient mask propagation framework for VSS, called MPVSS.
Our framework reduces FLOPs by up to 4x compared to the per-frame Mask2Former, with only up to 2% mIoU degradation on the Cityscapes validation set.
arXiv Detail & Related papers (2023-10-29T09:55:28Z) - Multimodal Diffusion Segmentation Model for Object Segmentation from
Manipulation Instructions [0.0]
We develop a model that comprehends a natural language instruction and generates a segmentation mask for the target everyday object.
We build a new dataset based on the well-known Matterport3D and REVERIE datasets.
The performance of MDSM surpassed that of the baseline method by a large margin of +10.13 mean IoU.
arXiv Detail & Related papers (2023-07-17T16:07:07Z) - Generative Semantic Segmentation [40.57488730457299]
We present a generative learning approach for semantic segmentation.
Uniquely, we cast semantic segmentation as an image-conditioned mask generation problem.
Experiments show that our GSS can perform competitively to prior art alternatives in the standard semantic segmentation setting.
arXiv Detail & Related papers (2023-03-20T17:55:37Z) - DynaMask: Dynamic Mask Selection for Instance Segmentation [21.50329070835023]
We develop a Mask Switch Module (MSM) with negligible computational cost to select the most suitable mask resolution for each instance.
The proposed method, namely DynaMask, brings consistent and noticeable performance improvements over other state-of-the-art methods at a moderate computational overhead.
arXiv Detail & Related papers (2023-03-14T13:01:25Z) - MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers.
We propose a mask-piloted training approach, which feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones.
arXiv Detail & Related papers (2023-03-13T17:57:59Z) - MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model
for Few-Shot Instance Segmentation [31.648523213206595]
Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task.
Conventional approaches have attempted to address the task via prototype learning, known as point estimation.
We propose a novel approach, dubbed MaskDiff, which models the underlying conditional distribution of a binary mask.
arXiv Detail & Related papers (2023-03-09T08:24:02Z) - Mask Transfiner for High-Quality Instance Segmentation [95.74244714914052]
We present Mask Transfiner for high-quality and efficient instance segmentation.
Our approach only processes detected error-prone tree nodes and self-corrects their errors in parallel.
Our code and trained models will be available at http://vis.xyz/pub/transfiner.
arXiv Detail & Related papers (2021-11-26T18:58:22Z) - Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks.
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
arXiv Detail & Related papers (2021-07-13T17:59:50Z) - DCT-Mask: Discrete Cosine Transform Mask Representation for Instance
Segmentation [50.70679435176346]
We propose a new mask representation by applying the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector.
Our method, termed DCT-Mask, could be easily integrated into most pixel-based instance segmentation methods.
arXiv Detail & Related papers (2020-11-19T15:00:21Z)
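The DCT-Mask idea above (compressing a high-resolution binary mask into a short vector of low-frequency DCT coefficients) can be sketched in NumPy. The paper keeps the first coefficients in zigzag order; for brevity this sketch keeps a square low-frequency block instead, and all function names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are frequency components)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # spatial index
    M = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    M[0, :] *= np.sqrt(1.0 / n)
    M[1:, :] *= np.sqrt(2.0 / n)
    return M

def encode_mask(mask, k=8):
    """Encode an n x n binary mask as its k x k low-frequency DCT block
    (a simplification of the paper's zigzag-ordered coefficient vector)."""
    n = mask.shape[0]
    D = dct_matrix(n)
    coeffs = D @ mask @ D.T           # separable 2-D DCT-II
    return coeffs[:k, :k].ravel()

def decode_mask(vec, n, k=8):
    """Zero-pad the coefficient vector, invert the DCT, then binarize."""
    coeffs = np.zeros((n, n))
    coeffs[:k, :k] = vec.reshape(k, k)
    D = dct_matrix(n)
    recon = D.T @ coeffs @ D          # inverse of an orthonormal transform
    return (recon > 0.5).astype(np.uint8)

# Round trip: keeping all n*n coefficients reconstructs the mask exactly.
mask = (np.random.default_rng(2).random((16, 16)) > 0.5).astype(np.uint8)
recon = decode_mask(encode_mask(mask, k=16), n=16, k=16)
```

Because most of a smooth mask's energy sits in the low frequencies, a detection head can regress this short vector instead of a full-resolution grid, which is what lets the representation plug into most pixel-based instance segmentation methods.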
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.