A Unified Efficient Pyramid Transformer for Semantic Segmentation
- URL: http://arxiv.org/abs/2107.14209v1
- Date: Thu, 29 Jul 2021 17:47:32 GMT
- Title: A Unified Efficient Pyramid Transformer for Semantic Segmentation
- Authors: Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo Wu, Yanwei Fu, Mu Li
- Abstract summary: We advocate a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts.
We first adapt a sparse sampling strategy to incorporate the transformer-based attention mechanism for efficient context modeling.
We demonstrate promising performance on three popular benchmarks for semantic segmentation with low memory footprint.
- Score: 40.20512714144266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation is a challenging problem due to difficulties in
modeling context in complex scenes and class confusions along boundaries. Most
literature either focuses on context modeling or boundary refinement, which is
less generalizable in open-world scenarios. In this work, we advocate a unified
framework (UN-EPT) to segment objects by considering both context information
and boundary artifacts. We first adapt a sparse sampling strategy to
incorporate the transformer-based attention mechanism for efficient context
modeling. In addition, a separate spatial branch is introduced to capture image
details for boundary refinement. The whole model can be trained in an
end-to-end manner. We demonstrate promising performance on three popular
benchmarks for semantic segmentation with low memory footprint. Code will be
released soon.
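The abstract's central efficiency idea, replacing dense all-pairs attention with attention over a sparsely sampled subset of positions, can be illustrated with a minimal sketch. This is not the authors' UN-EPT implementation; the function names, random sampling scheme, and tiny feature maps below are all hypothetical, shown only to make the cost argument concrete (each query attends to k sampled positions instead of all M, cutting the score computation from O(N*M) to O(N*k)).

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_attention(queries, keys, values, num_samples, seed=0):
    """For each query, attend to a random subset of `num_samples` key/value
    positions instead of all of them (a stand-in for the paper's learned
    sparse sampling), so cost scales with the sample count, not the full map."""
    rng = random.Random(seed)
    dim = len(values[0])
    outputs = []
    for q in queries:
        idx = rng.sample(range(len(keys)), num_samples)  # sparse sampling step
        weights = softmax([dot(q, keys[i]) for i in idx])
        # Output is a convex combination of the sampled values only.
        outputs.append([sum(w * values[i][j] for w, i in zip(weights, idx))
                        for j in range(dim)])
    return outputs

# Toy example: 2 queries attending over 10 positions, sampling only 4.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[float(i), float(9 - i)] for i in range(10)]
values = [[float(i), 1.0] for i in range(10)]
out = sparse_attention(queries, keys, values, num_samples=4)
```

In the real model the sampled locations would be data-dependent rather than random, and a separate spatial branch would carry the fine detail that sparse context attention discards.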
Related papers
- SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation [87.18373801829314]
In-context segmentation aims to segment novel images using a few labeled example images, termed "in-context examples".
We propose SEGIC, an end-to-end segment-in-context framework built upon a single vision foundation model (VFM)
SEGIC is a straightforward yet effective approach that yields state-of-the-art performance on one-shot segmentation benchmarks.
arXiv Detail & Related papers (2023-11-24T18:59:42Z)
- GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding [101.32590239809113]
Generalized Perception NeRF (GP-NeRF) is a novel pipeline that makes the widely used segmentation model and NeRF work compatibly under a unified framework.
We propose two self-distillation mechanisms, i.e., the Semantic Distill Loss and the Depth-Guided Semantic Distill Loss, to enhance the discrimination and quality of the semantic field.
arXiv Detail & Related papers (2023-11-20T15:59:41Z)
- Attention-based fusion of semantic boundary and non-boundary information to improve semantic segmentation [9.518010235273783]
This paper introduces a method for image semantic segmentation grounded on a novel fusion scheme.
The main goal of our proposal is to explore object boundary information to improve the overall segmentation performance.
Our proposed model achieved the best mIoU on the Cityscapes, CamVid, and Pascal Context datasets, and the second best on Mapillary Vistas.
arXiv Detail & Related papers (2021-08-05T20:46:53Z)
- BoundarySqueeze: Image Segmentation as Boundary Squeezing [104.43159799559464]
We propose a novel method for fine-grained high-quality image segmentation of both objects and scenes.
Inspired by dilation and erosion from morphological image processing, we treat pixel-level segmentation as squeezing the object boundary.
Our method yields large gains on COCO and Cityscapes for both instance and semantic segmentation, and outperforms the previous state-of-the-art PointRend in both accuracy and speed under the same setting.
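The dilation/erosion intuition behind boundary squeezing can be sketched with plain binary morphology: the pixels that survive dilation but not erosion form a thin band around the object contour, which is where refinement effort is spent. This is only an illustrative stand-in, not the paper's method; the 3x3 neighborhood, the border handling (out-of-bounds pixels are simply skipped), and all function names are assumptions.

```python
def _neighborhood(mask, y, x):
    """Yield the values in the 3x3 window around (y, x), clipped to the image."""
    h, w = len(mask), len(mask[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield mask[ny][nx]

def dilate(mask):
    """A pixel is set if ANY pixel in its 3x3 neighborhood is set."""
    return [[int(any(_neighborhood(mask, y, x)))
             for x in range(len(mask[0]))] for y in range(len(mask))]

def erode(mask):
    """A pixel is set only if ALL pixels in its 3x3 neighborhood are set."""
    return [[int(all(_neighborhood(mask, y, x)))
             for x in range(len(mask[0]))] for y in range(len(mask))]

def boundary_band(mask):
    """Dilated minus eroded: a thin band straddling the object contour."""
    d, e = dilate(mask), erode(mask)
    return [[1 if d[y][x] and not e[y][x] else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

# A 3x3 square of ones centered in a 5x5 grid.
mask = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
        for y in range(5)]
band = boundary_band(mask)
```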
arXiv Detail & Related papers (2021-05-25T04:58:51Z)
- Dynamic Dual Sampling Module for Fine-Grained Semantic Segmentation [27.624291416260185]
We propose a Dynamic Dual Sampling Module (DDSM) to conduct dynamic affinity modeling and propagate semantic context to local details.
Experiment results on both the Cityscapes and CamVid datasets validate the effectiveness and efficiency of the proposed approach.
arXiv Detail & Related papers (2021-05-25T04:25:47Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve an object's inner consistency by modeling the global context, or refine object details along boundaries via multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
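The body/edge decomposition described in the last entry rests on a standard signal-processing identity: a smoothing filter extracts a low-frequency "body" component, and subtracting it from the input leaves a high-frequency residual that is large only near edges. The sketch below illustrates that identity on a 1-D signal with a simple moving average; it is not the paper's learned decoupling, and the filter choice and function names are assumptions.

```python
def low_pass(signal, radius=1):
    """Moving-average blur: the smooth, low-frequency 'body' component."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def body_edge_decompose(signal):
    """Split a signal into body + edge, where edge = signal - body.

    By construction body[i] + edge[i] reconstructs the input exactly, and
    the edge part is nonzero only where the signal changes quickly."""
    body = low_pass(signal)
    edge = [s - b for s, b in zip(signal, body)]
    return body, edge

# A step signal: flat region, sharp transition, flat region.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
body, edge = body_edge_decompose(signal)
```

In the 2-D image case the same idea applies per pixel, with the blurred map supervising the object interior and the residual supervising the boundary.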
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.