Improving Semantic Segmentation in Transformers using Hierarchical
Inter-Level Attention
- URL: http://arxiv.org/abs/2207.02126v1
- Date: Tue, 5 Jul 2022 15:47:31 GMT
- Title: Improving Semantic Segmentation in Transformers using Hierarchical
Inter-Level Attention
- Authors: Gary Leung, Jun Gao, Xiaohui Zeng, Sanja Fidler
- Abstract summary: Hierarchical Inter-Level Attention (HILA) is an attention-based method that captures Bottom-Up and Top-Down Updates between features of different levels.
HILA extends hierarchical vision transformer architectures by adding local connections between features of higher and lower levels to the backbone encoder.
We show notable improvements in semantic segmentation accuracy with fewer parameters and FLOPs.
- Score: 68.7861229363712
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing transformer-based image backbones typically propagate feature
information in one direction, from lower to higher levels. This may not be
ideal, since the localization ability to delineate accurate object boundaries
is most prominent in the lower, high-resolution feature maps, while the
semantics that can disambiguate image signals belonging to one object vs.
another typically emerges at a higher level of processing. We present Hierarchical Inter-Level
Attention (HILA), an attention-based method that captures Bottom-Up and
Top-Down Updates between features of different levels. HILA extends
hierarchical vision transformer architectures by adding local connections
between features of higher and lower levels to the backbone encoder. In each
iteration, we construct a hierarchy by having higher-level features compete for
assignments to update lower-level features belonging to them, iteratively
resolving object-part relationships. These improved lower-level features are
then used to re-update the higher-level features. HILA can be integrated into
the majority of hierarchical architectures without requiring any changes to the
base model. We add HILA into SegFormer and the Swin Transformer and show
notable improvements in accuracy in semantic segmentation with fewer parameters
and FLOPs. Project website and code:
https://www.cs.toronto.edu/~garyleung/hila/
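As a rough illustration of the bottom-up and top-down updates described in the abstract, the minimal PyTorch sketch below pairs two cross-attention passes between adjacent feature levels. The single-head global attention, module names, and tensor shapes are simplifying assumptions; the actual HILA implementation uses local inter-level connections and is available at the project website above.

```python
# A minimal, hypothetical sketch of inter-level attention in PyTorch.
# Single-head global attention and the shapes below are simplifying
# assumptions; real HILA uses local windows and iterated updates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterLevelAttention(nn.Module):
    """Cross-attention from one feature level (queries) to another (keys/values)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, queries, keys_values):
        # queries: (B, Nq, C), keys_values: (B, Nk, C)
        q, k, v = self.q(queries), self.k(keys_values), self.v(keys_values)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return queries + attn @ v  # residual update

class HILABlock(nn.Module):
    """One top-down + bottom-up iteration between two feature levels."""
    def __init__(self, dim: int):
        super().__init__()
        self.top_down = InterLevelAttention(dim)   # higher level updates lower
        self.bottom_up = InterLevelAttention(dim)  # improved lower re-updates higher

    def forward(self, lower, higher):
        # Softmax over higher-level keys lets higher-level features
        # "compete" for each lower-level token.
        lower = self.top_down(lower, higher)
        higher = self.bottom_up(higher, lower)
        return lower, higher

if __name__ == "__main__":
    blk = HILABlock(dim=64)
    lower = torch.randn(2, 256, 64)   # e.g. 16x16 high-resolution tokens
    higher = torch.randn(2, 64, 64)   # e.g. 8x8 low-resolution tokens
    lo, hi = blk(lower, higher)
    print(lo.shape, hi.shape)  # torch.Size([2, 256, 64]) torch.Size([2, 64, 64])
```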
Related papers
- Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer.
arXiv Detail & Related papers (2024-06-17T07:24:38Z)
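A minimal sketch of the skip-layer idea summarized above, assuming single-head attention and letting one tensor stand in for both keys and values; the names and shapes are hypothetical, and the actual SLA design may differ.

```python
import torch
import torch.nn.functional as F

def skip_layer_attention(queries, curr_hidden, prev_hidden):
    """Queries attend over hidden states from the current layer and one
    preceding layer; the same tensor stands in for keys and values here."""
    kv = torch.cat([curr_hidden, prev_hidden], dim=1)  # (B, Nc + Np, C)
    scale = queries.size(-1) ** -0.5
    attn = F.softmax(queries @ kv.transpose(-2, -1) * scale, dim=-1)
    return attn @ kv  # (B, Nq, C)

q = torch.randn(2, 16, 32)     # queries from the current layer
curr = torch.randn(2, 16, 32)  # current-layer hidden states
prev = torch.randn(2, 16, 32)  # hidden states from one preceding layer
print(skip_layer_attention(q, curr, prev).shape)  # torch.Size([2, 16, 32])
```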
- Pyramid Hierarchical Transformer for Hyperspectral Image Classification [1.9427851979929982]
We propose a pyramid-based hierarchical transformer (PyFormer).
This innovative approach organizes input data hierarchically into segments, each representing distinct abstraction levels.
Results underscore the superiority of the proposed method over traditional approaches.
arXiv Detail & Related papers (2024-04-23T11:41:19Z)
- Skipped Feature Pyramid Network with Grid Anchor for Object Detection [6.99246486061412]
We propose a skipped connection to obtain stronger semantics at each level of the feature pyramid.
In our method, each lower-level feature connects only with the feature at the highest level, which makes it more reasonable for each pyramid level to be responsible for detecting objects at a fixed scale.
arXiv Detail & Related papers (2023-10-22T23:27:05Z)
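To make the skipped-connection idea above concrete, here is a toy PyTorch sketch in which every lower pyramid level fuses only an upsampled copy of the highest-level feature. The channel sizes, 1x1 lateral convolutions, and nearest-neighbor upsampling are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkippedFPN(nn.Module):
    """Each lower level fuses only the (upsampled) highest-level feature,
    skipping the usual level-by-level top-down pathway."""
    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) backbone maps, coarsest level last
        laterals = [lat(f) for lat, f in zip(self.laterals, feats)]
        top = laterals[-1]
        outs = []
        for lat in laterals[:-1]:
            up = F.interpolate(top, size=lat.shape[-2:], mode="nearest")
            outs.append(lat + up)  # skip connection straight from the top level
        outs.append(top)
        return outs

fpn = SkippedFPN(in_channels=[32, 64, 128])
feats = [torch.randn(1, 32, 64, 64),
         torch.randn(1, 64, 32, 32),
         torch.randn(1, 128, 16, 16)]
for o in fpn(feats):
    print(o.shape)  # all levels share 64 output channels
```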
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure from the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
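A loose sketch of the two-branch design summarized above: a convolutional feature branch and a latent-code classification branch that hosts all early exits. The layer widths, latent size, and confidence-threshold exit rule are hypothetical; the real Dyn-Perceiver is considerably more elaborate.

```python
import torch
import torch.nn as nn

class TwoBranchEarlyExit(nn.Module):
    """Toy two-branch model: a feature branch extracts image features while a
    classification branch refines a latent code; early exits live only in the
    classification branch (a loose sketch of the Dyn-Perceiver idea)."""
    def __init__(self, num_classes=10, dim=64, stages=3):
        super().__init__()
        self.latent = nn.Parameter(torch.zeros(1, dim))
        self.feature_stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(3 if i == 0 else dim, dim, 3, stride=2, padding=1),
                          nn.ReLU()) for i in range(stages)
        )
        self.cls_stages = nn.ModuleList(nn.Linear(2 * dim, dim) for _ in range(stages))
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(stages))

    def forward(self, x, exit_threshold=0.9):
        z = self.latent.expand(x.size(0), -1)
        for feat, cls, head in zip(self.feature_stages, self.cls_stages, self.exits):
            x = feat(x)                    # feature branch
            pooled = x.mean(dim=(2, 3))    # summary passed to the cls branch
            z = torch.relu(cls(torch.cat([z, pooled], dim=-1)))
            logits = head(z)               # early-exit classifier
            conf = logits.softmax(-1).max(-1).values
            if bool((conf > exit_threshold).all()):  # exit early if confident
                return logits
        return logits

model = TwoBranchEarlyExit()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```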
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- Deep Hierarchical Semantic Segmentation [76.40565872257709]
Hierarchical semantic segmentation (HSS) aims at a structured, pixel-wise description of visual observations in terms of a class hierarchy.
The proposed HSSN casts HSS as a pixel-wise multi-label classification task, bringing only minimal architecture changes to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z)
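One standard way to cast segmentation as pixel-wise multi-label classification over a class hierarchy, as summarized above, is to mark every ancestor of a pixel's class as positive and train with a per-class binary loss. The toy class tree and loss below are illustrative assumptions; HSSN's actual formulation additionally applies the hierarchy-induced margin constraints.

```python
import torch
import torch.nn.functional as F

# Hypothetical 5-class hierarchy: parent[c] is the parent of class c (-1 = root).
parent = [-1, 0, 0, 1, 1]   # e.g. 0=thing, 1=vehicle, 2=animal, 3=car, 4=bus

def hierarchical_targets(labels, num_classes=5):
    """Multi-hot targets: each pixel is positive for its class and all ancestors."""
    t = torch.zeros(*labels.shape, num_classes)
    c = labels.clone()
    while (c >= 0).any():
        valid = c >= 0
        t[valid, c[valid]] = 1.0                      # mark current class
        c = torch.where(valid,
                        torch.tensor(parent)[c.clamp(min=0)],
                        c)                            # walk up the tree
    return t

logits = torch.randn(2, 8, 8, 5)          # (B, H, W, num_classes) pixel logits
labels = torch.randint(0, 5, (2, 8, 8))   # one class per pixel
loss = F.binary_cross_entropy_with_logits(logits, hierarchical_targets(labels))
print(loss.item())
```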
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
This work is the first to take advantage of both CNNs and Transformers for high-performance image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.