Improving Semantic Segmentation in Transformers using Hierarchical
Inter-Level Attention
- URL: http://arxiv.org/abs/2207.02126v1
- Date: Tue, 5 Jul 2022 15:47:31 GMT
- Title: Improving Semantic Segmentation in Transformers using Hierarchical
Inter-Level Attention
- Authors: Gary Leung, Jun Gao, Xiaohui Zeng, Sanja Fidler
- Abstract summary: Hierarchical Inter-Level Attention (HILA) is an attention-based method that captures Bottom-Up and Top-Down Updates between features of different levels.
HILA extends hierarchical vision transformer architectures by adding local connections between features of higher and lower levels to the backbone encoder.
We show notable improvements in semantic segmentation accuracy with fewer parameters and FLOPs.
- Score: 68.7861229363712
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing transformer-based image backbones typically propagate feature
information in one direction, from lower to higher levels. This may not be
ideal, since the localization ability to delineate accurate object boundaries
is most prominent in the lower, high-resolution feature maps, while the
semantics that can disambiguate image signals belonging to one object vs.
another typically emerges at a higher level of processing. We present Hierarchical Inter-Level
Attention (HILA), an attention-based method that captures Bottom-Up and
Top-Down Updates between features of different levels. HILA extends
hierarchical vision transformer architectures by adding local connections
between features of higher and lower levels to the backbone encoder. In each
iteration, we construct a hierarchy by having higher-level features compete for
assignments to update lower-level features belonging to them, iteratively
resolving object-part relationships. These improved lower-level features are
then used to re-update the higher-level features. HILA can be integrated into
the majority of hierarchical architectures without requiring any changes to the
base model. We add HILA into SegFormer and the Swin Transformer and show
notable improvements in accuracy in semantic segmentation with fewer parameters
and FLOPs. Project website and code:
https://www.cs.toronto.edu/~garyleung/hila/
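As a rough illustration of the bottom-up and top-down updates described in the abstract, the minimal PyTorch sketch below pairs two cross-attention passes between adjacent feature levels. The single-head global attention, module names, and tensor shapes are simplifying assumptions; the actual HILA implementation uses local inter-level connections and is available at the project website above.

```python
# A minimal, hypothetical sketch of inter-level attention in PyTorch.
# Single-head global attention and the shapes below are simplifying
# assumptions; real HILA uses local windows and iterated updates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterLevelAttention(nn.Module):
    """Cross-attention from one feature level (queries) to another (keys/values)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, queries, keys_values):
        # queries: (B, Nq, C), keys_values: (B, Nk, C)
        q, k, v = self.q(queries), self.k(keys_values), self.v(keys_values)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return queries + attn @ v  # residual update

class HILABlock(nn.Module):
    """One top-down + bottom-up iteration between two feature levels."""
    def __init__(self, dim: int):
        super().__init__()
        self.top_down = InterLevelAttention(dim)   # higher level updates lower
        self.bottom_up = InterLevelAttention(dim)  # improved lower re-updates higher

    def forward(self, lower, higher):
        # Softmax over higher-level keys lets higher-level features
        # "compete" for each lower-level token.
        lower = self.top_down(lower, higher)
        higher = self.bottom_up(higher, lower)
        return lower, higher

if __name__ == "__main__":
    blk = HILABlock(dim=64)
    lower = torch.randn(2, 256, 64)   # e.g. 16x16 high-resolution tokens
    higher = torch.randn(2, 64, 64)   # e.g. 8x8 low-resolution tokens
    lo, hi = blk(lower, higher)
    print(lo.shape, hi.shape)  # torch.Size([2, 256, 64]) torch.Size([2, 64, 64])
```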
Related papers
- Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer.
arXiv Detail & Related papers (2024-06-17T07:24:38Z)
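A minimal sketch of the skip-layer idea summarized above, assuming single-head attention and letting one tensor stand in for both keys and values; the names and shapes are hypothetical, and the actual SLA design may differ.

```python
import torch
import torch.nn.functional as F

def skip_layer_attention(queries, curr_hidden, prev_hidden):
    """Queries attend over hidden states from the current layer and one
    preceding layer; the same tensor stands in for keys and values here."""
    kv = torch.cat([curr_hidden, prev_hidden], dim=1)  # (B, Nc + Np, C)
    scale = queries.size(-1) ** -0.5
    attn = F.softmax(queries @ kv.transpose(-2, -1) * scale, dim=-1)
    return attn @ kv  # (B, Nq, C)

q = torch.randn(2, 16, 32)     # queries from the current layer
curr = torch.randn(2, 16, 32)  # current-layer hidden states
prev = torch.randn(2, 16, 32)  # hidden states from one preceding layer
print(skip_layer_attention(q, curr, prev).shape)  # torch.Size([2, 16, 32])
```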
- Pyramid Hierarchical Transformer for Hyperspectral Image Classification [1.9427851979929982]
We propose a pyramid-based hierarchical transformer (PyFormer).
This innovative approach organizes input data hierarchically into segments, each representing distinct abstraction levels.
Results underscore the superiority of the proposed method over traditional approaches.
arXiv Detail & Related papers (2024-04-23T11:41:19Z)
- Skipped Feature Pyramid Network with Grid Anchor for Object Detection [6.99246486061412]
We propose a skipped connection to obtain stronger semantics at each level of the feature pyramid.
In our method, each lower-level feature connects only with the feature at the highest level, which makes it more reasonable for each pyramid level to be responsible for detecting objects at a fixed scale.
arXiv Detail & Related papers (2023-10-22T23:27:05Z)
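To make the skipped-connection idea above concrete, here is a toy PyTorch sketch in which every lower pyramid level fuses only an upsampled copy of the highest-level feature. The channel sizes, 1x1 lateral convolutions, and nearest-neighbor upsampling are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkippedFPN(nn.Module):
    """Each lower level fuses only the (upsampled) highest-level feature,
    skipping the usual level-by-level top-down pathway."""
    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) backbone maps, coarsest level last
        laterals = [lat(f) for lat, f in zip(self.laterals, feats)]
        top = laterals[-1]
        outs = []
        for lat in laterals[:-1]:
            up = F.interpolate(top, size=lat.shape[-2:], mode="nearest")
            outs.append(lat + up)  # skip connection straight from the top level
        outs.append(top)
        return outs

fpn = SkippedFPN(in_channels=[32, 64, 128])
feats = [torch.randn(1, 32, 64, 64),
         torch.randn(1, 64, 32, 32),
         torch.randn(1, 128, 16, 16)]
for o in fpn(feats):
    print(o.shape)  # all levels share 64 output channels
```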
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure from the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
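A loose sketch of the two-branch design summarized above: a convolutional feature branch and a latent-code classification branch that hosts all early exits. The layer widths, latent size, and confidence-threshold exit rule are hypothetical; the real Dyn-Perceiver is considerably more elaborate.

```python
import torch
import torch.nn as nn

class TwoBranchEarlyExit(nn.Module):
    """Toy two-branch model: a feature branch extracts image features while a
    classification branch refines a latent code; early exits live only in the
    classification branch (a loose sketch of the Dyn-Perceiver idea)."""
    def __init__(self, num_classes=10, dim=64, stages=3):
        super().__init__()
        self.latent = nn.Parameter(torch.zeros(1, dim))
        self.feature_stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(3 if i == 0 else dim, dim, 3, stride=2, padding=1),
                          nn.ReLU()) for i in range(stages)
        )
        self.cls_stages = nn.ModuleList(nn.Linear(2 * dim, dim) for _ in range(stages))
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(stages))

    def forward(self, x, exit_threshold=0.9):
        z = self.latent.expand(x.size(0), -1)
        for feat, cls, head in zip(self.feature_stages, self.cls_stages, self.exits):
            x = feat(x)                    # feature branch
            pooled = x.mean(dim=(2, 3))    # summary passed to the cls branch
            z = torch.relu(cls(torch.cat([z, pooled], dim=-1)))
            logits = head(z)               # early-exit classifier
            conf = logits.softmax(-1).max(-1).values
            if bool((conf > exit_threshold).all()):  # exit early if confident
                return logits
        return logits

model = TwoBranchEarlyExit()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```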
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- Deep Hierarchical Semantic Segmentation [76.40565872257709]
Hierarchical semantic segmentation (HSS) aims at a structured, pixel-wise description of visual observations in terms of a class hierarchy.
The proposed HSSN casts HSS as a pixel-wise multi-label classification task, bringing only minimal architecture changes to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z)
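One standard way to cast segmentation as pixel-wise multi-label classification over a class hierarchy, as summarized above, is to mark every ancestor of a pixel's class as positive and train with a per-class binary loss. The toy class tree and loss below are illustrative assumptions; HSSN's actual formulation additionally applies the hierarchy-induced margin constraints.

```python
import torch
import torch.nn.functional as F

# Hypothetical 5-class hierarchy: parent[c] is the parent of class c (-1 = root).
parent = [-1, 0, 0, 1, 1]   # e.g. 0=thing, 1=vehicle, 2=animal, 3=car, 4=bus

def hierarchical_targets(labels, num_classes=5):
    """Multi-hot targets: each pixel is positive for its class and all ancestors."""
    t = torch.zeros(*labels.shape, num_classes)
    c = labels.clone()
    while (c >= 0).any():
        valid = c >= 0
        t[valid, c[valid]] = 1.0                      # mark current class
        c = torch.where(valid,
                        torch.tensor(parent)[c.clamp(min=0)],
                        c)                            # walk up the tree
    return t

logits = torch.randn(2, 8, 8, 5)          # (B, H, W, num_classes) pixel logits
labels = torch.randint(0, 5, (2, 8, 8))   # one class per pixel
loss = F.binary_cross_entropy_with_logits(logits, hierarchical_targets(labels))
print(loss.item())
```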
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
This work is the first to take advantage of both CNNs and Transformers for high-performance image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.