Part-guided Relational Transformers for Fine-grained Visual Recognition
- URL: http://arxiv.org/abs/2212.13685v1
- Date: Wed, 28 Dec 2022 03:45:56 GMT
- Title: Part-guided Relational Transformers for Fine-grained Visual Recognition
- Authors: Yifan Zhao, Jia Li, Xiaowu Chen, Yonghong Tian
- Abstract summary: We propose a framework to learn discriminative part features and explore their correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on 3 widely-used fine-grained object recognition benchmarks.
- Score: 59.20531172172135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained visual recognition aims to classify objects with visually similar
appearances into subcategories, a task that has made great progress with the
development of deep CNNs. However, handling subtle differences between
different subcategories remains a challenge. In this paper, we propose to
solve this issue in one unified framework from two aspects, i.e., constructing
feature-level interrelationships and capturing part-level discriminative
features. This framework, namely PArt-guided Relational Transformers (PART), is
proposed to learn discriminative part features with an automatic part
discovery module, and to explore their intrinsic correlations with a feature
transformation module that adapts Transformer models from the field of
natural language processing. The part discovery module efficiently discovers
discriminative regions that correspond closely to the gradient descent
procedure. The feature transformation module then builds correlations between
the global embedding and multiple part embeddings, enhancing spatial
interactions among semantic pixels. Moreover, our approach does not rely on
additional part branches at inference time and reaches state-of-the-art
performance on 3 widely-used fine-grained object recognition benchmarks.
Experimental results and explainable visualizations demonstrate the
effectiveness of our proposed approach. The code can be found at
https://github.com/iCVTEAM/PART.
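The abstract leaves the transformation module's internals unspecified; the following is a minimal sketch of one way to relate a global embedding to discovered part embeddings with a standard Transformer encoder. The class name, dimensions, and token layout (global token first) are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class PartRelationalTransformer(nn.Module):
    """Hypothetical sketch: relate one global embedding to K part embeddings
    with a standard Transformer encoder, then classify from the global token."""
    def __init__(self, dim=512, num_parts=4, num_classes=200, depth=2, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.pos = nn.Parameter(torch.zeros(1, num_parts + 1, dim))  # learned positions
        self.head = nn.Linear(dim, num_classes)

    def forward(self, global_feat, part_feats):
        # global_feat: (B, dim); part_feats: (B, K, dim) from a part-discovery step
        tokens = torch.cat([global_feat.unsqueeze(1), part_feats], dim=1)
        tokens = self.encoder(tokens + self.pos)  # global and part tokens interact
        return self.head(tokens[:, 0])            # predict from the global token

logits = PartRelationalTransformer()(torch.randn(8, 512), torch.randn(8, 4, 512))
```

Classifying from the global token only is consistent with the claim that no extra part branches are needed at inference: the part tokens influence the global token through attention but require no separate classification heads.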
Related papers
- Semantic Feature Integration network for Fine-grained Visual Classification [5.182627302449368]
We propose the Semantic Feature Integration network (SFI-Net) to address the above difficulties.
By eliminating unnecessary features and reconstructing the semantic relations among discriminative features, our SFI-Net achieves satisfactory performance.
arXiv Detail & Related papers (2023-02-13T07:32:25Z)
- Framework-agnostic Semantically-aware Global Reasoning for Segmentation [29.69187816377079]
We propose a component that learns to project image features into latent representations and reason between them.
Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint.
Our latent tokens are semantically interpretable and diverse and provide a rich set of features that can be transferred to downstream tasks.
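As a rough, hypothetical illustration of this latent-token pattern (pool pixels into a few latent tokens, reason among the tokens, redistribute back to pixels); the module name, shapes, and softmax-assignment scheme are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class LatentGlobalReasoning(nn.Module):
    """Hypothetical sketch: pool spatial features into K latent tokens,
    let the tokens interact, then redistribute them back to the pixels."""
    def __init__(self, dim=256, num_tokens=8, heads=4):
        super().__init__()
        self.to_assign = nn.Conv2d(dim, num_tokens, kernel_size=1)  # pixel-to-token weights
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.reason = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x):
        # x: (B, C, H, W)
        B, C, H, W = x.shape
        attn = self.to_assign(x).flatten(2).softmax(dim=-1)   # (B, K, HW)
        tokens = attn @ x.flatten(2).transpose(1, 2)          # (B, K, C) pooled tokens
        tokens = self.reason(tokens)                          # token-token reasoning
        out = attn.transpose(1, 2) @ tokens                   # (B, HW, C) redistribute
        return x + out.transpose(1, 2).reshape(B, C, H, W)    # residual fusion
```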
arXiv Detail & Related papers (2022-12-06T21:42:05Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The two proposed modules are lightweight and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- Transforming the Interactive Segmentation for Medical Imaging [34.57242805353604]
The goal of this paper is to interactively refine automatic segmentations of challenging structures that fall behind human performance.
We propose a novel Transformer-based architecture for Interactive Segmentation (TIS).
Our proposed architecture is composed of Transformer Decoder variants, which naturally fulfill feature comparison with the attention mechanism.
arXiv Detail & Related papers (2022-08-20T03:28:23Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning [88.82371069668147]
BatchFormerV2 is a more general batch Transformer module, which enables exploring sample relationships for dense representation learning.
BatchFormerV2 consistently improves current DETR-based detection methods by over 1.3%.
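A minimal sketch of the batch-Transformer idea (treat the mini-batch as a sequence so samples attend to each other); the class name and dimensions are assumptions, and this is not the BatchFormerV2 implementation, which extends the idea to dense features and is designed so the module can be dropped at test time:

```python
import torch
import torch.nn as nn

class BatchTransformer(nn.Module):
    """Hypothetical sketch: let the samples of a mini-batch attend to one
    another by treating the batch axis as the Transformer's sequence axis."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, feats):
        # feats: (B, dim); add a dummy batch axis so attention runs over the B samples
        return self.encoder(feats.unsqueeze(1)).squeeze(1)  # (B, dim)

feats = torch.randn(32, 256)       # one mini-batch of pooled features
mixed = BatchTransformer()(feats)  # each sample now carries batch-level context
```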
arXiv Detail & Related papers (2022-04-04T05:53:42Z)
- Point Cloud Learning with Transformer [2.3204178451683264]
We introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT).
Specifically, a point pyramid transformer is investigated to model features with diverse resolutions or scales.
A multi-level transformer module is designed to aggregate contextual information from different levels of each scale and enhance their interactions.
arXiv Detail & Related papers (2021-04-28T08:39:21Z)
- I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors [64.93963042395976]
Implicit Instance-Invariant Network (I3Net) is tailored for adapting one-stage detectors.
I3Net implicitly learns instance-invariant features via exploiting the natural characteristics of deep features in different layers.
Experiments reveal that I3Net exceeds the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2021-03-25T11:14:36Z)
- Feature Pyramid Transformer [121.50066435635118]
We propose a fully active feature interaction across both space and scales, called the Feature Pyramid Transformer (FPT).
FPT transforms any feature pyramid into another feature pyramid of the same size but with richer contexts.
We conduct extensive experiments in both instance-level (i.e., object detection and instance segmentation) and pixel-level segmentation tasks.
arXiv Detail & Related papers (2020-07-18T15:16:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.