Semantic Feature Integration network for Fine-grained Visual
Classification
- URL: http://arxiv.org/abs/2302.10275v1
- Date: Mon, 13 Feb 2023 07:32:25 GMT
- Title: Semantic Feature Integration network for Fine-grained Visual
Classification
- Authors: Hui Wang, Yueyang Li, Haichi Luo
- Abstract summary: We propose the Semantic Feature Integration network (SFI-Net) to address the above difficulties.
By eliminating unnecessary features and reconstructing the semantic relations among discriminative features, our SFI-Net achieves satisfactory performance.
- Score: 5.182627302449368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-Grained Visual Classification (FGVC) is known as a challenging task due
to subtle differences among subordinate categories. Many current FGVC
approaches focus on identifying and locating discriminative regions by using
the attention mechanism, but neglect the presence of unnecessary features that
hinder the understanding of object structure. These unnecessary features,
including 1) ambiguous parts resulting from the visual similarity in object
appearances and 2) noninformative parts (e.g., background noise), can have a
significant adverse impact on classification results. In this paper, we propose
the Semantic Feature Integration network (SFI-Net) to address the above
difficulties. By eliminating unnecessary features and reconstructing the
semantic relations among discriminative features, our SFI-Net achieves
satisfactory performance. The network consists of two modules: 1) the multi-level
feature filter (MFF) module removes unnecessary features with different
receptive fields, then concatenates the preserved features at the pixel level
for subsequent processing; 2) the semantic information reconstitution (SIR)
module further establishes semantic relations among the discriminative
features obtained from the MFF module. Both modules are carefully designed to
be lightweight and can be trained end-to-end in a weakly supervised way.
Extensive experiments on four challenging fine-grained benchmarks demonstrate
that the proposed SFI-Net achieves state-of-the-art performance. In
particular, the classification accuracy of our model on CUB-200-2011 and
Stanford Dogs reaches 92.64% and 93.03%, respectively.
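The two-stage pipeline the abstract describes (filter out uninformative features per level, then model relations among the survivors) can be sketched in a simplified form. Everything below is an illustrative assumption: the function names (`mff_filter`, `sir_relations`), the hand-crafted norm-based informativeness score, and the `keep_ratio` threshold are hypothetical stand-ins, since SFI-Net's actual filtering criterion and relation module are learned, not hand-crafted:

```python
import math

def informativeness(vec):
    # Hypothetical proxy score: L2 norm of the feature vector.
    # SFI-Net's real MFF criterion is learned, not hand-crafted.
    return math.sqrt(sum(x * x for x in vec))

def mff_filter(level_features, keep_ratio=0.5):
    """Toy multi-level feature filter: at each level, drop the least
    informative vectors, then concatenate the survivors."""
    kept = []
    for level in level_features:
        ranked = sorted(level, key=informativeness, reverse=True)
        n_keep = max(1, int(len(ranked) * keep_ratio))
        kept.extend(ranked[:n_keep])
    return kept

def sir_relations(features):
    """Toy semantic-relation step: pairwise cosine similarity
    among the preserved discriminative features."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na, nb = informativeness(a), informativeness(b)
        return dot / (na * nb) if na and nb else 0.0
    return [[cos(a, b) for b in features] for a in features]

# Two feature levels (e.g. different receptive fields), each a list of vectors.
levels = [
    [[1.0, 0.0], [0.1, 0.1]],               # coarse level
    [[0.0, 2.0], [0.05, 0.0], [1.0, 1.0]],  # finer level
]
kept = mff_filter(levels, keep_ratio=0.5)   # survivors from both levels
rel = sir_relations(kept)                   # relation matrix among survivors
print(len(kept), len(rel))
```

The point of the sketch is the data flow only: unnecessary features never reach the relation stage, so the pairwise matrix is computed over a small, discriminative set rather than every spatial position.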
Related papers
- Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization [30.92656780805478]
We propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for fine-grained visual categorization (FGVC).
To model the spatial contextual relationship between rich part descriptors and global semantics, we develop a novel multi-part and multi-scale cross-attention (MPMSCA) module.
We also propose a generic multi-level semantic quality evaluation module (MLSQE) to progressively supervise and enhance hierarchical semantics from different levels of the backbone network.
arXiv Detail & Related papers (2024-03-15T13:40:44Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three widely used fine-grained object recognition datasets.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained
Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The two proposed modules are lightweight and can be plugged into any transformer network and easily trained end-to-end.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z) - Local Similarity Pattern and Cost Self-Reassembling for Deep Stereo
Matching Networks [3.7384509727711923]
We introduce a pairwise feature for deep stereo matching networks, named LSP (Local Similarity Pattern).
By explicitly revealing the neighbor relationships, LSP contains rich structural information, which can be leveraged for more discriminative feature description.
Secondly, we design a dynamic self-reassembling refinement strategy and apply it to the cost distribution and the disparity map respectively.
arXiv Detail & Related papers (2021-12-02T06:52:54Z) - A^2-FPN: Attention Aggregation based Feature Pyramid Network for
Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning.
A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z) - CARAFE++: Unified Content-Aware ReAssembly of FEatures [132.49582482421246]
We propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight and highly effective operator to fulfill this goal.
CARAFE++ generates adaptive kernels on-the-fly to enable instance-specific content-aware handling.
It shows consistent and substantial gains across all the tasks with negligible computational overhead.
arXiv Detail & Related papers (2020-12-07T07:34:57Z) - Attention-guided Context Feature Pyramid Network for Object Detection [10.30536638944019]
We build a novel architecture, called Attention-guided Context Feature Pyramid Network (AC-FPN)
AC-FPN exploits discriminative information from various large receptive fields via integrating attention-guided multi-path features.
Our AC-FPN can be readily plugged into existing FPN-based models.
arXiv Detail & Related papers (2020-05-23T05:24:50Z) - Unsupervised segmentation via semantic-apparent feature fusion [21.75371777263847]
This research proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF).
Key regions of the foreground object can be accurately located via semantic features, while apparent features provide a richer detailed expression.
By fusing semantic and apparent features, and cascading modules for intra-image adaptive feature weight learning and inter-image common feature learning, the method achieves performance that significantly exceeds the baselines.
arXiv Detail & Related papers (2020-05-21T08:28:49Z) - AlignSeg: Feature-Aligned Segmentation Networks [109.94809725745499]
We propose Feature-Aligned Networks (AlignSeg) to address misalignment issues during the feature aggregation process.
Our network achieves new state-of-the-art mIoU scores of 82.6% and 45.95%.
arXiv Detail & Related papers (2020-02-24T10:00:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.