Spatial-Scale Aligned Network for Fine-Grained Recognition
- URL: http://arxiv.org/abs/2001.01211v1
- Date: Sun, 5 Jan 2020 11:12:08 GMT
- Title: Spatial-Scale Aligned Network for Fine-Grained Recognition
- Authors: Lizhao Gao, Haihua Xu, Chong Sun, Junling Liu, Yu-Wing Tai
- Abstract summary: Existing approaches for fine-grained visual recognition focus on learning marginal region-based representations.
We propose the spatial-scale aligned network (SSANET) and implicitly address misalignments during the recognition process.
- Score: 42.71878867504503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing approaches for fine-grained visual recognition focus on learning
marginal region-based representations while neglecting the spatial and scale
misalignments, leading to inferior performance. In this paper, we propose the
spatial-scale aligned network (SSANET) and implicitly address misalignments
during the recognition process. Specifically, SSANET consists of 1) a
self-supervised proposal-mining formula with Morphological Alignment
Constraints; 2) a discriminative scale mining (DSM) module, which exploits the
feature pyramid via a circulant matrix and provides a Fourier solver for fast
scale alignment; 3) an oriented pooling (OP) module, which performs the pooling
operation in several pre-defined orientations. Each orientation defines one
kind of spatial alignment, and the network learns which alignment is optimal.
With the two proposed modules, our algorithm automatically determines accurate
local proposal regions and generates target representations that are robust to
various appearance variations. Extensive experiments verify that SSANET learns
better spatial- and scale-invariant target representations, yielding superior
performance on several fine-grained recognition benchmarks.
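The DSM module's "Fourier solver for fast scale alignments" rests on a standard fact: a circulant matrix is diagonalised by the DFT, so correlating against every circular shift along the scale axis of a feature pyramid costs one FFT pair instead of a quadratic scan. As a minimal sketch only (the function name and the 1-D per-scale response vectors are assumptions for illustration, not the paper's actual interface):

```python
import numpy as np

def fast_scale_alignment(pyramid_responses, template):
    """Return the circular scale shift that best aligns two 1-D
    per-scale response vectors.

    Correlating against all circular shifts is multiplication by a
    circulant matrix; the DFT diagonalises it, so the whole search
    is one forward/inverse FFT pair instead of O(n^2) dot products.
    """
    F = np.fft.fft(pyramid_responses)
    G = np.fft.fft(template)
    corr = np.fft.ifft(F * np.conj(G)).real  # circular cross-correlation
    return int(np.argmax(corr))              # best-matching scale shift
```

For example, a response vector rolled by k scale levels against its original recovers the shift k.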
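The oriented pooling (OP) idea, pooling under several pre-defined orientations and letting the network weight them, can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the `scores` argument stands in for the learned alignment scores, and global average pooling is assumed for simplicity.

```python
import numpy as np
from scipy.ndimage import rotate

def oriented_pooling(feature_map, angles=(0, 45, 90, 135), scores=None):
    """Pool a 2-D feature map under several pre-defined orientations.

    Each angle represents one candidate spatial alignment: the map is
    rotated so that orientation becomes axis-aligned, then globally
    average-pooled. The per-orientation results are blended with
    softmax weights, standing in for the alignment scores the network
    would learn end to end.
    """
    pooled = np.array([
        rotate(feature_map, angle, reshape=False, order=1).mean()
        for angle in angles
    ])
    if scores is None:
        scores = np.zeros(len(angles))  # uniform before any learning
    w = np.exp(scores - scores.max())
    w /= w.sum()                        # softmax over orientations
    return float(w @ pooled)
```

Driving the softmax scores toward one angle makes the pooled output collapse to that single orientation, which is how a learned weighting would select the optimal alignment.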
Related papers
- IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in anchor-based object detection architectures. We show that our method enables the robustness verification of realistic, anchor-based models, including SSD, YOLOv2, and YOLOv3 variants, against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z) - MacNet: An End-to-End Manifold-Constrained Adaptive Clustering Network for Interpretable Whole Slide Image Classification [9.952997875404634]
Clustering-based approaches can provide an explainable decision-making process but suffer from high-dimensional features and semantically ambiguous centroids. We propose an end-to-end MIL framework that integrates Grassmann re-embedding and manifold-adaptive clustering. Experiments on multicentre WSI datasets demonstrate that: 1) our cluster-incorporated model achieves superior performance in both grading accuracy and interpretability; 2) end-to-end learning refines feature representations while requiring acceptable resources.
arXiv Detail & Related papers (2026-02-16T06:43:36Z) - Semantic-Inductive Attribute Selection for Zero-Shot Learning [4.083977531653519]
We study two complementary feature-selection strategies for Zero-Shot Learning (ZSL). The first adapts embedded feature selection to the demands of ZSL, turning model-driven rankings into meaningful semantic pruning. The second leverages evolutionary computation to directly explore the space of attribute subsets more broadly.
arXiv Detail & Related papers (2025-09-26T15:14:36Z) - A Pseudo Global Fusion Paradigm-Based Cross-View Network for LiDAR-Based Place Recognition [12.93382945887946]
LiDAR-based Place Recognition (LPR) remains a critical task in Embodied Artificial Intelligence (AI) and Autonomous Driving. Existing approaches reduce place recognition to a Euclidean distance-based metric-learning task, neglecting the feature space's intrinsic structures and intra-class variances. We propose a novel cross-view network based on an innovative fusion paradigm to address these challenges.
arXiv Detail & Related papers (2025-08-12T13:12:48Z) - Without Paired Labeled Data: End-to-End Self-Supervised Learning for Drone-view Geo-Localization [2.733505168507872]
Drone-view Geo-Localization (DVGL) aims to achieve accurate localization of drones by retrieving the most relevant GPS-tagged satellite images. Existing methods rely heavily on strictly pre-paired drone-satellite images for supervised learning. We propose an end-to-end self-supervised learning method with a shallow backbone network.
arXiv Detail & Related papers (2025-02-17T02:53:08Z) - Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z) - Task-Oriented Sensing, Computation, and Communication Integration for
Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI-model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z) - Distribution Regularized Self-Supervised Learning for Domain Adaptation
of Semantic Segmentation [3.284878354988896]
This paper proposes a pixel-level distribution regularization scheme (DRSL) for self-supervised domain adaptation of semantic segmentation.
In a typical setting, the classification loss forces the semantic segmentation model to greedily learn the representations that capture inter-class variations.
We capture pixel-level intra-class variations through class-aware multi-modal distribution learning.
arXiv Detail & Related papers (2022-06-20T09:52:49Z) - Learning towards Synchronous Network Memorizability and Generalizability
for Continual Segmentation across Multiple Sites [52.84959869494459]
In clinical practice, a segmentation network is often required to continually learn on a sequential data stream from multiple sites.
Existing methods are usually restricted in either network memorizability on previous sites or generalizability on unseen sites.
This paper aims to tackle the problem of Synchronous Memorizability and Generalizability with a novel proposed SMG-learning framework.
arXiv Detail & Related papers (2022-06-14T13:04:36Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z) - Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO aligns the similarity scores by considering the discrepancy between an image and its neighbors.
IDA-SSE provides convincing inter-class neighbors by introducing virtual candidate images generated with a GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z) - Align Deep Features for Oriented Object Detection [40.28244152216309]
We propose a single-shot Alignment Network (S$2$A-Net) consisting of two modules: a Feature Alignment Module (FAM) and an Oriented Detection Module (ODM).
The FAM can generate high-quality anchors with an Anchor Refinement Network and adaptively align the convolutional features according to the anchor boxes with a novel Alignment Convolution.
The ODM first adopts active rotating filters to encode the orientation information and then produces orientation-sensitive and orientation-invariant features to alleviate the inconsistency between classification score and localization accuracy.
arXiv Detail & Related papers (2020-08-21T09:55:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.