Inter-class Discrepancy Alignment for Face Recognition
- URL: http://arxiv.org/abs/2103.01559v1
- Date: Tue, 2 Mar 2021 08:20:08 GMT
- Title: Inter-class Discrepancy Alignment for Face Recognition
- Authors: Jiaheng Liu, Yudong Wu, Yichao Wu, Zhenmao Li, Chen Ken, Ding Liang,
Junjie Yan
- Abstract summary: We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores considering the discrepancy between an image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
- Score: 55.578063356210144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of face recognition (FR) has witnessed great progress with the
surge of deep learning. Existing methods mainly focus on extracting
discriminative features, and directly compute the cosine or L2 distance in a
point-to-point way without considering the context information. In this study,
we make a key observation that the local context represented by the
similarities between an instance and its inter-class neighbors plays an
important role in FR. Specifically, we attempt to incorporate the local
information in the feature space into the metric, and propose a unified
framework called Inter-class Discrepancy Alignment (IDA), with two dedicated
modules, Discrepancy Alignment Operator (IDA-DAO) and Support Set
Estimation (IDA-SSE). IDA-DAO is used to align the similarity scores considering
the discrepancy between an image and its neighbors, which is defined by
adaptive support sets on the hypersphere. In practice, it is
difficult to acquire a support set during online inference. IDA-SSE can provide
convincing inter-class neighbors by introducing virtual candidate images
generated with a GAN. Furthermore, we propose the learnable IDA-SSE, which can
implicitly give an estimation without the need for any other images in the
evaluation process. The proposed IDA can be incorporated into existing FR
systems seamlessly and efficiently. Extensive experiments demonstrate that this
framework can 1) significantly improve the accuracy, and 2) make the model
robust to face images of various distributions. Without bells and whistles,
our method achieves state-of-the-art performance on multiple standard FR
benchmarks.
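The contrast the abstract draws, between a point-to-point score and one that also considers an image's inter-class neighbors, can be illustrated with a minimal sketch. This is not the paper's IDA-DAO formula, which the abstract does not give. The function names, the top-k mean used as the "discrepancy", and the subtraction used for alignment are all assumptions made for illustration, and the support set is assumed to be non-empty.

```python
import math

def cosine(u, v):
    """Point-to-point cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def neighbor_discrepancy(x, support_set, k=2):
    """Mean cosine similarity between x and its k most similar support features.

    Hypothetical helper: a simple stand-in for how close x sits to its
    inter-class neighbors. It is not the definition used in the paper.
    Assumes support_set is non-empty.
    """
    sims = sorted((cosine(x, s) for s in support_set), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)

def context_adjusted_score(x, y, support_set, k=2):
    """Cosine score minus the average neighbor discrepancy of both inputs.

    The subtraction is an assumed form of alignment, chosen only to show
    how local context could shift a point-to-point score.
    """
    base = cosine(x, y)
    offset = 0.5 * (neighbor_discrepancy(x, support_set, k)
                    + neighbor_discrepancy(y, support_set, k))
    return base - offset
```

Under these assumptions, a pair whose features lie in a crowded region of the feature space (high similarity to other identities) receives a lower adjusted score than an equally similar pair in a sparse region, which is the kind of context dependence the abstract describes.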
Related papers
- ResCLIP: Residual Attention for Training-free Dense Vision-language Inference [27.551367463011008]
Cross-correlation of self-attention in CLIP's non-final layers also exhibits localization properties.
We propose the Residual Cross-correlation Self-attention (RCS) module, which leverages the cross-correlation self-attention from intermediate layers to remold the attention in the final block.
The RCS module effectively reorganizes spatial information, unleashing the localization potential within CLIP for dense vision-language inference.
arXiv Detail & Related papers (2024-11-24T14:14:14Z) - Dense Affinity Matching for Few-Shot Segmentation [83.65203917246745]
Few-Shot Segmentation (FSS) aims to segment novel-class images with a few samples.
We propose a dense affinity matching framework to exploit the support-query interaction.
We show that our framework performs very competitively under different settings with only 0.68M parameters.
arXiv Detail & Related papers (2023-07-17T12:27:15Z) - Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z) - Temporal Transductive Inference for Few-Shot Video Object Segmentation [27.140141181513425]
Few-shot video object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training.
Key to our approach is the use of both global and local temporal constraints.
Empirically, our model outperforms state-of-the-art meta-learning approaches in terms of mean intersection over union on YouTube-VIS by 2.8%.
arXiv Detail & Related papers (2022-03-27T14:08:30Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z) - Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z) - Spatial-Scale Aligned Network for Fine-Grained Recognition [42.71878867504503]
Existing approaches for fine-grained visual recognition focus on learning marginal region-based representations.
We propose the spatial-scale aligned network (SSANET) and implicitly address misalignments during the recognition process.
arXiv Detail & Related papers (2020-01-05T11:12:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.