Related papers: Inter-class Discrepancy Alignment for Face Recognition

URL: http://arxiv.org/abs/2103.01559v1
Date: Tue, 2 Mar 2021 08:20:08 GMT
Title: Inter-class Discrepancy Alignment for Face Recognition
Authors: Jiaheng Liu, Yudong Wu, Yichao Wu, Zhenmao Li, Chen Ken, Ding Liang, Junjie Yan
Abstract summary: We propose a unified framework called Inter-class Discrepancy Alignment (IDA). IDA-DAO is used to align the similarity scores considering the discrepancy between the images and their neighbors. IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with a GAN.
Score: 55.578063356210144
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The field of face recognition (FR) has witnessed great progress with the surge of deep learning. Existing methods mainly focus on extracting discriminative features and directly compute the cosine or L2 distance in a point-to-point way, without considering context information. In this study, we make a key observation that the local context, represented by the similarities between an instance and its inter-class neighbors, plays an important role in FR. Specifically, we attempt to incorporate the local information in the feature space into the metric, and propose a unified framework called Inter-class Discrepancy Alignment (IDA), with two dedicated modules: Discrepancy Alignment Operator (IDA-DAO) and Support Set Estimation (IDA-SSE). IDA-DAO is used to align the similarity scores considering the discrepancy between the images and their neighbors, which is defined by adaptive support sets on the hypersphere. For practical inference, it is difficult to acquire the support set online. IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with a GAN. Furthermore, we propose a learnable IDA-SSE, which can implicitly give an estimation without the need for any other images in the evaluation process. The proposed IDA can be incorporated into existing FR systems seamlessly and efficiently. Extensive experiments demonstrate that this framework can 1) significantly improve the accuracy, and 2) make the model robust to face images of various distributions. Without bells and whistles, our method achieves state-of-the-art performance on multiple standard FR benchmarks.
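The abstract describes scoring a face pair not only by point-to-point cosine similarity but also by each image's similarity to its inter-class neighbors. The sketch below illustrates that general idea under stated assumptions; it is not IDA-DAO. The helper names (`local_context`, `aligned_score`), the top-k mean as the local-context measure, and the subtraction of the averaged context are hypothetical choices, closer in spirit to adaptive score normalization than to the paper's operator. The support sets are simply passed in as lists of embeddings; how IDA-SSE would supply them (GAN-generated virtual candidates or a learned estimator) is not modeled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Project an embedding onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Point-to-point cosine similarity (the baseline metric the abstract mentions)."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def local_context(x: Vector, support_set: Sequence[Vector], k: int) -> float:
    """Mean cosine similarity of x to its k most similar support-set embeddings.

    Assumption (hypothetical): support_set holds embeddings of identities
    other than x's, i.e. inter-class neighbors. k is clamped to [1, len(support_set)].
    """
    if not support_set:
        raise ValueError("support set must be non-empty")
    sims = sorted((cosine(x, s) for s in support_set), reverse=True)
    top = sims[: max(1, min(k, len(sims)))]
    return sum(top) / len(top)


def aligned_score(
    x: Vector,
    y: Vector,
    support_x: Sequence[Vector],
    support_y: Sequence[Vector],
    k: int = 3,
) -> float:
    """Hypothetical context-aware score: raw cosine minus the average local context of both images."""
    raw = cosine(x, y)
    offset = 0.5 * (local_context(x, support_x, k) + local_context(y, support_y, k))
    return raw - offset


if __name__ == "__main__":
    probe = [1.0, 0.0]
    gallery = [0.8, 0.6]
    neighbors = [[0.0, 1.0], [-1.0, 0.0]]
    print(cosine(probe, gallery), aligned_score(probe, gallery, neighbors, neighbors, k=1))
```

In this sketch, an image that sits in a crowded region of the hypersphere (high similarity to its inter-class neighbors) has its raw similarities discounted more, so two pairs with the same cosine score can receive different aligned scores depending on their local context.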

Related papers

Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation [74.55677741919035]
We propose Prior2Former (P2F) as the first approach for segmentation vision transformers rooted in evidential learning. P2F extends the mask vision transformer architecture by incorporating a Beta prior for computing model uncertainty in pixel-wise binary mask assignments. It achieves the highest ranking in the OoDIS anomaly instance benchmark among methods not using OOD data in any way.
arXiv (2025-04-07T08:53:14Z)
RAU: Towards Regularized Alignment and Uniformity for Representation Learning in Recommendation [7.193305599721105]
We propose Regularized Alignment and Uniformity (RAU) to cope with sparse alignment and uneven uniformity issues. RAU consists of two novel regularization methods for alignment and uniformity to learn better user/item representation.
arXiv (2025-03-24T03:03:21Z)
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference [27.551367463011008]
Cross-correlation of self-attention in CLIP's non-final layers also exhibits localization properties. We propose the Residual Cross-correlation Self-attention (RCS) module, which leverages the cross-correlation self-attention from intermediate layers to remold the attention in the final block. The RCS module effectively reorganizes spatial information, unleashing the localization potential within CLIP for dense vision-language inference.
arXiv (2024-11-24T14:14:14Z)
Dense Affinity Matching for Few-Shot Segmentation [83.65203917246745]
Few-Shot Segmentation (FSS) aims to segment images of novel classes given only a few samples. We propose a dense affinity matching framework to exploit the support-query interaction. We show that our framework performs very competitively under different settings with only 0.68M parameters.
arXiv (2023-07-17T12:27:15Z)
Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples. We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric. Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv (2022-10-04T07:54:40Z)
Temporal Transductive Inference for Few-Shot Video Object Segmentation [27.140141181513425]
Few-shot video object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training. Key to our approach is the use of both global and local temporal constraints. Empirically, our model outperforms state-of-the-art meta-learning approaches in terms of mean intersection over union on YouTube-VIS by 2.8%.
arXiv (2022-03-27T14:08:30Z)
Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains. We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv (2021-12-12T06:11:16Z)
HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones. We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains. Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv (2021-09-30T14:27:50Z)
Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has simultaneous needs for both robustness and accuracy. We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
arXiv (2021-03-18T08:47:56Z)
Spatial-Scale Aligned Network for Fine-Grained Recognition [42.71878867504503]
Existing approaches for fine-grained visual recognition focus on learning marginal region-based representations. We propose the spatial-scale aligned network (SSANET) and implicitly address misalignments during the recognition process.
arXiv (2020-01-05T11:12:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.