Disentanglement for Discriminative Visual Recognition
- URL: http://arxiv.org/abs/2006.07810v1
- Date: Sun, 14 Jun 2020 06:10:51 GMT
- Title: Disentanglement for Discriminative Visual Recognition
- Authors: Xiaofeng Liu
- Abstract summary: This chapter systematically summarizes the detrimental factors as task-relevant/irrelevant semantic variations and unspecified latent variations.
Better FER performance can be achieved by combining the deep metric loss and the softmax loss in a unified framework with two fully connected branches.
The framework achieves top performance on a series of tasks, including lighting-, makeup-, and disguise-tolerant face recognition and facial attribute recognition.
- Score: 7.954325638519141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent successes of deep learning-based recognition rely on maintaining the
content related to the main-task label. However, how to explicitly dispel the
noisy signals for better generalization in a controllable manner remains an
open issue. For instance, various factors such as identity-specific attributes,
pose, illumination and expression affect the appearance of face images.
Disentangling the identity-specific factors is potentially beneficial for
facial expression recognition (FER). This chapter systematically summarizes the
detrimental factors as task-relevant/irrelevant semantic variations and
unspecified latent variations. In this chapter, these problems are cast as
either a deep metric learning problem or an adversarial minimax game in the
latent space. For the former choice, a generalized adaptive (N+M)-tuplet
clusters loss function, together with identity-aware hard-negative mining and
an online positive mining scheme, can be used for identity-invariant FER.
Better FER performance can be achieved by combining the deep metric loss and
the softmax loss in a unified framework with two fully connected branches via
joint optimization. For the latter solution, it is possible to equip an
end-to-end conditional adversarial network with the ability to decompose an
input sample into three complementary parts. The discriminative representation
inherits the desired invariance property guided by prior knowledge of the task,
and is marginally independent of the task-relevant/irrelevant semantic and
latent variations. The framework achieves top performance on a series of tasks,
including lighting-, makeup-, and disguise-tolerant face recognition and facial
attribute recognition. This chapter systematically summarizes popular and
practical solutions for disentanglement to achieve more discriminative visual
recognition.
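As a rough illustration of the joint optimization described above (not the paper's actual implementation), the two-branch objective — a softmax classification branch plus a metric-learning branch — can be sketched as follows. A simplified N-pair-style tuplet term stands in for the full generalized adaptive (N+M)-tuplet clusters loss with hard-negative and positive mining; the function names and the weighting parameter `lam` are illustrative assumptions.

```python
import math

def softmax_ce(logits, label):
    """Cross-entropy of one sample under softmax (classification branch)."""
    m = max(logits)                         # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[label] / sum(exps))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tuplet_loss(anchor, positive, negatives):
    """N-pair-style tuplet loss: push the anchor's similarity to the positive
    above its similarity to every negative (a simplified stand-in for the
    (N+M)-tuplet clusters loss in the abstract)."""
    pos = dot(anchor, positive)
    return math.log1p(sum(math.exp(dot(anchor, n) - pos) for n in negatives))

def joint_loss(embedding, logits, label, positive, negatives, lam=0.5):
    """Joint objective of the two fully connected branches:
    softmax loss on the logits + lam * metric loss on the embedding."""
    return softmax_ce(logits, label) + lam * tuplet_loss(embedding, positive, negatives)
```

In this sketch, both branches share a backbone feature; one branch maps it to class logits for the softmax loss, the other to an embedding for the metric loss, and the two losses are minimized jointly.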
Related papers
- Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification [17.285526655788274]
Visible-infrared person re-identification (VI-ReID) aims to match people with the same identity between visible and infrared modalities.
Existing methods generally try to bridge the cross-modal differences at image or feature level.
We introduce a dynamic identity-guided attention network (DIAN) to mine identity-guided and modality-consistent embeddings.
arXiv Detail & Related papers (2024-05-21T12:04:56Z) - Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification [5.592360872268223]
Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modal pedestrian retrieval task.
Existing works mainly focus on embedding images of different modalities into a unified space to mine modality-shared features.
We propose a novel Implicit Discriminative Knowledge Learning (IDKL) network to uncover and leverage the implicit discriminative information contained within the modality-specific features.
arXiv Detail & Related papers (2024-03-18T12:12:45Z) - Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe²) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe² can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Set-Based Face Recognition Beyond Disentanglement: Burstiness
Suppression With Variance Vocabulary [78.203301910422]
We argue that the two crucial issues in SFR, the face quality and burstiness, are both identity-irrelevant and variance-relevant.
We propose a light-weighted set-based disentanglement framework to separate the identity features with the variance features.
To suppress face burstiness in the sets, we propose a vocabulary-based burst suppression (VBS) method.
arXiv Detail & Related papers (2023-04-13T04:02:58Z) - Learning Common Rationale to Improve Self-Supervised Representation for
Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z) - Joint Discriminative and Metric Embedding Learning for Person
Re-Identification [8.137833258504381]
Person re-identification is a challenging task because of the high intra-class variance induced by the unrestricted nuisance factors of variations.
Recent approaches postulate that powerful architectures have the capacity to learn feature representations invariant to nuisance factors.
arXiv Detail & Related papers (2022-12-28T22:08:42Z) - TransFA: Transformer-based Representation for Face Attribute Evaluation [87.09529826340304]
We propose a novel transformer-based representation for attribute evaluation method (TransFA).
The proposed TransFA achieves superior performances compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-07-12T10:58:06Z) - Heterogeneous Visible-Thermal and Visible-Infrared Face Recognition
using Unit-Class Loss and Cross-Modality Discriminator [0.43748379918040853]
We propose an end-to-end framework for cross-modal face recognition.
A novel Unit-Class Loss is proposed for preserving identity information while discarding modality information.
The proposed network can be used to extract modality-independent vector representations or a matching-pair classification for test images.
arXiv Detail & Related papers (2021-11-29T06:14:00Z) - Can contrastive learning avoid shortcut solutions? [88.249082564465]
Implicit feature modification (IFM) is a method for altering positive and negative samples to guide contrastive models toward capturing a wider variety of predictive features.
IFM reduces feature suppression, and as a result improves performance on vision and medical imaging tasks.
arXiv Detail & Related papers (2021-06-21T16:22:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.