DAIL: Dataset-Aware and Invariant Learning for Face Recognition
- URL: http://arxiv.org/abs/2101.05419v1
- Date: Thu, 14 Jan 2021 01:59:52 GMT
- Title: DAIL: Dataset-Aware and Invariant Learning for Face Recognition
- Authors: Gaoang Wang, Lin Chen, Tianqiang Liu, Mingwei He, and Jiebo Luo
- Abstract summary: To achieve good performance in face recognition, a large scale training dataset is usually required.
It is problematic to naively combine different datasets due to two major issues.
Naively treating the same person as different classes in different datasets during training will affect back-propagation.
Manually cleaning labels may take formidable human effort, especially when there are millions of images and thousands of identities.
- Score: 67.4903809903022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To achieve good performance in face recognition, a large scale training
dataset is usually required. A simple yet effective way to improve recognition
performance is to use a dataset as large as possible by combining multiple
datasets during training. However, it is problematic to naively
combine different datasets due to two major issues. First, the same person can
possibly appear in different datasets, leading to an identity overlapping issue
between different datasets. Naively treating the same person as different
classes in different datasets during training will affect back-propagation and
generate non-representative embeddings. On the other hand, manually cleaning
labels may take formidable human effort, especially when there are millions of
images and thousands of identities. Second, different datasets are collected in
different situations and thus will lead to different domain distributions.
Naively combining datasets will make it difficult to learn domain invariant
embeddings across different datasets. In this paper, we propose DAIL:
Dataset-Aware and Invariant Learning to resolve the above-mentioned issues. To
solve the first issue of identity overlapping, we propose a dataset-aware loss
for multi-dataset training by reducing the penalty when the same person appears
in multiple datasets. This can be readily achieved via a modified softmax loss
with a dataset-aware term. To solve the second issue, domain adaptation with
gradient reversal layers is employed for dataset invariant learning. The
proposed approach not only achieves state-of-the-art results on several
commonly used face recognition validation sets, including LFW, CFP-FP, and
AgeDB-30, but also shows great benefit for practical use.
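The two components described in the abstract can be sketched in a few lines. The snippet below is a minimal NumPy sketch, not the paper's implementation: it assumes the dataset-aware term simply down-weights the softmax contribution of classes belonging to other datasets by a factor `lam`, and it shows the gradient-reversal rule as a bare backward-pass function. The names `dataset_aware_softmax_loss` and `gradient_reversal_backward` are illustrative, not from the paper.

```python
import numpy as np

def dataset_aware_softmax_loss(logits, label, class_dataset_ids,
                               sample_dataset_id, lam=0.0):
    """Cross-entropy whose softmax denominator down-weights classes from
    *other* datasets by `lam` (0 = exclude them entirely). This reduces
    the penalty when the same identity appears, under a different label,
    in another dataset."""
    # Weight of each class in the denominator: 1 for same-dataset
    # classes, `lam` for classes that belong to other datasets.
    w = np.where(class_dataset_ids == sample_dataset_id, 1.0, lam)
    z = logits - logits.max()            # numerical stability
    exp_z = np.exp(z) * w
    exp_z[label] = np.exp(z[label])      # target class always counts fully
    return -np.log(exp_z[label] / exp_z.sum())

def gradient_reversal_backward(upstream_grad, lam=1.0):
    """Gradient reversal layer: identity in the forward pass; during
    back-propagation the gradient reaching the feature extractor is
    scaled by -lam, driving the features toward dataset invariance."""
    return -lam * upstream_grad
```

With `lam=1.0` the loss reduces to standard softmax cross-entropy over all classes; with `lam=0.0`, negative classes from other datasets are ignored, so an identity duplicated across datasets is never penalized as its own false negative.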
Related papers
- What is different between these datasets? [23.271594219577185]
Two comparable datasets in the same domain may have different distributions.
We propose a suite of interpretable methods (toolbox) for comparing two datasets.
Our methods not only outperform comparable and related approaches in terms of explanation quality and correctness, but also provide actionable, complementary insights to understand and mitigate dataset differences effectively.
arXiv Detail & Related papers (2024-03-08T19:52:39Z) - Combining datasets to increase the number of samples and improve model
fitting [7.4771091238795595]
We propose a novel framework called Combine datasets based on Imputation (ComImp).
In addition, we propose a variant of ComImp that uses Principal Component Analysis (PCA), called PCA-ComImp, to reduce dimensionality before combining datasets.
Our results indicate that the proposed methods are somewhat similar to transfer learning in that merging can significantly improve the accuracy of a prediction model on smaller datasets.
arXiv Detail & Related papers (2022-10-11T06:06:37Z) - AdaptCL: Adaptive Continual Learning for Tackling Heterogeneity in
Sequential Datasets [13.065880037738108]
AdaptCL is a novel adaptive continual learning method to tackle heterogeneous datasets.
It employs fine-grained data-driven pruning to adapt to variations in data complexity and dataset size.
It also utilizes task-agnostic parameter isolation to mitigate the impact of varying degrees of catastrophic forgetting.
arXiv Detail & Related papers (2022-07-22T10:48:06Z) - Detection Hub: Unifying Object Detection Datasets via Query Adaptation
on Language Embedding [137.3719377780593]
A new design (named Detection Hub) is dataset-aware and category-aligned.
It mitigates the dataset inconsistency and provides coherent guidance for the detector to learn across multiple datasets.
The categories across datasets are semantically aligned into a unified space by replacing one-hot category representations with word embeddings.
arXiv Detail & Related papers (2022-06-07T17:59:44Z) - Learning Semantic Segmentation from Multiple Datasets with Label Shifts [101.24334184653355]
This paper proposes UniSeg, an effective approach to automatically train models across multiple datasets with differing label spaces.
Specifically, we propose two losses that account for conflicting and co-occurring labels to achieve better generalization performance in unseen domains.
arXiv Detail & Related papers (2022-02-28T18:55:19Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person
Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - Multi-domain semantic segmentation with overlapping labels [1.4120796122384087]
We propose a principled method for seamless learning on datasets with overlapping classes based on partial labels and probabilistic loss.
Our method achieves competitive within-dataset and cross-dataset generalization, as well as the ability to learn visual concepts that are not separately labeled in any of the training datasets.
arXiv Detail & Related papers (2021-08-25T13:25:41Z) - Unsupervised Pre-training for Person Re-identification [90.98552221699508]
We present a large scale unlabeled person re-identification (Re-ID) dataset, "LUPerson".
We make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.
arXiv Detail & Related papers (2020-12-07T14:48:26Z) - DomainMix: Learning Generalizable Person Re-Identification Without Human
Annotations [89.78473564527688]
This paper shows how to use labeled synthetic dataset and unlabeled real-world dataset to train a universal model.
In this way, human annotations are no longer required, and it is scalable to large and diverse real-world datasets.
Experimental results show that the proposed annotation-free method is comparable to the counterpart trained with full human annotations.
arXiv Detail & Related papers (2020-11-24T08:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.