MiVOLO: Multi-input Transformer for Age and Gender Estimation
- URL: http://arxiv.org/abs/2307.04616v2
- Date: Fri, 22 Sep 2023 14:03:08 GMT
- Title: MiVOLO: Multi-input Transformer for Age and Gender Estimation
- Authors: Maksim Kuprashevich and Irina Tolstykh
- Abstract summary: We present MiVOLO, a straightforward approach for age and gender estimation using the latest vision transformer.
Our method integrates both tasks into a unified dual input/output model.
We compare our model's age recognition performance with human-level accuracy and demonstrate that it significantly outperforms humans across a majority of age ranges.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Age and gender recognition in the wild is a highly challenging task: apart
from the variability of conditions, pose complexities, and varying image
quality, there are cases where the face is partially or completely occluded. We
present MiVOLO (Multi Input VOLO), a straightforward approach for age and
gender estimation using the latest vision transformer. Our method integrates
both tasks into a unified dual input/output model, leveraging not only facial
information but also person image data. This improves the generalization
ability of our model and enables it to deliver satisfactory results even when
the face is not visible in the image. To evaluate our proposed model, we
conduct experiments on four popular benchmarks and achieve state-of-the-art
performance, while demonstrating real-time processing capabilities.
Additionally, we introduce a novel benchmark based on images from the Open
Images Dataset. The ground truth annotations for this benchmark have been
meticulously generated by human annotators, resulting in high accuracy answers
due to the smart aggregation of votes. Furthermore, we compare our model's age
recognition performance with human-level accuracy and demonstrate that it
significantly outperforms humans across a majority of age ranges. Finally, we
grant public access to our models, along with the code for validation and
inference. In addition, we provide extra annotations for used datasets and
introduce our new benchmark.
Related papers
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
- Identity-Preserving Aging of Face Images via Latent Diffusion Models [22.2699253042219]
We propose, train, and validate the use of latent text-to-image diffusion models for synthetically aging and de-aging face images.
Our models succeed with few-shot training, and have the added benefit of being controllable via intuitive textual prompting.
arXiv Detail & Related papers (2023-07-17T15:57:52Z)
- Conformer and Blind Noisy Students for Improved Image Quality Assessment [80.57006406834466]
Learning-based approaches for perceptual image quality assessment (IQA) usually require both the distorted and reference image for measuring the perceptual quality accurately.
In this work, we explore the performance of transformer-based full-reference IQA models.
We also propose a method for IQA based on semi-supervised knowledge distillation from full-reference teacher models into blind student models.
arXiv Detail & Related papers (2022-04-27T10:21:08Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while using far fewer trainable parameters and achieving high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Multi-modal Affect Analysis using standardized data within subjects in the Wild [8.05417723395965]
We introduce an affective recognition method focusing on facial expression (EXP) and valence-arousal calculation.
Our proposed framework can improve estimation accuracy and robustness effectively.
arXiv Detail & Related papers (2021-07-07T04:18:28Z)
- FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild [50.8865921538953]
We propose a method to explicitly incorporate facial semantics into age estimation.
We design a face parsing-based network to learn semantic information at different scales.
We show that our method consistently outperforms all existing age estimation methods.
arXiv Detail & Related papers (2021-06-21T14:31:32Z)
- Age Range Estimation using MTCNN and VGG-Face Model [0.11454121287632513]
Age range estimation using CNNs is emerging due to its application in a myriad of areas.
A deep CNN model is used for identification of people's age range in our proposed work.
arXiv Detail & Related papers (2021-04-17T15:54:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.