MiVOLO: Multi-input Transformer for Age and Gender Estimation
- URL: http://arxiv.org/abs/2307.04616v2
- Date: Fri, 22 Sep 2023 14:03:08 GMT
- Title: MiVOLO: Multi-input Transformer for Age and Gender Estimation
- Authors: Maksim Kuprashevich and Irina Tolstykh
- Abstract summary: We present MiVOLO, a straightforward approach for age and gender estimation using the latest vision transformer.
Our method integrates both tasks into a unified dual input/output model.
We compare our model's age recognition performance with human-level accuracy and demonstrate that it significantly outperforms humans across a majority of age ranges.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Age and gender recognition in the wild is a highly challenging task: apart
from the variability of conditions, pose complexities, and varying image
quality, there are cases where the face is partially or completely occluded. We
present MiVOLO (Multi Input VOLO), a straightforward approach for age and
gender estimation using the latest vision transformer. Our method integrates
both tasks into a unified dual input/output model, leveraging not only facial
information but also person image data. This improves the generalization
ability of our model and enables it to deliver satisfactory results even when
the face is not visible in the image. To evaluate our proposed model, we
conduct experiments on four popular benchmarks and achieve state-of-the-art
performance, while demonstrating real-time processing capabilities.
Additionally, we introduce a novel benchmark based on images from the Open
Images Dataset. The ground truth annotations for this benchmark have been
meticulously generated by human annotators, resulting in highly accurate answers
due to the smart aggregation of votes. Furthermore, we compare our model's age
recognition performance with human-level accuracy and demonstrate that it
significantly outperforms humans across a majority of age ranges. Finally, we
grant public access to our models, along with the code for validation and
inference. In addition, we provide extra annotations for the datasets used and
introduce our new benchmark.
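The abstract's comparison against human-level accuracy "across a majority of age ranges" can be pictured as a per-range mean absolute error (MAE) computation. The sketch below is illustrative only: the function names, the 10-year bin width, and the sample numbers are assumptions made for this example, not the authors' evaluation code or their results.

```python
from collections import defaultdict


def mae_by_age_range(true_ages, predicted_ages, bin_width=10):
    """Return {(low, high): MAE}, grouping samples by the bin of their true age."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("true_ages and predicted_ages must have the same length")
    errors = defaultdict(list)
    for true_age, predicted in zip(true_ages, predicted_ages):
        low = int(true_age // bin_width) * bin_width
        errors[(low, low + bin_width - 1)].append(abs(predicted - true_age))
    return {rng: sum(errs) / len(errs) for rng, errs in sorted(errors.items())}


def ranges_where_first_wins(mae_a, mae_b):
    """List the age ranges in which estimator A has a strictly lower MAE than B."""
    return [rng for rng in mae_a if rng in mae_b and mae_a[rng] < mae_b[rng]]


# Made-up numbers for illustration only; these are not results from the paper.
true_ages = [4, 8, 23, 27, 35, 61]
model_guess = [5, 9, 25, 26, 39, 58]
human_guess = [6, 8, 20, 31, 33, 66]

model_mae = mae_by_age_range(true_ages, model_guess)
human_mae = mae_by_age_range(true_ages, human_guess)
print(ranges_where_first_wins(model_mae, human_mae))
```

Binning by the true age rather than the predicted age keeps each sample in the same range for both estimators, so the per-range comparison covers identical samples.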
Related papers
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
- Identity-Preserving Aging of Face Images via Latent Diffusion Models [22.2699253042219]
We propose, train, and validate the use of latent text-to-image diffusion models for synthetically aging and de-aging face images.
Our models succeed with few-shot training, and have the added benefit of being controllable via intuitive textual prompting.
arXiv Detail & Related papers (2023-07-17T15:57:52Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while using far fewer trainable parameters, and that it trains and runs inference quickly.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Multi-modal Affect Analysis using standardized data within subjects in the Wild [8.05417723395965]
We introduce the affective recognition method focusing on facial expression (EXP) and valence-arousal calculation.
Our proposed framework can improve estimation accuracy and robustness effectively.
arXiv Detail & Related papers (2021-07-07T04:18:28Z)
- FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild [50.8865921538953]
We propose a method to explicitly incorporate facial semantics into age estimation.
We design a face parsing-based network to learn semantic information at different scales.
We show that our method consistently outperforms all existing age estimation methods.
arXiv Detail & Related papers (2021-06-21T14:31:32Z)
- Age Range Estimation using MTCNN and VGG-Face Model [0.11454121287632513]
Age range estimation using CNNs is emerging due to its applications in a myriad of areas.
A deep CNN model is used for identification of people's age range in our proposed work.
arXiv Detail & Related papers (2021-04-17T15:54:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.