Related papers: Understanding the Representation and Representativeness of Age in AI Data Sets

URL: http://arxiv.org/abs/2103.09058v2
Date: Thu, 6 May 2021 04:30:40 GMT
Title: Understanding the Representation and Representativeness of Age in AI Data Sets
Authors: Joon Sung Park, Michael S. Bernstein, Robin N. Brewer, Ece Kamar, Meredith Ringel Morris
Abstract summary: We ask whether older adults are represented proportionally to the population at large in AI data sets. We find that older adults are very under-represented: five data sets that explicitly documented the closed age intervals of their subjects included older adults. We also find that only 24 of the data sets include any age-related information in their documentation or metadata.
Score: 43.20868863618351
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A diverse representation of different demographic groups in AI training data sets is important in ensuring that the models will work for a large range of users. To this end, recent efforts in AI fairness and inclusion have advocated for creating AI data sets that are well-balanced across race, gender, socioeconomic status, and disability status. In this paper, we contribute to this line of work by focusing on the representation of age by asking whether older adults are represented proportionally to the population at large in AI data sets. We examine publicly-available information about 92 face data sets to understand how they codify age as a case study to investigate how the subjects' ages are recorded and whether older generations are represented. We find that older adults are very under-represented; five data sets in the study that explicitly documented the closed age intervals of their subjects included older adults (defined as older than 65 years), while only one included oldest-old adults (defined as older than 85 years). Additionally, we find that only 24 of the data sets include any age-related information in their documentation or metadata, and that there is no consistent method followed across these data sets to collect and record the subjects' ages. We recognize the unique difficulties in creating representative data sets in terms of age, but raise it as an important dimension that researchers and engineers interested in inclusive AI should consider.

Related papers

OPEN: A Benchmark Dataset and Baseline for Older Adult Patient Engagement Recognition in Virtual Rehabilitation Learning Environments [1.9827390755712084]
This paper introduces OPEN (Older adult Patient ENgagement), a novel dataset supporting AI-driven engagement recognition. It was collected from eleven older adults participating in weekly virtual group learning sessions over six weeks as part of cardiac rehabilitation. To demonstrate utility, multiple machine learning and deep learning models were trained, achieving engagement recognition accuracy of up to 81 percent.
arXiv Detail & Related papers (2025-07-23T22:03:29Z)
Bridging the gap in FER: addressing age bias in deep learning [0.562479170374811]
We study age-related bias in deep FER models, with a particular focus on the elderly population. Using Explainable AI (XAI) techniques, we identify systematic disparities in expression recognition and attention patterns. Results show consistent improvements in recognition accuracy for elderly individuals.
arXiv Detail & Related papers (2025-07-10T11:07:13Z)
Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust [7.985473318714565]
We ask: even if character distributions along demographic dimensions are available, how useful are they to the general public? Our work addresses these questions through a user study, while proposing a new AI-based character representation and visualization tool. Our tool is based on the Contrastive Language Image Pretraining (CLIP) foundation model and analyzes visual screen data to quantify character representation across dimensions of age and gender.
arXiv Detail & Related papers (2025-06-02T13:46:28Z)
Experimenting with Affective Computing Models in Video Interviews with Spanish-speaking Older Adults [2.4866182704905495]
This study evaluates state-of-the-art affective computing models using videos of older adults interacting with either a person or a virtual avatar. As part of this effort, we introduce a novel dataset featuring Spanish-speaking older adults engaged in human-to-human video interviews.
arXiv Detail & Related papers (2025-01-28T11:42:15Z)
TextAge: A Curated and Diverse Text Dataset for Age Classification [1.4843200329335289]
Age-related language patterns play a crucial role in understanding linguistic differences and developing age-appropriate communication strategies. We present TextAge, a curated text dataset that maps sentences to the age and age group of the producer. The dataset undergoes extensive cleaning and preprocessing to ensure data quality and consistency.
arXiv Detail & Related papers (2024-05-02T23:37:03Z)
Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks. It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection. Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z)
Explaining machine learning models for age classification in human gait analysis [10.570744839131775]
The research question was: Which input features are used by ML models to classify age-related differences in walking patterns? We utilized a subset of the AIST Gait Database 2019 containing five bilateral ground reaction force (GRF) recordings per person during barefoot walking of healthy participants. The mean classification accuracy of 60.1% was clearly higher than the zero-rule baseline of 37.3%. The confusion matrix shows that the CNN distinguished younger and older adults well, but had difficulty modeling the middle-aged adults.
arXiv Detail & Related papers (2022-10-16T13:53:51Z)
Data Representativeness in Accessibility Datasets: A Meta-Analysis [7.6597163467929805]
We review datasets sourced by people with disabilities and older adults. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. We hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
arXiv Detail & Related papers (2022-07-16T23:32:19Z)
LAE : Long-tailed Age Estimation [52.5745217752147]
We first formulate a simple standard baseline and build a much stronger one by collecting the tricks in pre-training, data augmentation, model architecture, and so on. Compared with the standard baseline, the proposed one significantly decreases the estimation errors. We propose a two-stage training method named Long-tailed Age Estimation (LAE), which decouples the learning procedure into representation learning and classification.
arXiv Detail & Related papers (2021-10-25T09:05:44Z)
Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process. We generate a representative as well as fair version of the UCI Adult census data set. We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
In particular, face datasets are typically biased in terms of attributes such as gender, age, and race. We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns. We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv Detail & Related papers (2020-06-06T21:53:44Z)
Investigating Bias in Deep Face Analysis: The KANFace Dataset and Empirical Study [67.3961439193994]
We introduce the most comprehensive, large-scale dataset of facial images and videos to date. The data are manually annotated in terms of identity, exact age, gender and kinship. A method to debias network embeddings is introduced and tested on the proposed benchmarks.
arXiv Detail & Related papers (2020-05-15T00:14:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.