DiversityOne: A Multi-Country Smartphone Sensor Dataset for Everyday Life Behavior Modeling
- URL: http://arxiv.org/abs/2502.03347v1
- Date: Wed, 05 Feb 2025 16:40:05 GMT
- Title: DiversityOne: A Multi-Country Smartphone Sensor Dataset for Everyday Life Behavior Modeling
- Authors: Matteo Busso, Andrea Bontempelli, Leonardo Javier Malcotti, Lakmal Meegahapola, Peter Kun, Shyam Diwakar, Chaitanya Nutakki, Marcelo Dario Rodas Britez, Hao Xu, Donglei Song, Salvador Ruiz Correa, Andrea-Rebeca Mendoza-Lara, George Gaskell, Sally Stares, Miriam Bidoglia, Amarsanaa Ganbold, Altangerel Chagnaa, Luca Cernuzzi, Alethia Hume, Ronald Chenu-Abente, Roy Alia Asiku, Ivan Kayongo, Daniel Gatica-Perez, Amalia de Götzen, Ivano Bison, Fausto Giunchiglia
- Abstract summary: We introduce DiversityOne, a dataset which spans eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, and the United Kingdom) and includes data from 782 college students over four weeks.
As of today, it is one of the largest and most diverse publicly available datasets, while featuring extensive demographic and psychosocial survey data.
- Score: 12.289430134399078
- Abstract: Understanding everyday life behavior of young adults through personal devices, e.g., smartphones and smartwatches, is key for various applications, from enhancing the user experience in mobile apps to enabling appropriate interventions in digital health apps. Towards this goal, previous studies have relied on datasets combining passive sensor data with human-provided annotations or self-reports. However, many existing datasets are limited in scope, often focusing on specific countries primarily in the Global North, involving a small number of participants, or using a limited range of pre-processed sensors. These limitations restrict the ability to capture cross-country variations of human behavior, including the possibility of studying model generalization and robustness. To address this gap, we introduce DiversityOne, a dataset which spans eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, and the United Kingdom) and includes data from 782 college students over four weeks. DiversityOne contains data from 26 smartphone sensor modalities and 350K+ self-reports. As of today, it is one of the largest and most diverse publicly available datasets, while featuring extensive demographic and psychosocial survey data. DiversityOne opens the possibility of studying important research problems in ubiquitous computing, particularly in domain adaptation and generalization across countries, research areas that have so far been largely underexplored because of the lack of adequate datasets.
Related papers
- Bridging the Data Provenance Gap Across Text, Speech and Video [67.72097952282262]
We conduct the largest and first-of-its-kind longitudinal audit across modalities of popular text, speech, and video datasets.
Our manual analysis covers nearly 4000 public datasets between 1990-2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries.
We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets.
arXiv Detail & Related papers (2024-12-19T01:30:19Z)
- The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition [64.5207572897806]
The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems.
In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals.
The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor dataset.
arXiv Detail & Related papers (2024-06-11T22:26:20Z)
- An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification [58.5877965612088]
Person re-identification (ReID) has made great strides thanks to data-driven deep learning techniques.
The existing benchmark datasets lack diversity, and models trained on these data cannot generalize well to dynamic wild scenarios.
We develop a new Open-World, Diverse, Cross-Spatial-Temporal dataset named OWD with several distinct features.
arXiv Detail & Related papers (2024-03-22T11:21:51Z)
- MyDigitalFootprint: an extensive context dataset for pervasive computing applications at the edge [7.310043452300736]
MyDigitalFootprint is a large-scale dataset comprising smartphone sensor data, physical proximity information, and Online Social Networks interactions.
It spans two months of measurements from 31 volunteer users in their natural environment, allowing for unrestricted behavior.
To demonstrate the dataset's effectiveness, we present three context-aware applications utilizing various machine learning tasks.
arXiv Detail & Related papers (2023-06-28T07:59:47Z)
- Learning About Social Context from Smartphone Data: Generalization Across Countries and Daily Life Moments [5.764112063319108]
We used a novel, large-scale, and multimodal smartphone sensing dataset with over 216K self-reports collected from 581 young adults in five countries.
Several sensors are informative of social context, and partially personalized multi-country models (trained and tested with data from all countries) and country-specific models (trained and tested within countries) can achieve similar performance, above 90% AUC.
These findings confirm the importance of the diversity of mobile data, to better understand social context inference models in different countries.
arXiv Detail & Related papers (2023-06-01T17:20:56Z)
- Understanding the Social Context of Eating with Multimodal Smartphone Sensing: The Role of Country Diversity [5.764112063319108]
This study focuses on a dataset of approximately 24K self-reports on eating events provided by 678 college students in eight countries.
Our analysis revealed that while some smartphone usage features during eating events were similar across countries, others exhibited unique trends in each country.
arXiv Detail & Related papers (2023-06-01T14:16:59Z)
- Complex Daily Activities, Country-Level Diversity, and Smartphone Sensing: A Study in Denmark, Italy, Mongolia, Paraguay, and UK [6.52702503779308]
Smartphones enable understanding human behavior with activity recognition to support people's daily lives.
People are more sedentary in the post-pandemic world with the prevalence of remote/hybrid work/study settings.
We analyzed in-the-wild smartphone data and over 216K self-reports from 637 college students in five countries.
arXiv Detail & Related papers (2023-02-16T21:34:55Z)
- Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data [74.60507696087966]
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care.
One promising data source to help monitor human behavior is daily smartphone usage.
We study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors.
arXiv Detail & Related papers (2021-06-24T17:46:03Z)
- Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework called "PERS".
Our experimental results demonstrate PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z)
- Multimodal Privacy-preserving Mood Prediction from Mobile Data: A Preliminary Study [34.550824104906255]
Mental health conditions remain under-diagnosed even in countries with common access to advanced medical care.
One promising data source to help monitor human behavior is from daily smartphone usage.
We study behavioral markers of daily mood using a recent dataset of mobile behaviors from high-risk adolescent populations.
arXiv Detail & Related papers (2020-12-04T01:44:22Z)
- Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset, Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features, such as income and cultural orientation, among several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)