Related papers: A Mega-Study of Digital Twins Reveals Strengths, Weaknesses and Opportunities for Further Improvement

URL: http://arxiv.org/abs/2509.19088v3
Date: Fri, 07 Nov 2025 06:14:27 GMT
Title: A Mega-Study of Digital Twins Reveals Strengths, Weaknesses and Opportunities for Further Improvement
Authors: Tianyi Peng, George Gui, Daniel J. Merlau, Grace Jiarui Fan, Malek Ben Sliman, Melanie Brucks, Eric J. Johnson, Vicki Morwitz, Abdullah Althenayyan, Silvia Bellezza, Dante Donati, Hortense Fong, Elizabeth Friedman, Ariana Guevara, Mohamed Hussein, Kinshuk Jerath, Bruce Kogut, Akshit Kumar, Kristen Lane, Hannah Li, Patryk Perkowski, Oded Netzer, Olivier Toubia
Abstract summary: Digital representations of individuals ("digital twins") promise to transform social science and decision-making. We conducted 19 studies with a representative U.S. panel and their digital twins. Twins reproduced individual responses with 75% accuracy and seemingly low correlation with human answers.
Score: 3.418816254588274
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Digital representations of individuals ("digital twins") promise to transform social science and decision-making. Yet it remains unclear whether such twins truly mirror the people they emulate. We conducted 19 preregistered studies with a representative U.S. panel and their digital twins, each constructed from rich individual-level data, enabling direct comparisons between human and twin behavior across a wide range of domains and stimuli (including never-seen-before ones). Twins reproduced individual responses with 75% accuracy and seemingly low correlation with human answers (approximately 0.2). However, this apparently high accuracy was no higher than that achieved by generic personas based on demographics only. In contrast, correlation improved when twins incorporated detailed personal information, even outperforming traditional machine learning benchmarks that require additional data. Twins exhibited systematic strengths and weaknesses - performing better in social and personality domains, but worse in political ones - and were more accurate for participants with higher education, higher income, and moderate political views and religious attendance. Together, these findings delineate both the promise and the current limits of digital twins: they capture some relative differences among individuals but not yet the unique judgments of specific people. All data and code are publicly available to support the further development and evaluation of digital twin pipelines.

Related papers

Psychometric Comparability of LLM-Based Digital Twins [2.7740826124350355]
We benchmark digital twins against human gold standards across models and tasks, testing how person-specific inputs shape performance. Across studies, digital twins achieved high population-level accuracy and strong within-participant profile correlations. Digital twins under-reproduce biases, showing normative prediction, compressed variance and limited sensitivity to temporal information.
arXiv Detail & Related papers (2025-12-22T18:04:27Z)
No More Sibling Rivalry: Debiasing Human-Object Interaction Detection [47.554732714656296]
This study identifies a critical issue, the "Toxic Siblings" bias, which hinders the interaction decoder's learning. This bias arises from high confusion among sibling triplets/categories, where increased similarity paradoxically reduces precision. We propose two novel debiasing learning objectives: "contrastive-then-calibration" and "merge-then-split".
arXiv Detail & Related papers (2025-08-31T09:23:15Z)
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis [74.31705485094096]
We introduce TalkVid, a new large-scale, high-quality, and diverse dataset containing 1244 hours of video from 7729 unique speakers. TalkVid is curated through a principled, multi-stage automated pipeline that rigorously filters for motion stability, aesthetic quality, and facial detail. We construct and release TalkVid-Bench, a stratified evaluation set of 500 clips meticulously balanced across key demographic and linguistic axes.
arXiv Detail & Related papers (2025-08-19T08:31:15Z)
Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions [11.751234495886674]
LLM-based digital twin simulation holds great promise for research in AI, social science, and digital experimentation. We survey a representative sample of $N = 2,058$ participants (average 2.42 hours per person) in the US across four waves with 500 questions in total. Initial analyses suggest the data are of high quality and show promise for constructing digital twins that predict human behavior well at the individual and aggregate levels.
arXiv Detail & Related papers (2025-05-23T05:05:11Z)
Merging synthetic and real embryo data for advanced AI predictions [69.07284335967019]
We train two generative models using two datasets (one we created and made publicly available, and one existing public dataset) to generate synthetic embryo images at various cell stages. These were combined with real images to train classification models for embryo cell stage prediction. Our results demonstrate that incorporating synthetic images alongside real data improved classification performance, with the model achieving 97% accuracy compared to 94.5% when trained solely on real data.
arXiv Detail & Related papers (2024-12-02T08:24:49Z)
Digital Twin Generators for Disease Modeling [2.341540989979203]
A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health.
arXiv Detail & Related papers (2024-05-02T17:23:04Z)
Machine Learning Techniques with Fairness for Prediction of Completion of Drug and Alcohol Rehabilitation [0.0]
The aim of this study is to look at predicting whether a person will complete a drug and alcohol rehabilitation program and the number of times a person attends. The study is based on demographic data obtained from both admissions and discharge data from drug and alcohol rehabilitation centers in Oklahoma.
arXiv Detail & Related papers (2024-04-23T18:09:53Z)
TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model [24.35626029582016]
We propose a large language model-based digital twin creation approach, called TWIN-GPT. We show that using digital twins created by TWIN-GPT can boost the clinical trial outcome prediction.
arXiv Detail & Related papers (2024-04-01T17:48:55Z)
Digital Twins: How Far from Ideas to Twins? [0.0]
Both theoretical and practical ideas have been proposed for digital twins. From a theoretical perspective, a digital twin is a fusion of data mappings between modalities. From a practical point of view, a digital twin is a scenario implementation based on the Internet of Things and models.
arXiv Detail & Related papers (2024-03-16T01:25:59Z)
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems. Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts. We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z)
Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework called "PERS". Our experimental results demonstrate PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images. Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches. We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
Digital Twins: State of the Art Theory and Practice, Challenges, and Open Research Questions [62.67593386796497]
This work explores the various DT features and current approaches, the shortcomings, and the reasons behind the delay in the implementation and adoption of digital twins. The major reasons for this delay are the lack of a universal reference framework, domain dependence, security concerns of shared data, reliance of digital twins on other technologies, and lack of quantitative metrics.
arXiv Detail & Related papers (2020-11-02T19:08:49Z)
Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions. We make robust and efficient counterfactual predictions for both individual and average treatment effects. The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
Predicting MOOCs Dropout Using Only Two Easily Obtainable Features from the First Week's Activities [56.1344233010643]
Several features are considered to contribute towards learner attrition or lack of interest, which may lead to disengagement or total dropout. This study aims to predict dropout early on, from the first week, by comparing several machine-learning approaches.
arXiv Detail & Related papers (2020-08-12T10:44:49Z)
Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions [57.77347280992548]
In this paper, we design two-sample tests for pairwise comparison data and ranking data. Our test requires essentially no assumptions on the distributions. By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently.
arXiv Detail & Related papers (2020-06-21T20:51:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.