Data-driven Discovery of Digital Twins in Biomedical Research
- URL: http://arxiv.org/abs/2508.21484v2
- Date: Mon, 01 Sep 2025 17:06:38 GMT
- Title: Data-driven Discovery of Digital Twins in Biomedical Research
- Authors: Clémence Métayer, Annabelle Ballesta, Julien Martinelli,
- Abstract summary: We review methodologies for automatically inferring digital twins from biological time series.<n>We evaluate algorithms according to eight biological and methodological challenges.<n>We highlight the emerging role of deep learning and large language models, which enable innovative prior knowledge integration.
- Score: 1.1852406625172218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent technological advances have expanded the availability of high-throughput biological datasets, enabling the reliable design of digital twins of biomedical systems or patients. Such computational tools represent key reaction networks driving perturbation or drug response and can guide drug discovery and personalized therapeutics. Yet, their development still relies on laborious data integration by the human modeler, so that automated approaches are critically needed. The success of data-driven system discovery in Physics, rooted in clean datasets and well-defined governing laws, has fueled interest in applying similar techniques in Biology, which presents unique challenges. Here, we reviewed methodologies for automatically inferring digital twins from biological time series, which mostly involve symbolic or sparse regression. We evaluate algorithms according to eight biological and methodological challenges, associated to noisy/incomplete data, multiple conditions, prior knowledge integration, latent variables, high dimensionality, unobserved variable derivatives, candidate library design, and uncertainty quantification. Upon these criteria, sparse regression generally outperformed symbolic regression, particularly when using Bayesian frameworks. We further highlight the emerging role of deep learning and large language models, which enable innovative prior knowledge integration, though the reliability and consistency of such approaches must be improved. While no single method addresses all challenges, we argue that progress in learning digital twins will come from hybrid and modular frameworks combining chemical reaction network-based mechanistic grounding, Bayesian uncertainty quantification, and the generative and knowledge integration capacities of deep learning. To support their development, we further propose a benchmarking framework to evaluate methods across all challenges.
Related papers
- An Interdisciplinary and Cross-Task Review on Missing Data Imputation [25.19716862601082]
Missing data is a fundamental challenge in data science, hindering analysis and decision-making across a wide range of disciplines.<n>Despite decades of research and numerous imputation methods, the literature remains fragmented across fields.<n>This work systematically reviews core concepts including missingness mechanisms, single versus multiple imputation, and different imputation goals.
arXiv Detail & Related papers (2025-11-03T03:43:43Z) - A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers [221.34650992288505]
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research.<n>This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate.<n>We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z) - Deep Learning in Single-Cell and Spatial Transcriptomics Data Analysis: Advances and Challenges from a Data Science Perspective [19.655130697247518]
The development of single-cell and spatial transcriptomics has revolutionized our capacity to investigate cellular properties, functions, and interactions.<n>However, the analysis of single-cell and spatial omics data remains challenging.<n>Deep learning has emerged as a powerful tool capable of handling high-dimensional complex data and automatically identifying meaningful patterns.
arXiv Detail & Related papers (2024-12-04T14:07:11Z) - Simplicity within biological complexity [0.0]
We survey the literature and argue for the development of a comprehensive framework for embedding of multi-scale molecular network data.
Network embedding methods map nodes to points in low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships.
We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation.
arXiv Detail & Related papers (2024-05-15T13:32:45Z) - Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most of current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z) - AI-Aristotle: A Physics-Informed framework for Systems Biology Gray-Box
Identification [1.8434042562191815]
We present a new framework for parameter estimation and missing physics identification (gray-box) in Systems Biology.
The proposed framework -- named AI-Aristotle -- combines eXtreme Theory of Functional Connections (X-TFC) domain-decomposition and Physics-Informed Neural Networks (PINNs)
We test the accuracy, speed, flexibility and robustness of AI-Aristotle based on two benchmark problems in Systems Biology.
arXiv Detail & Related papers (2023-09-29T14:45:51Z) - Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z) - Physically constrained neural networks to solve the inverse problem for
neuron models [0.29005223064604074]
Systems biology and systems neurophysiology are powerful tools for a number of key applications in the biomedical sciences.
Recent developments in the field of deep neural networks have demonstrated the possibility of formulating nonlinear, universal approximators.
arXiv Detail & Related papers (2022-09-24T12:51:15Z) - Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in
Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.