Generating synthetic multi-dimensional molecular-mediator time series
data for artificial intelligence-based disease trajectory forecasting and
drug development digital twins: Considerations
- URL: http://arxiv.org/abs/2303.09056v1
- Date: Thu, 16 Mar 2023 03:13:53 GMT
- Title: Generating synthetic multi-dimensional molecular-mediator time series
data for artificial intelligence-based disease trajectory forecasting and
drug development digital twins: Considerations
- Authors: Gary An and Chase Cockrell
- Abstract summary: The use of synthetic data is recognized as a crucial step in the development of neural network-based Artificial Intelligence (AI) systems.
The insufficiency of statistical and data-centric machine learning methods for generating this type of synthetic data stems from a combination of factors.
The generation of synthetic data that accounts for the identified factors of multi-dimensional time series data is an essential capability for the development of mediator-biomarker based AI forecasting systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The use of synthetic data is recognized as a crucial step in the development
of neural network-based Artificial Intelligence (AI) systems. While the methods
for generating synthetic data for AI applications in other domains have a role
in certain biomedical AI systems, primarily related to image processing, there
is a critical gap in the generation of time series data for AI tasks where it
is necessary to know how the system works. This is most pronounced in the
ability to generate synthetic multi-dimensional molecular time series data
(SMMTSD); this is the type of data that underpins research into biomarkers and
mediator signatures for forecasting various diseases and is an essential
component of the drug development pipeline. We argue the insufficiency of
statistical and data-centric machine learning (ML) means of generating this
type of synthetic data is due to a combination of factors: perpetual data
sparsity due to the Curse of Dimensionality, the inapplicability of the Central
Limit Theorem, and the limits imposed by the Causal Hierarchy Theorem.
Alternatively, we present a rationale for using complex multi-scale
mechanism-based simulation models, constructed and operated on to account for
epistemic incompleteness and the need to provide maximal expansiveness in
concordance with the Principle of Maximal Entropy. These procedures provide for
the generation of SMMTSD that minimizes the known shortcomings associated with
neural network AI systems, namely overfitting and lack of generalizability. The
generation of synthetic data that accounts for the identified factors of
multi-dimensional time series data is an essential capability for the
development of mediator-biomarker based AI forecasting systems, and therapeutic
control development and optimization through systems like Drug Development
Digital Twins.
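
The data-sparsity argument in the abstract can be illustrated with a small sketch. This is not taken from the paper: the function name `occupied_fraction`, the uniform sampling, and the choice of 4 bins per dimension are all assumptions made for illustration. It shows that a fixed number of samples covers a rapidly shrinking fraction of the state space as the number of measured mediators (dimensions) grows.

```python
import random


def occupied_fraction(n_samples, n_dims, bins_per_dim=4, seed=0):
    """Fraction of grid cells hit by at least one uniformly drawn sample.

    Hypothetical helper for illustration only; real mediator data are
    neither uniform nor independent across dimensions.
    """
    rng = random.Random(seed)
    cells = set()
    for _ in range(n_samples):
        # Map each coordinate in [0, 1) to a bin index; the tuple names one cell.
        cell = tuple(int(rng.random() * bins_per_dim) for _ in range(n_dims))
        cells.add(cell)
    # Total number of cells grows as bins_per_dim ** n_dims.
    return len(cells) / bins_per_dim ** n_dims


if __name__ == "__main__":
    for d in (1, 2, 4, 8, 12):
        print(d, occupied_fraction(1000, d))
```

With 1000 samples the coverage can never exceed 1000 / 4**d, so at 12 dimensions almost every cell is empty regardless of how the samples are drawn.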
Related papers
- Generation of synthetic gait data: application to multiple sclerosis patients' gait patterns [0.0]
Multiple sclerosis (MS) is the leading cause of severe non-traumatic disability in young adults and its incidence is increasing worldwide.
The variability of gait impairment in MS necessitates the development of a non-invasive, sensitive, and cost-effective tool for quantitative gait evaluation.
The eGait movement sensor, designed to characterize human gait through unit quaternion time series (QTS) representing hip rotations, is a promising approach.
However, the small sample sizes typical of clinical studies pose challenges for the stability of gait data analysis tools.
arXiv Detail & Related papers (2024-11-15T17:32:01Z)
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Advancing fNIRS Neuroimaging through Synthetic Data Generation and Machine Learning Applications [0.0]
This study presents an integrated approach for advancing functional Near-Infrared Spectroscopy (fNIRS) neuroimaging.
By addressing the scarcity of high-quality neuroimaging datasets, this work harnesses Monte Carlo simulations and parametric head models to generate a comprehensive synthetic dataset.
A cloud-based infrastructure is established for scalable data generation and processing, enhancing the accessibility and quality of neuroimaging data.
arXiv Detail & Related papers (2024-05-18T09:50:19Z)
- A Comparative Study of Machine Learning Models Predicting Energetics of Interacting Defects [5.574191640970887]
We present a comparative study of three different methods to predict the free energy change of systems with interacting defects.
Our findings indicate that the cluster expansion model can achieve precise energetics predictions even with this limited dataset.
This research provides a preliminary evaluation of applying machine learning techniques in imperfect surface systems.
arXiv Detail & Related papers (2024-03-20T02:15:48Z)
- Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets [17.774341783844026]
This work proposes the Multimodal Integration of Oncology Data System (MINDS).
MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources.
By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability.
arXiv Detail & Related papers (2023-09-30T15:44:39Z)
- The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z)
- Self-learning locally-optimal hypertuning using maximum entropy, and comparison of machine learning approaches for estimating fatigue life in composite materials [0.0]
We develop a nearest-neighbors-like ML algorithm based on the principle of maximum entropy to predict fatigue damage.
The predictions achieve a good level of accuracy, similar to other ML algorithms.
arXiv Detail & Related papers (2022-10-19T12:20:07Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic data and a variety of publicly available datasets to show that MIRACLE consistently improves imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)