TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets
- URL: http://arxiv.org/abs/2407.00631v3
- Date: Sun, 15 Jun 2025 22:48:24 GMT
- Title: TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets
- Authors: Jintai Chen, Yaojun Hu, Mingchen Cai, Yingzhou Lu, Yue Wang, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Yuqiang Li, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu
- Abstract summary: This paper presents a suite of 23 meticulously curated AI-ready datasets covering multi-modal input features and 8 crucial prediction challenges in clinical trial design. We provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
- Score: 54.98321887435557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical trials are pivotal for developing new medical treatments but typically carry risks such as patient mortality and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to predict key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex data collection and question definition requiring medical expertise have hindered the involvement of AI thus far. This paper tackles these challenges by presenting a comprehensive suite of 23 meticulously curated AI-ready datasets covering multi-modal input features and 8 crucial prediction challenges in clinical trial design, encompassing prediction of trial duration, patient dropout rate, serious adverse events, mortality rate, trial approval outcome, trial failure reason, drug dose finding, and design of eligibility criteria. Furthermore, we provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design, ultimately advancing clinical trial research and accelerating medical solution development.
Related papers
- Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates [1.7099366779394252]
Accurately predicting patient enrollment, a key factor in trial success, is one of the primary challenges during the planning phase. We propose a novel deep learning-based method to address this critical challenge. We show that the proposed method can effectively predict the number of patients enrolled at a number of sites for a given clinical trial.
(2025-07-31T14:47:16Z)
- Leveraging Generative AI Through Prompt Engineering and Rigorous Validation to Create Comprehensive Synthetic Datasets for AI Training in Healthcare [0.0]
The GPT-4 API was employed to generate high-quality synthetic datasets aimed at overcoming this limitation.
The generated data encompassed a comprehensive array of patient admission information, including healthcare provider details, hospital departments, wards, bed assignments, patient demographics, emergency contacts, vital signs, immunizations, allergies, medical histories, appointments, hospital visits, laboratory tests, diagnoses, treatment plans, medications, clinical notes, visit logs, discharge summaries, and referrals.
(2025-04-29T16:37:34Z)
- Systematic Literature Review on Clinical Trial Eligibility Matching [0.24554686192257422]
The review highlights how explainable AI and standardized ontologies can bolster clinician trust and broaden adoption.
Further research into advanced semantic and temporal representations, expanded data integration, and rigorous prospective evaluations is necessary to fully realize the transformative potential of NLP in clinical trial recruitment.
(2025-03-02T11:45:50Z)
- Can artificial intelligence predict clinical trial outcomes? [5.326858857564308]
This study evaluates the predictive capabilities of large language models (LLMs) in determining clinical trial outcomes.
We compare the models' performance using metrics including balanced accuracy, specificity, recall, and Matthews Correlation Coefficient (MCC).
Oncology trials, characterized by high complexity, remain challenging for all models.
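All four metrics named in this summary can be computed from the cells of a binary confusion matrix. The sketch below is a minimal illustration, not the study's evaluation code: it assumes 0/1 labels, and the function name `binary_metrics` and the convention of returning 0 when a denominator is zero are assumptions.

```python
import math


def binary_metrics(y_true, y_pred):
    """Balanced accuracy, specificity, recall and MCC for 0/1 labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate (sensitivity)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    balanced_accuracy = (recall + specificity) / 2     # robust to class imbalance

    # MCC uses all four cells; assumed to be 0 when any marginal total is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "balanced_accuracy": balanced_accuracy,
        "specificity": specificity,
        "recall": recall,
        "mcc": mcc,
    }


m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0])
```

In this toy example there are 2 true positives, 1 false negative, 4 true negatives and 1 false positive, so recall is 2/3, specificity is 0.8 and MCC is 7/15. Balanced accuracy and MCC are preferred over plain accuracy when successful and failed trials are unevenly represented.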
(2024-11-26T17:05:27Z)
- Retrieval-Reasoning Large Language Model-based Synthetic Clinical Trial Generation [16.067841125848688]
We introduce a novel Retrieval-Reasoning framework that leverages large language models to generate synthetic clinical trials.
Experiments conducted on real clinical trials from the ClinicalTrials.gov database demonstrate that our synthetic data can effectively augment real datasets.
Our findings suggest that LLMs for synthetic clinical trial generation hold promise for accelerating clinical research and upholding ethical standards for patient privacy.
(2024-10-16T11:46:32Z)
- TrialSynth: Generation of Synthetic Sequential Clinical Trial Data [21.799655542003677]
TrialSynth is a Variational Autoencoder (VAE) designed to address the challenges of generating synthetic time-sequence clinical trial data.
Our experiments demonstrate that TrialSynth surpasses the performance of other comparable methods.
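TrialSynth is described as a VAE. Two components shared by any Gaussian VAE are the reparameterized latent sample and the closed-form KL term of the evidence lower bound (ELBO). The sketch below shows only these textbook pieces, assuming a diagonal Gaussian posterior and a standard normal prior; it is not TrialSynth's architecture, the encoder and decoder are omitted, and the function names are hypothetical.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2).

    Expressing the sample this way keeps z differentiable in mu and logvar,
    which is what lets a VAE encoder be trained by gradient descent.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), the ELBO regulariser."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def negative_elbo(reconstruction_loss, mu, logvar):
    """Training loss: reconstruction term plus KL term (assumes weight 1 on the KL)."""
    return reconstruction_loss + kl_to_standard_normal(mu, logvar)


# Toy usage: a 2-dimensional latent with hypothetical encoder outputs.
rng = random.Random(0)
z = reparameterize([0.5, -0.2], [0.0, -1.0], rng)
loss = negative_elbo(1.5, [1.0], [0.0])
```

The KL term is zero exactly when the posterior equals the prior (mu = 0, logvar = 0) and grows as the encoder's outputs drift away from it.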
(2024-09-11T08:20:30Z)
- Language Interaction Network for Clinical Trial Approval Estimation [37.60098683485169]
We introduce the Language Interaction Network (LINT), a novel approach that predicts trial outcomes using only the free-text descriptions of the trials.
We have rigorously tested LINT across three phases of clinical trials, where it achieved ROC-AUC scores of 0.770, 0.740, and 0.748.
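ROC-AUC, the metric reported for LINT, equals the probability that a randomly chosen positive example (for instance, an approved trial) receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` is hypothetical, this is not LINT's evaluation code, and the quadratic pairwise loop is chosen for readability; a sort-based version scales better on large test sets.

```python
def roc_auc(y_true, scores):
    """ROC-AUC via the pairwise (Mann-Whitney) definition; labels are 0/1."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie gets half credit
    return wins / (len(pos) * len(neg))


# Toy scores: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Computing one such score on each phase-specific test set gives per-phase figures of the kind quoted above; a value of 0.5 corresponds to random ranking.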
(2024-04-26T14:50:59Z)
- TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction [19.084936647082632]
We propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data.
We encode them into BioBERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding.
Our proposed model outperformed the other models, with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE) of 1.39 years.
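TrialDura's errors are reported as MAE and RMSE in years. A minimal sketch of both metrics follows; the durations in the example are made-up toy values, not TrialBench or TrialDura data, and the function names are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the targets (here: years)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error; squaring weights large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy durations in years (illustrative only).
true_years = [2.0, 3.5, 1.0]
pred_years = [3.0, 3.0, 1.0]
mae_years = mae(true_years, pred_years)
rmse_years = rmse(true_years, pred_years)
```

RMSE is never smaller than MAE on the same predictions, and the size of the gap between them reflects how unevenly the errors are spread across trials.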
(2024-04-20T02:12:59Z)
- XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
(2023-12-29T14:28:04Z)
- Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven *clinical decision support*.
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate the viability of a comprehensive and automatable pipeline for clinical time-series ML.
(2023-10-28T12:08:03Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
(2023-07-19T12:35:09Z)
- SPOT: Sequential Predictive Modeling of Clinical Trial Outcome with Meta-Learning [67.8195828626489]
Clinical trials are essential to drug development but time-consuming, costly, and prone to failure.
We propose Sequential Predictive mOdeling of clinical Trial outcome (SPOT) that first identifies trial topics to cluster the multi-sourced trial data into relevant trial topics.
Treating each trial sequence as a task, it uses a meta-learning strategy so that the model can rapidly adapt to new tasks with minimal updates.
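The summary does not say which meta-learning algorithm SPOT uses. As a generic illustration of "rapidly adapt to new tasks with minimal updates", the sketch below swaps in Reptile, a simple first-order meta-learning method, on toy scalar regression tasks. The tasks, slopes, learning rates and function names are all hypothetical and are not SPOT's model or data.

```python
def task_loss_and_grad(w, xs, ys):
    """Mean squared error of the scalar model y_hat = w * x, and its derivative in w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def adapt(w, xs, ys, lr=0.1, steps=1):
    """Inner loop: a few plain gradient steps on a single task."""
    for _ in range(steps):
        _, grad = task_loss_and_grad(w, xs, ys)
        w -= lr * grad
    return w


def reptile(tasks, w=0.0, inner_lr=0.1, inner_steps=3, outer_lr=0.5, epochs=50):
    """Outer loop: move the shared initialisation towards each task's adapted weight."""
    for _ in range(epochs):
        for xs, ys in tasks:
            w_task = adapt(w, xs, ys, inner_lr, inner_steps)
            w += outer_lr * (w_task - w)
    return w


# Toy "trial sequences": y = a * x with hypothetical slopes a = 1, 2, 3 on shared inputs.
xs = [1.0, 2.0]
tasks = [(xs, [a * x for x in xs]) for a in (1.0, 2.0, 3.0)]
w_meta = reptile(tasks)

# A new, unseen task (slope 2.5), adapted from the meta-learned start with one inner step.
new_ys = [2.5 * x for x in xs]
loss_before, _ = task_loss_and_grad(w_meta, xs, new_ys)
loss_after, _ = task_loss_and_grad(adapt(w_meta, xs, new_ys), xs, new_ys)
```

The meta-learned initialisation settles between the training tasks' slopes, so a single gradient step is enough to move it closer to a related new task.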
(2023-04-07T23:04:27Z)
- HINT: Hierarchical Interaction Network for Trial Outcome Prediction Leveraging Web Data [56.53715632642495]
Clinical trials face uncertain outcomes due to issues with efficacy, safety, or problems with patient recruitment.
In this paper, we propose Hierarchical INteraction Network (HINT) for more general clinical trial outcome prediction.
(2021-02-08T15:09:07Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
(2021-02-08T10:26:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.