PyTrial: Machine Learning Software and Benchmark for Clinical Trial
Applications
- URL: http://arxiv.org/abs/2306.04018v2
- Date: Thu, 5 Oct 2023 05:55:10 GMT
- Title: PyTrial: Machine Learning Software and Benchmark for Clinical Trial
Applications
- Authors: Zifeng Wang and Brandon Theodorou and Tianfan Fu and Cao Xiao and
Jimeng Sun
- Abstract summary: PyTrial provides benchmarks and open-source implementations of a series of machine learning algorithms for clinical trial design and operations.
We thoroughly investigate 34 ML algorithms for clinical trials across 6 different tasks, including patient outcome prediction, trial site selection, trial outcome prediction, patient-trial matching, trial similarity search, and synthetic data generation.
PyTrial defines each task through a simple four-step process: data loading, model specification, model training, and model evaluation, all achievable with just a few lines of code.
- Score: 49.69824178329405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical trials are conducted to test the effectiveness and safety of
potential drugs in humans for regulatory approval. Machine learning (ML) has
recently emerged as a new tool to assist in clinical trials. Despite this
progress, there have been few efforts to document and benchmark ML4Trial
algorithms available to the ML research community. Additionally, the
accessibility to clinical trial-related datasets is limited, and there is a
lack of well-defined clinical tasks to facilitate the development of new
algorithms.
To fill this gap, we have developed PyTrial, which provides benchmarks and
open-source implementations of a series of ML algorithms for clinical trial
design and operations. In this paper, we thoroughly investigate 34 ML
algorithms for clinical trials across 6 different tasks, including patient
outcome prediction, trial site selection, trial outcome prediction,
patient-trial matching, trial similarity search, and synthetic data generation.
We have also collected and prepared 23 ML-ready datasets as well as their
working examples in Jupyter Notebooks for quick implementation and testing.
PyTrial defines each task through a simple four-step process: data loading,
model specification, model training, and model evaluation, all achievable with
just a few lines of code. Furthermore, our modular API architecture empowers
practitioners to expand the framework to incorporate new algorithms and tasks
effortlessly. The code is available at https://github.com/RyanWangZf/PyTrial.
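The four-step pattern described above (data loading, model specification, model training, model evaluation) can be sketched in a few lines. Note that the class and function names below are illustrative stand-ins built on a toy trial-outcome dataset, not the actual PyTrial API; see the repository for the real interfaces.

```python
# A minimal, self-contained sketch of the four-step workflow pattern
# (load data -> specify model -> train -> evaluate). All names here are
# hypothetical stand-ins, NOT PyTrial's real classes.

# Step 1: data loading -- a toy trial-outcome dataset
def load_demo_trial_data():
    # each record: [number of enrolled patients, trial phase]
    X = [[120, 2], [45, 1], [300, 3], [80, 2], [15, 1], [500, 3]]
    y = [1, 0, 1, 0, 0, 1]  # 1 = trial succeeded, 0 = trial failed
    return X, y

# Step 2: model specification -- a trivial enrollment-threshold "model"
class ThresholdOutcomeModel:
    def __init__(self):
        self.threshold = None

    # Step 3: model training -- pick the enrollment threshold that best
    # separates successful from failed trials on the training data
    def fit(self, X, y):
        best_acc, best_t = -1.0, None
        for t in sorted(x[0] for x in X):
            preds = [1 if x[0] >= t else 0 for x in X]
            acc = sum(p == label for p, label in zip(preds, y)) / len(y)
            if acc > best_acc:
                best_acc, best_t = acc, t
        self.threshold = best_t
        return self

    def predict(self, X):
        return [1 if x[0] >= self.threshold else 0 for x in X]

    # Step 4: model evaluation -- plain accuracy on a labeled set
    def evaluate(self, X, y):
        preds = self.predict(X)
        return sum(p == label for p, label in zip(preds, y)) / len(y)

X, y = load_demo_trial_data()      # 1. data loading
model = ThresholdOutcomeModel()    # 2. model specification
model.fit(X, y)                    # 3. model training
print(model.evaluate(X, y))        # 4. model evaluation
```

The point of the sketch is the interface shape, not the model: each PyTrial task wraps a dataset loader and a model object exposing fit/predict/evaluate-style methods, so swapping algorithms within a task only changes step 2.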
Related papers
- Can Large Language Models Replace Data Scientists in Clinical Research? [28.211990967264818]
We develop a dataset consisting of 293 real-world data science coding tasks.
This dataset simulates realistic clinical research scenarios using patient data.
We develop a platform that integrates large language models into the data science workflow for medical professionals.
arXiv Detail & Related papers (2024-10-28T22:48:06Z)
- Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding [5.279406017862076]
The challenge of summarising a hospital course remains an open area for further research and development.
We adapted three pre-trained LLMs, Llama 3, BioMistral, and Mistral Instruct v0.1, for the hospital course summarisation task.
The fine-tuned models were evaluated using BERTScore and ROUGE metrics to assess the effectiveness of clinical-domain fine-tuning.
arXiv Detail & Related papers (2024-09-23T00:35:23Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecules, disease codes, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML [0.7982607013768545]
Yet Another ICU Benchmark (YAIB) is a modular framework that allows researchers to define reproducible and comparable clinical ML experiments.
YAIB supports most open-access ICU datasets (MIMIC III/IV, eICU, HiRID, AUMCdb) and is easily adaptable to future ICU datasets.
We demonstrate that the choice of dataset, cohort definition, and preprocessing have a major impact on the prediction performance.
arXiv Detail & Related papers (2023-06-08T11:16:20Z)
- Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z)
- SPOT: Sequential Predictive Modeling of Clinical Trial Outcome with Meta-Learning [67.8195828626489]
Clinical trials are essential to drug development but time-consuming, costly, and prone to failure.
We propose Sequential Predictive mOdeling of clinical Trial outcome (SPOT), which first identifies trial topics to cluster multi-sourced trial data.
Treating each trial sequence as a task, it uses a meta-learning strategy so that the model can rapidly adapt to new tasks with minimal updates.
arXiv Detail & Related papers (2023-04-07T23:04:27Z)
- Artificial Intelligence for In Silico Clinical Trials: A Review [41.85196749088317]
In silico trials are clinical trials conducted digitally through simulation and modeling.
This article systematically reviews papers under three main topics: clinical simulation, individualized predictive modeling, and computer-aided trial design.
arXiv Detail & Related papers (2022-09-16T14:59:31Z)
- PosePipe: Open-Source Human Pose Estimation Pipeline for Clinical Research [0.0]
We develop a human pose estimation pipeline that facilitates running state-of-the-art algorithms on data acquired in clinical contexts.
Our goal in this work is not to train new algorithms, but to advance the use of cutting-edge human pose estimation algorithms for clinical and translational research.
arXiv Detail & Related papers (2022-03-16T17:54:37Z) - VBridge: Connecting the Dots Between Features, Explanations, and Data
for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow.
We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence.
We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
arXiv Detail & Related papers (2021-08-04T17:34:13Z) - Active learning for medical code assignment [55.99831806138029]
We demonstrate the effectiveness of Active Learning (AL) in multi-label text classification in the clinical domain.
We apply a set of well-known AL methods to help automatically assign ICD-9 codes on the MIMIC-III dataset.
Our results show that the selection of informative instances provides satisfactory classification with a significantly reduced training set.
arXiv Detail & Related papers (2021-04-12T18:11:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.