TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence
Generation with Biomedical Language Models
- URL: http://arxiv.org/abs/2311.01301v2
- Date: Mon, 6 Nov 2023 11:29:30 GMT
- Title: TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence
Generation with Biomedical Language Models
- Authors: Javier Gonz\'alez, Cliff Wong, Zelalem Gero, Jass Bagga, Risa Ueno,
Isabel Chien, Eduard Oravkin, Emre Kiciman, Aditya Nori, Roshanthi
Weerasinghe, Rom S. Leidner, Brian Piening, Tristan Naumann, Carlo Bifulco,
Hoifung Poon
- Abstract summary: We present TRIALSCOPE, a unifying framework for distilling real-world evidence from observational data.
We show that TRIALSCOPE can produce high-quality structuring of real-world data and generates comparable results to marquee cancer trials.
- Score: 22.046231408373522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid digitization of real-world data offers an unprecedented opportunity
for optimizing healthcare delivery and accelerating biomedical discovery. In
practice, however, such data is most abundantly available in unstructured
forms, such as clinical notes in electronic medical records (EMRs), and it is
generally plagued by confounders. In this paper, we present TRIALSCOPE, a
unifying framework for distilling real-world evidence from population-level
observational data. TRIALSCOPE leverages biomedical language models to
structure clinical text at scale, employs advanced probabilistic modeling for
denoising and imputation, and incorporates state-of-the-art causal inference
techniques to combat common confounders. Using clinical trial specification as
generic representation, TRIALSCOPE provides a turn-key solution to generate and
reason with clinical hypotheses using observational data. In extensive
experiments and analyses on a large-scale real-world dataset with over one
million cancer patients from a large US healthcare network, we show that
TRIALSCOPE can produce high-quality structuring of real-world data and
generates comparable results to marquee cancer trials. In addition to
facilitating in-silicon clinical trial design and optimization, TRIALSCOPE may
be used to empower synthetic controls, pragmatic trials, post-market
surveillance, as well as support fine-grained patient-like-me reasoning in
precision diagnosis and treatment.
Related papers
- DALL-M: Context-Aware Clinical Data Augmentation with LLMs [13.827368628263997]
We present a novel technique to enhance the clinical context through augmentation techniques with clinical data.
We introduce a pioneering approach to clinical data augmentation that employs large language models (LLMs) to generate patient contextual synthetic data.
This methodology is crucial for training more robust deep learning models in healthcare.
arXiv Detail & Related papers (2024-07-11T07:01:50Z) - PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models [4.438101430231511]
We present the first, end-to-end large-scale empirical evaluation of clinical trial matching using real-world EHRs.
Our study showcases the capability of LLMs to accurately match patients with appropriate clinical trials.
arXiv Detail & Related papers (2024-04-23T22:33:19Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - SoftTiger: A Clinical Foundation Model for Healthcare Workflows [5.181665205189493]
We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare.
We collect and annotate data for three subtasks, namely, international patient summary, clinical impression and medical encounter.
We supervised fine-tuned a state-of-the-art LLM using public and credentialed clinical data.
arXiv Detail & Related papers (2024-03-01T04:39:16Z) - PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - sEHR-CE: Language modelling of structured EHR data for efficient and
generalizable patient cohort expansion [0.0]
sEHR-CE is a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets.
We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study.
arXiv Detail & Related papers (2022-11-30T16:00:43Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG)
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - ICDBigBird: A Contextual Embedding Model for ICD Code Classification [71.58299917476195]
Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks.
ICDBigBird is a BigBird-based model which can integrate a Graph Convolutional Network (GCN)
Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task.
arXiv Detail & Related papers (2022-04-21T20:59:56Z) - TrialGraph: Machine Intelligence Enabled Insight from Graph Modelling of
Clinical Trials [0.0]
We introduce a curated clinical trial data set compiled from the CT.gov, AACT and TrialTrove databases (n=1191 trials; representing one million patients)
We then detail the mathematical basis and implementation of a selection of graph machine learning algorithms.
We trained these models to predict side effect information for a clinical trial given information on the disease, existing medical conditions, and treatment.
arXiv Detail & Related papers (2021-12-15T15:36:57Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in medical community.
We present a modification of Bidirectional Representations from Transformers (BERT) model for classification sequence.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.