The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria
- URL: http://arxiv.org/abs/2207.13757v1
- Date: Wed, 27 Jul 2022 19:22:24 GMT
- Title: The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria
- Authors: Nicholas J Dobbins, Tony Mullen, Ozlem Uzuner, Meliha Yetisgen
- Abstract summary: We introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions.
We provide details of our schema, annotation process, corpus quality, and statistics.
- Score: 1.7205106391379026
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Identifying cohorts of patients based on eligibility criteria such as medical
conditions, procedures, and medication use is critical to recruitment for
clinical trials. Such criteria are often most naturally described in free-text,
using language familiar to clinicians and researchers. In order to identify
potential participants at scale, these criteria must first be translated into
queries on clinical databases, which can be labor-intensive and error-prone.
Natural language processing (NLP) methods offer a potential means of
automating this conversion into database queries. However, they must first be
trained and evaluated using corpora which capture clinical trial criteria in
sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT)
corpus, a human-annotated corpus of over 1,000 clinical trial eligibility
criteria descriptions using highly granular structured labels capturing a range
of biomedical phenomena. We provide details of our schema, annotation process,
corpus quality, and statistics. Additionally, we present baseline information
extraction results on this corpus as benchmarks for future work.
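To make the annotation-to-query pipeline concrete, here is a minimal sketch in Python, assuming an invented criterion, hypothetical label names (not the actual LCT schema), and made-up table and column names for the generated query:

```python
from dataclasses import dataclass

@dataclass
class Span:
    """A labeled span over the criterion text (label names are hypothetical)."""
    label: str   # e.g. "Condition", "Drug", "Temporal"
    start: int   # character offsets into the criterion text
    end: int

# Invented free-text eligibility criterion, annotated with illustrative spans.
criterion = "Type 2 diabetes on metformin for at least 6 months"
annotations = [
    Span("Condition", 0, 15),   # "Type 2 diabetes"
    Span("Drug", 19, 28),       # "metformin"
    Span("Temporal", 33, 50),   # "at least 6 months"
]

def to_sql(spans: list[Span], text: str) -> str:
    """Render the annotated spans into an illustrative SQL query.

    The table and column names are assumptions; a real system would map the
    extracted concepts onto a specific clinical data model and terminology.
    """
    condition = next(text[s.start:s.end] for s in spans if s.label == "Condition")
    drug = next(text[s.start:s.end] for s in spans if s.label == "Drug")
    return (
        "SELECT d.patient_id FROM diagnoses d "
        "JOIN medications m ON m.patient_id = d.patient_id "
        f"WHERE d.name = '{condition}' AND m.name = '{drug}' "
        "AND m.start_date <= CURRENT_DATE - INTERVAL '6 months'"
    )

print(to_sql(annotations, criterion))
```

The span layer is the part the corpus supplies: with granular labels of this kind, extraction models can be trained and evaluated on recovering the pieces that a downstream query generator then composes.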
Related papers
- Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
- Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model [0.7373617024876725]
Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants.
The complex nature of unstructured medical texts presents challenges in efficiently identifying participants.
In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task.
arXiv Detail & Related papers (2024-04-24T20:42:28Z)
- Text Classification of Cancer Clinical Trial Eligibility Criteria [3.372747046563984]
We focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness.
Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level.
Our results demonstrate the feasibility of automatically classifying common exclusion criteria.
arXiv Detail & Related papers (2023-09-14T15:59:16Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed to speed up patient recruitment by automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- AutoTrial: Prompting Language Models for Clinical Trial Design [53.630479619856516]
We present a method named AutoTrial to aid the design of clinical eligibility criteria using language models.
Experiments on over 70K clinical trials verify that AutoTrial generates high-quality criteria texts.
arXiv Detail & Related papers (2023-05-19T01:04:16Z)
- Improving Patient Pre-screening for Clinical Trials: Assisting Physicians with Large Language Models [0.0]
Large Language Models (LLMs) have shown to perform well for clinical information extraction and clinical reasoning.
This paper investigates the use of InstructGPT to assist physicians in determining eligibility for clinical trials based on a patient's summarised medical profile.
arXiv Detail & Related papers (2023-04-14T21:19:46Z)
- LeafAI: query generator for clinical cohort discovery rivaling a human programmer [4.410832512630809]
We create a system capable of generating data model-agnostic queries.
We also provide novel logical reasoning capabilities for complex clinical trial eligibility criteria (a minimal query-generation sketch follows this entry).
arXiv Detail & Related papers (2023-04-13T00:34:32Z)
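The LeafAI entry above describes data model-agnostic query generation and logical reasoning over complex criteria only at a high level. As a minimal sketch of the general idea (not LeafAI's actual design), criteria can be held as a small boolean tree and rendered against whichever table layout a target database uses; every name below (concept codes, tables, columns) is an assumption made up for illustration:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Concept:
    """A single constraint, independent of any particular data model."""
    domain: str   # e.g. "condition", "drug"
    code: str     # illustrative concept name or vocabulary code

@dataclass
class BoolOp:
    """Boolean combination of child nodes."""
    op: str                      # "AND" or "OR"
    children: list["Node"]

Node = Union[Concept, BoolOp]

# "Type 2 diabetes AND (metformin OR insulin)" as a logical tree.
criteria = BoolOp("AND", [
    Concept("condition", "type 2 diabetes"),
    BoolOp("OR", [
        Concept("drug", "metformin"),
        Concept("drug", "insulin"),
    ]),
])

# One assumed target layout; swapping this mapping retargets the same tree
# to a different data model without touching the logical form.
TABLES = {
    "condition": ("diagnoses", "diagnosis_name"),
    "drug": ("medications", "drug_name"),
}

def render(node: Node) -> str:
    """Render the logical tree as SQL over patient-id sets (illustrative only)."""
    if isinstance(node, Concept):
        table, column = TABLES[node.domain]
        return f"(SELECT patient_id FROM {table} WHERE {column} = '{node.code}')"
    joiner = " INTERSECT " if node.op == "AND" else " UNION "
    return "(" + joiner.join(render(child) for child in node.children) + ")"

print(render(criteria))
```

The AND/OR tree carries the logical structure here; real eligibility criteria would also need node types for negation, temporal constraints, and value comparisons.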
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Clinical trial site matching with improved diversity using fair policy learning [56.01170456417214]
We learn a model that maps a clinical trial description to a ranked list of potential trial sites.
Unlike the settings assumed by existing fairness frameworks, the group membership of each trial site is non-binary.
We propose fairness criteria based on demographic parity to address such a multi-group membership scenario.
arXiv Detail & Related papers (2022-04-13T16:35:28Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.