Information Extraction of Clinical Trial Eligibility Criteria
- URL: http://arxiv.org/abs/2006.07296v6
- Date: Tue, 28 Jul 2020 17:50:42 GMT
- Title: Information Extraction of Clinical Trial Eligibility Criteria
- Authors: Yitong Tseo, M. I. Salkola, Ahmed Mohamed, Anuj Kumar, Freddy Abnousi
- Abstract summary: This paper investigates an information extraction (IE) approach for grounding criteria from trials in ClinicalTrials.gov to a shared knowledge base.
We frame the problem as a novel knowledge base population task, and implement a solution combining machine learning and context-free grammar.
To our knowledge, this work is the first criteria extraction system to apply an attention-based conditional random field architecture for named entity recognition.
- Score: 6.192164049563104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical trials predicate subject eligibility on a diversity of criteria
ranging from patient demographics to food allergies. Trials post their
requirements as semantically complex, unstructured free text. Formalizing trial
criteria to a computer-interpretable syntax would facilitate eligibility
determination. In this paper, we investigate an information extraction (IE)
approach for grounding criteria from trials in ClinicalTrials.gov to a
shared knowledge base. We frame the problem as a novel knowledge base
population task, and implement a solution combining machine learning and
context-free grammar. To our knowledge, this work is the first criteria
extraction system to apply an attention-based conditional random field
architecture for named entity recognition (NER), and word2vec embedding
clustering for named entity linking (NEL). We release the resources and core
components of our system on GitHub at
https://github.com/facebookresearch/Clinical-Trial-Parser. Finally, we report
our per-module and end-to-end performance; we conclude that our system is
competitive with Criteria2Query, which we view as the current state-of-the-art
in criteria extraction.
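As a rough illustration of what a computer-interpretable syntax for eligibility criteria could look like, the Python sketch below parses a toy criterion with a small hand-written context-free grammar and evaluates it against a patient record. The grammar, the function names (`tokenize`, `parse`, `is_eligible`), and the patient fields are all hypothetical; none of them is taken from the paper or from the Clinical-Trial-Parser repository, and the learned NER and NEL components described in the abstract are not shown.

```python
import operator
import re

# Toy grammar (hypothetical, for illustration only):
#   expr   := term ("or" term)*
#   term   := factor ("and" factor)*
#   factor := "(" expr ")" | NAME OP NUMBER
OPS = {">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}
TOKEN = re.compile(r"\s*(>=|<=|>|<|=|\(|\)|[A-Za-z_]+|\d+(?:\.\d+)?)")


def tokenize(text):
    """Split a criterion string into operators, parentheses, names, and numbers."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Recursive-descent parse of the toy grammar into nested tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of input")
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "or":
            take()
            node = ("or", node, term())
        return node

    def term():
        node = factor()
        while peek() == "and":
            take()
            node = ("and", node, factor())
        return node

    def factor():
        if peek() == "(":
            take()
            node = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        name, op, value = take(), take(), take()
        if op not in OPS:
            raise ValueError(f"unknown comparator: {op}")
        return (op, name, float(value))

    tree = expr()
    if peek() is not None:
        raise ValueError(f"trailing input: {peek()}")
    return tree


def is_eligible(tree, patient):
    """Evaluate a parsed criterion against a dict of patient attributes."""
    head, left, right = tree
    if head == "and":
        return is_eligible(left, patient) and is_eligible(right, patient)
    if head == "or":
        return is_eligible(left, patient) or is_eligible(right, patient)
    return OPS[head](patient[left], right)


tree = parse("age >= 18 and (bmi < 30 or weight < 100)")
print(is_eligible(tree, {"age": 42, "bmi": 33.5, "weight": 80}))
```

Real criteria are free text rather than tidy expressions, which is why a grammar like this would only be applied after entities and values have been recognized and linked to a knowledge base.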
Related papers
- EvalAgent: Discovering Implicit Evaluation Criteria from the Web [82.82096383262068]
We introduce EvalAgent, a framework designed to automatically uncover nuanced and task-specific criteria.
EvalAgent mines expert-authored online guidance to propose diverse, long-tail evaluation criteria.
Our experiments demonstrate that the grounded criteria produced by EvalAgent are often implicit, yet specific.
arXiv Detail & Related papers (2025-04-21T16:43:50Z)
- LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation [6.4073053466465835]
Patient matching is the process of linking patients to appropriate clinical trials by accurately identifying and matching their medical records with trial eligibility criteria.
We propose LLM-Match, a novel framework for patient matching leveraging fine-tuned open-source large language models.
We evaluated it on four open datasets - n2c2, SIGIR, TREC 2021, and TREC 2022 - using open-source models, comparing it against TrialGPT, Zero-Shot, and GPT-4-based closed models.
arXiv Detail & Related papers (2025-03-17T15:31:55Z)
- Towards Regulatory-Confirmed Adaptive Clinical Trials: Machine Learning Opportunities and Solutions [59.28853595868749]
We introduce two new objectives for future clinical trials that integrate regulatory constraints and treatment policy value for both the entire population and under-served populations.
We formulate Randomize First Augment Next (RFAN), a new framework for designing Phase III clinical trials.
Our framework consists of a standard randomized component followed by an adaptive one, jointly meant to efficiently and safely acquire and assign patients into treatment arms during the trial.
arXiv Detail & Related papers (2025-03-12T10:17:54Z)
- Leveraging Semantic Type Dependencies for Clinical Named Entity Recognition [24.179910886684745]
We exploit additional evidence by making use of domain-specific semantic type dependencies.
In some cases NER effectiveness can be significantly improved by making use of domain-specific semantic type dependencies.
arXiv Detail & Related papers (2025-03-07T12:29:21Z)
- Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model [0.7373617024876725]
Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants.
The complex nature of unstructured medical texts presents challenges in efficiently identifying participants.
In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task.
arXiv Detail & Related papers (2024-04-24T20:42:28Z)
- AutoTrial: Prompting Language Models for Clinical Trial Design [53.630479619856516]
We present a method named AutoTrial to aid the design of clinical eligibility criteria using language models.
Experiments on over 70K clinical trials verify that AutoTrial generates high-quality criteria texts.
arXiv Detail & Related papers (2023-05-19T01:04:16Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- LeafAI: query generator for clinical cohort discovery rivaling a human programmer [4.410832512630809]
We create a system capable of generating data model-agnostic queries.
We also provide novel logical reasoning capabilities for complex clinical trial eligibility criteria.
arXiv Detail & Related papers (2023-04-13T00:34:32Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria [1.7205106391379026]
We introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions.
We provide details of our schema, annotation process, corpus quality, and statistics.
arXiv Detail & Related papers (2022-07-27T19:22:24Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization [2.978663539080876]
We present a document reranking approach that combines neural query-document matching and text summarization.
Evaluations using NIST's TREC-PM track datasets show that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-17T02:01:32Z)
- Robust Benchmarking for Machine Learning of Clinical Entity Extraction [2.9398911304923447]
We audit the performance of and indicate areas of improvement for state-of-the-art systems.
We find that high task accuracies for clinical entity normalization systems on the 2019 n2c2 Shared Task are misleading.
We reformulate the annotation framework for clinical entity extraction to factor in inconsistencies in medical vocabularies.
arXiv Detail & Related papers (2020-07-31T15:14:05Z)
- COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching [70.08786840301435]
We propose CrOss-Modal PseudO-SiamEse network (COMPOSE) to address these challenges for patient-trial matching.
Experimental results show that COMPOSE can reach 98.0% AUC on patient-criteria matching and 83.7% accuracy on patient-trial matching.
arXiv Detail & Related papers (2020-06-15T21:01:33Z)
- DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model to jointly encode enrollment criteria (text) and patient records (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.