Investigating Deep-Learning NLP for Automating the Extraction of
Oncology Efficacy Endpoints from Scientific Literature
- URL: http://arxiv.org/abs/2311.04925v1
- Date: Fri, 3 Nov 2023 14:01:54 GMT
- Authors: Aline Gendrin-Brokmann, Eden Harrison, Julianne Noveras, Leonidas
Souliotis, Harris Vince, Ines Smit, Francisco Costa, David Milward, Sashka
Dimitrievska, Paul Metcalfe, Emilie Louvet
- Abstract summary: We have developed and optimised a framework to extract efficacy endpoints from text in scientific papers.
Our machine learning model predicts 25 classes associated with efficacy endpoints and leads to high F1 scores.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmarking drug efficacy is a critical step in clinical trial design and
planning. The challenge is that much of the data on efficacy endpoints is
stored in scientific papers in free text form, so extraction of such data is
currently a largely manual task. Our objective is to automate this task as much
as possible. In this study we have developed and optimised a framework to
extract efficacy endpoints from text in scientific papers, using a machine
learning approach. Our machine learning model predicts 25 classes associated
with efficacy endpoints and leads to high F1 scores (harmonic mean of precision
and recall) of 96.4% on the test set, and 93.9% and 93.7% on two case studies.
These methods were evaluated against - and showed strong agreement with -
subject matter experts and show significant promise in the future of automating
the extraction of clinical endpoints from free text. Clinical information
extraction from text data is currently a laborious manual task which scales
poorly and is prone to human error. Demonstrating the ability to extract
efficacy endpoints automatically shows great promise for accelerating clinical
trial design moving forwards.
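The abstract reports F1 scores, defined there as the harmonic mean of precision and recall. As a minimal sketch of that definition only (the function names `f1_score` and `f1_from_counts` are illustrative and not taken from the paper, and the paper's averaging scheme across its 25 classes is not stated here), the metric can be computed as:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Equivalent form from raw counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    print(f1_from_counts(tp=8, fp=2, fn=2))
    print(f1_score(precision=0.5, recall=1.0))
```

Because the harmonic mean is pulled toward the smaller of the two values, a high F1 requires both precision and recall to be high.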
Related papers
- Artificial Intelligence in Extracting Diagnostic Data from Dental Records [6.132077347366551]
This research addresses the issue of missing structured data in dental records by extracting diagnostic information from unstructured text.
We use advanced AI and NLP methods, leveraging GPT-4 to generate synthetic notes for fine-tuning a RoBERTa model.
We evaluated the model using 120 randomly selected clinical notes from two datasets, demonstrating its improved diagnostic extraction accuracy.
arXiv Detail & Related papers (2024-07-23T04:05:48Z)
- Accelerating Clinical Evidence Synthesis with Large Language Models [28.002870749019035]
We introduce TrialMind, a generative artificial intelligence pipeline for facilitating human-AI collaboration.
TrialMind excels across study search, screening, and data extraction tasks.
Human experts favored TrialMind's outputs over GPT-4's in 62.5% to 100% of cases.
arXiv Detail & Related papers (2024-06-25T17:41:52Z)
- WisPerMed at "Discharge Me!": Advancing Text Generation in Healthcare with Large Language Models, Dynamic Expert Selection, and Priming Techniques on MIMIC-IV [0.38084074204911494]
This study aims to leverage state of the art language models to automate generating the "Brief Hospital Course" and "Discharge Instructions" sections of Discharge Summaries.
We investigate how automation can improve documentation accuracy, alleviate clinician burnout, and enhance operational efficacy in healthcare facilities.
arXiv Detail & Related papers (2024-05-18T10:56:45Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Detecting automatically the layout of clinical documents to enhance the performances of downstream natural language processing [53.797797404164946]
We designed an algorithm to process clinical PDF documents and extract only clinically relevant text.
The algorithm consists of several steps: initial text extraction using a PDF parser, followed by classification into such categories as body text, left notes, and footers.
Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
arXiv Detail & Related papers (2023-05-23T08:38:33Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Efficient Medical Image Assessment via Self-supervised Learning [27.969767956918503]
High-performance deep learning methods typically rely on large annotated training datasets.
We propose a novel and efficient data assessment strategy to rank the quality of unlabeled medical image data.
Motivated by theoretical implication of SSL embedding space, we leverage a Masked Autoencoder for feature extraction.
arXiv Detail & Related papers (2022-09-28T21:39:00Z)
- BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
- Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints [84.09488581365484]
Phase I dose-finding trials are increasingly challenging as the relationship between efficacy and toxicity of new compounds becomes more complex.
Most commonly used methods in practice focus on identifying a Maximum Tolerated Dose (MTD) by learning only from toxicity events.
We present a novel adaptive clinical trial methodology that aims at maximizing the cumulative efficacies while satisfying the toxicity safety constraint with high probability.
arXiv Detail & Related papers (2020-06-09T03:06:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.