Automated tabulation of clinical trial results: A joint entity and
relation extraction approach with transformer-based language representations
- URL: http://arxiv.org/abs/2112.05596v1
- Date: Fri, 10 Dec 2021 15:26:43 GMT
- Title: Automated tabulation of clinical trial results: A joint entity and
relation extraction approach with transformer-based language representations
- Authors: Jetsun Whitton and Anthony Hunter
- Abstract summary: This paper investigates automating evidence table generation by decomposing the problem across two language processing tasks.
We focus on the automatic tabulation of sentences from published RCT abstracts that report the results of the study outcomes.
To train and test these models, a new gold-standard corpus was developed, comprising almost 600 result sentences from six disease areas.
- Score: 5.825190876052148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evidence-based medicine, the practice in which healthcare professionals refer
to the best available evidence when making decisions, forms the foundation of
modern healthcare. However, it relies on labour-intensive systematic reviews,
where domain specialists must aggregate and extract information from thousands
of publications, primarily of randomised controlled trial (RCT) results, into
evidence tables. This paper investigates automating evidence table generation
by decomposing the problem across two language processing tasks: \textit{named
entity recognition}, which identifies key entities within text, such as drug
names, and \textit{relation extraction}, which maps their relationships for
separating them into ordered tuples. We focus on the automatic tabulation of
sentences from published RCT abstracts that report the results of the study
outcomes. Two deep neural net models were developed as part of a joint
extraction pipeline, using the principles of transfer learning and
transformer-based language representations. To train and test these models, a
new gold-standard corpus was developed, comprising almost 600 result sentences
from six disease areas. This approach demonstrated significant advantages, with
our system performing well across multiple natural language processing tasks
and disease areas, as well as in generalising to disease domains unseen during
training. Furthermore, we show these results were achievable through training
our models on as few as 200 example sentences. The final system is a proof of
concept that the generation of evidence tables can be semi-automated,
representing a step towards fully automating systematic reviews.
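The abstract's decomposition into named entity recognition followed by relation extraction can be illustrated with a toy two-stage sketch. This is not the authors' pipeline (which uses transformer-based deep neural nets); the pattern-based recogniser, the entity labels, and the example sentence below are all hypothetical stand-ins chosen only to show how recognised entities are grouped into ordered result tuples.

```python
# Illustrative sketch (not the paper's system): stage 1 tags entity
# spans, stage 2 groups them into ordered tuples for an evidence table.
import re

# Hypothetical gazetteer standing in for a trained NER model.
ENTITY_PATTERNS = {
    "DRUG": r"\b(aspirin|placebo)\b",
    "OUTCOME": r"\b(pain score|mortality)\b",
    "MEASURE": r"\b\d+(?:\.\d+)?%",
}

def recognise_entities(sentence):
    """Stage 1: named entity recognition (toy, pattern-based)."""
    entities = []
    for label, pattern in ENTITY_PATTERNS.items():
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            entities.append((label, m.group(), m.start()))
    # Order entities by their position in the sentence.
    return sorted(entities, key=lambda e: e[2])

def extract_relations(entities):
    """Stage 2: relation extraction (toy) - pair entity types
    left-to-right into (drug, outcome, measure) tuples."""
    drugs = [text for label, text, _ in entities if label == "DRUG"]
    outcomes = [text for label, text, _ in entities if label == "OUTCOME"]
    measures = [text for label, text, _ in entities if label == "MEASURE"]
    return list(zip(drugs, outcomes, measures))

sentence = "Aspirin reduced pain score by 30% versus placebo (10%)."
rows = extract_relations(recognise_entities(sentence))
# rows now holds ordered tuples ready to populate an evidence table.
```

In the paper's actual system, both stages are learned transformer models rather than patterns and positional rules, but the data flow (sentence, then tagged entities, then ordered tuples, then table row) is the same.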
Related papers
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Advancing Italian Biomedical Information Extraction with Transformers-based Models: Methodological Insights and Multicenter Practical Application [0.27027468002793437]
Information Extraction can help clinical practitioners overcome the limitation by using automated text-mining pipelines.
We created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Transformers-based model.
The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "low-resource" approach.
arXiv Detail & Related papers (2023-06-08T16:15:46Z)
- Detecting automatically the layout of clinical documents to enhance the performances of downstream natural language processing [53.797797404164946]
We designed an algorithm to process clinical PDF documents and extract only clinically relevant text.
The algorithm consists of several steps: initial text extraction from the PDF, followed by classification into categories such as body text, left notes, and footers.
Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
arXiv Detail & Related papers (2023-05-23T08:38:33Z)
- Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs [21.868871974136884]
We propose and evaluate a text-to-text model built on instruction-tuned Large Language Models.
We apply our model to a collection of published RCTs through mid-2022, and release a searchable database of structured findings.
arXiv Detail & Related papers (2023-05-05T16:02:06Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We built a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text [1.4841452489515765]
Current state-of-the-art (SOTA) NLP models are highly integrated with deep learning techniques.
This study presents an engineering framework of medical entity recognition, relation extraction and attribute extraction.
arXiv Detail & Related papers (2022-03-08T03:19:16Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
- Med7: a transferable clinical natural language processing model for electronic health records [6.935142529928062]
We introduce a named-entity recognition model for clinical natural language processing.
The model is trained to recognise seven categories: drug names, route, frequency, dosage, strength, form, duration.
We evaluate the transferability of the developed model from Intensive Care Unit data in the US to secondary care mental health records (CRIS) in the UK.
arXiv Detail & Related papers (2020-03-03T00:55:43Z)
- Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks [2.127049691404299]
This research applies advances in natural language processing to evidence synthesis based on medical texts.
The main focus is on information characterized via the Population, Intervention, Comparator, and Outcome (PICO) framework.
Recent neural network architectures based on transformers show capacities for transfer learning and increased performance on downstream natural language processing tasks.
arXiv Detail & Related papers (2020-01-30T11:45:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.