Understanding Clinical Trial Reports: Extracting Medical Entities and
Their Relations
- URL: http://arxiv.org/abs/2010.03550v3
- Date: Fri, 7 Jan 2022 19:23:22 GMT
- Title: Understanding Clinical Trial Reports: Extracting Medical Entities and
Their Relations
- Authors: Benjamin E. Nye, Jay DeYoung, Eric Lehman, Ani Nenkova, Iain J.
Marshall, Byron C. Wallace
- Abstract summary: Medical experts must manually extract information from articles to inform decision-making.
We consider the end-to-end task of both (a) extracting treatments and outcomes from full-text articles describing clinical trials (entity identification) and (b) inferring the reported results for the former with respect to the latter.
- Score: 33.30381080306156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The best evidence concerning comparative treatment effectiveness comes from
clinical trials, the results of which are reported in unstructured articles.
Medical experts must manually extract information from articles to inform
decision-making, which is time-consuming and expensive. Here we consider the
end-to-end task of both (a) extracting treatments and outcomes from full-text
articles describing clinical trials (entity identification), and (b) inferring
the reported results for the former with respect to the latter (relation
extraction). We introduce new data for this task, and evaluate models that have
recently achieved state-of-the-art results on similar tasks in Natural Language
Processing. We then propose a new method motivated by how trial results are
typically presented that outperforms these purely data-driven baselines.
Finally, we run a fielded evaluation of the model with a non-profit seeking to
identify existing drugs that might be re-purposed for cancer, showing the
potential utility of end-to-end evidence extraction systems.
Related papers
- Interpretable Medical Diagnostics with Structured Data Extraction by
Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Jointly Extracting Interventions, Outcomes, and Findings from RCT
Reports with LLMs [21.868871974136884]
We propose and evaluate a text-to-text model built on instruction-tuned Large Language Models.
We apply our model to a collection of published RCTs through mid-2022, and release a searchable database of structured findings.
arXiv Detail & Related papers (2023-05-05T16:02:06Z)
- Predicting Intervention Approval in Clinical Trials through
Multi-Document Summarization [0.30458514384586405]
We propose a new method to predict the effectiveness of an intervention in a clinical trial.
Our method relies on generating an informative summary from multiple documents available in the literature about the intervention under study.
arXiv Detail & Related papers (2022-04-01T08:45:39Z)
- Assessment of contextualised representations in detecting outcome
phrases in clinical trials [14.584741378279316]
We introduce "EBM-COMET", a dataset in which 300 PubMed abstracts are expertly annotated for clinical outcomes.
To extract outcomes, we fine-tune a variety of pre-trained contextualized representations.
We observe our best model (BioBERT) achieve 81.5% F1, 81.3% sensitivity and 98.0% specificity.
arXiv Detail & Related papers (2022-02-13T15:08:00Z)
- HINT: Hierarchical Interaction Network for Trial Outcome Prediction
Leveraging Web Data [56.53715632642495]
Clinical trials face uncertain outcomes due to issues with efficacy, safety, or problems with patient recruitment.
In this paper, we propose Hierarchical INteraction Network (HINT) for more general, clinical trial outcome predictions.
arXiv Detail & Related papers (2021-02-08T15:09:07Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised
Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Text Mining to Identify and Extract Novel Disease Treatments From
Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
- Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time [35.15631358690484]
We introduce Trialstreamer, a living database of clinical trial reports.
The system extracts descriptions of trial participants, the treatments compared in each arm, and which outcomes were measured.
In addition to summarizing individual trials, these extracted data elements allow automatic synthesis of results across many trials on the same topic.
arXiv Detail & Related papers (2020-05-21T19:32:04Z)
- Evidence Inference 2.0: More Data, Better Models [22.53884716373888]
The Evidence Inference dataset was recently released to facilitate research toward this end.
This paper collects additional annotations to expand the Evidence Inference dataset by 25%.
The updated corpus, documentation, and code for new baselines and evaluations are available at http://evidence-inference.ebm-nlp.com/.
arXiv Detail & Related papers (2020-05-08T17:16:35Z)
- Generalization Bounds and Representation Learning for Estimation of
Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)