Evidence Inference 2.0: More Data, Better Models
- URL: http://arxiv.org/abs/2005.04177v2
- Date: Thu, 14 May 2020 14:55:33 GMT
- Title: Evidence Inference 2.0: More Data, Better Models
- Authors: Jay DeYoung, Eric Lehman, Ben Nye, Iain J. Marshall, Byron C. Wallace
- Abstract summary: The Evidence Inference dataset was recently released to facilitate research on using NLP to extract evidence from clinical trial articles.
This paper collects additional annotations to expand the Evidence Inference dataset by 25%.
The updated corpus, documentation, and code for new baselines and evaluations are available at http://evidence-inference.ebm-nlp.com/.
- Score: 22.53884716373888
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do we most effectively treat a disease or condition? Ideally, we could
consult a database of evidence gleaned from clinical trials to answer such
questions. Unfortunately, no such database exists; clinical trial results are
instead disseminated primarily via lengthy natural language articles. Perusing
all such articles would be prohibitively time-consuming for healthcare
practitioners; they instead tend to depend on manually compiled systematic
reviews of medical literature to inform care.
NLP may speed this process up, and eventually facilitate immediate consultation of
published evidence. The Evidence Inference dataset was recently released to
facilitate research toward this end. This task entails inferring the
comparative performance of two treatments, with respect to a given outcome,
from a particular article (describing a clinical trial) and identifying
supporting evidence. For instance: Does this article report that chemotherapy
performed better than surgery for five-year survival rates of operable cancers?
In this paper, we collect additional annotations to expand the Evidence
Inference dataset by 25%, provide stronger baseline models, systematically
inspect the errors that these make, and probe dataset quality. We also release
an abstract-only (as opposed to full-text) version of the task for rapid model
prototyping. The updated corpus, documentation, and code for new baselines and
evaluations are available at http://evidence-inference.ebm-nlp.com/.
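The task described in the abstract can be pictured as a small data structure: a prompt naming two treatments and an outcome, and a prediction that pairs a comparison label with a supporting span from the article. The following is a minimal sketch only; the class names, field names, label strings, and the toy article text are all hypothetical illustrations and are not taken from the released corpus or its code.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    # Hypothetical label names for a three-way comparison of two treatments.
    SIGNIFICANTLY_DECREASED = "significantly decreased"
    NO_SIGNIFICANT_DIFFERENCE = "no significant difference"
    SIGNIFICANTLY_INCREASED = "significantly increased"


@dataclass(frozen=True)
class Prompt:
    # The two treatments being compared and the outcome of interest.
    intervention: str
    comparator: str
    outcome: str

    def as_question(self) -> str:
        return (
            f"With respect to {self.outcome}, what does the article report "
            f"for {self.intervention} relative to {self.comparator}?"
        )


@dataclass(frozen=True)
class Prediction:
    # A label plus character offsets of the supporting evidence in the article.
    label: Label
    evidence_start: int
    evidence_end: int

    def evidence(self, article: str) -> str:
        return article[self.evidence_start:self.evidence_end]


# Toy example; the article text below is invented for illustration.
prompt = Prompt("chemotherapy", "surgery", "five-year survival rates")
article = "Background ... Five-year survival did not differ between arms. ..."
start = article.index("Five-year")
end = article.index(" ...", start)
pred = Prediction(Label.NO_SIGNIFICANT_DIFFERENCE, start, end)

print(prompt.as_question())
print(pred.label.value, "|", pred.evidence(article))
```

Storing evidence as character offsets rather than copied text is one possible design choice; it keeps the prediction tied to a specific location in the article.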
Related papers
- Uncertainty Estimation of Large Language Models in Medical Question Answering [60.72223137560633]
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information.
We benchmark popular uncertainty estimation (UE) methods with different model sizes on medical question-answering datasets.
Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications.
arXiv Detail & Related papers (2024-07-11T16:51:33Z)
- Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence [0.12277343096128711]
We study three core tasks: identifying medical claims, extracting medical vocabulary from these claims, and retrieving evidence relevant to those identified medical claims.
We propose a novel system that can generate synthetic medical claims to aid each of these core tasks.
arXiv Detail & Related papers (2024-05-18T07:50:43Z)
- NLI4CT: Multi-Evidence Natural Language Inference for Clinical Trial Reports [3.0468533447146244]
We present a novel resource to advance research on NLI for reasoning on clinical trial reports.
We provide NLI4CT, a corpus of 2400 statements and CTRs, annotated for these tasks.
To the best of our knowledge, we are the first to design a task that covers the interpretation of full CTRs.
arXiv Detail & Related papers (2023-05-05T15:03:01Z)
- SPOT: Sequential Predictive Modeling of Clinical Trial Outcome with Meta-Learning [67.8195828626489]
Clinical trials are essential to drug development but time-consuming, costly, and prone to failure.
We propose Sequential Predictive mOdeling of clinical Trial outcome (SPOT), which first identifies trial topics and clusters the multi-sourced trial data by topic.
With the consideration of each trial sequence as a task, it uses a meta-learning strategy to achieve a point where the model can rapidly adapt to new tasks with minimal updates.
arXiv Detail & Related papers (2023-04-07T23:04:27Z)
- MS2: Multi-Document Summarization of Medical Studies [11.38740406132287]
We release MS2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature.
This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies.
We experiment with a summarization system based on BART, with promising early results.
arXiv Detail & Related papers (2021-04-13T19:59:34Z)
- CREATe: Clinical Report Extraction and Annotation Technology [53.731999072534876]
Clinical case reports are written descriptions of the unique aspects of a particular clinical case.
There has been no attempt to develop an end-to-end system to annotate, index, or otherwise curate these reports.
We propose a novel computational resource platform, CREATe, for extracting, indexing, and querying the contents of clinical case reports.
arXiv Detail & Related papers (2021-02-28T16:50:14Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
- Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations [33.30381080306156]
Medical experts must manually extract information from articles to inform decision-making.
We consider the end-to-end task of both (a) extracting treatments and outcomes from full-text articles describing clinical trials (entity identification) and (b) inferring the reported results for the former with respect to the latter.
arXiv Detail & Related papers (2020-10-07T17:50:58Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
- Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances [39.888619005843246]
We describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels.
One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input.
We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.
arXiv Detail & Related papers (2020-07-14T16:10:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.