Large Language Models are Zero-Shot Clinical Information Extractors
- URL: http://arxiv.org/abs/2205.12689v1
- Date: Wed, 25 May 2022 11:49:58 GMT
- Title: Large Language Models are Zero-Shot Clinical Information Extractors
- Authors: Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim, David Sontag
- Abstract summary: We show that large language models, such as GPT-3, perform well at zero-shot information extraction from clinical text.
We present examples showing how to use these models as tools for the diverse tasks of (i) concept disambiguation, (ii) evidence extraction, (iii) coreference resolution, and (iv) concept extraction.
The key to good performance is the use of simple task-specific programs that map from the language model outputs to the label space of the task.
- Score: 15.907327589436965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that large language models, such as GPT-3, perform well at zero-shot
information extraction from clinical text despite not being trained
specifically for the clinical domain. We present several examples showing how
to use these models as tools for the diverse tasks of (i) concept
disambiguation, (ii) evidence extraction, (iii) coreference resolution, and
(iv) concept extraction, all on clinical text. The key to good performance is
the use of simple task-specific programs that map from the language model
outputs to the label space of the task. We refer to these programs as
resolvers, a generalization of the verbalizer, which defines a mapping between
output tokens and a discrete label space. We show in our examples that good
resolvers share common components (e.g., "safety checks" that ensure the
language model outputs faithfully match the input data), and that the common
patterns across tasks make resolvers lightweight and easy to create. To better
evaluate these systems, we also introduce two new datasets for benchmarking
zero-shot clinical information extraction based on manual relabeling of the
CASI dataset (Moon et al., 2014) with labels for new tasks. On the clinical
extraction tasks we studied, the GPT-3 + resolver systems significantly
outperform existing zero- and few-shot baselines.
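The resolver idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's released code; the function and variable names are hypothetical, and the "safety check" shown here is one simple instance of the pattern the abstract describes (verifying that the model's output faithfully matches the input text).

```python
# Minimal sketch of a "resolver": a task-specific function that maps a raw
# language-model completion into the task's label space, with a safety
# check ensuring the extracted span actually occurs in the input note.
# All names are illustrative, not from the paper's implementation.

def resolve_extraction(model_output: str, source_text: str):
    """Map an LM completion to a normalized label, or None if it fails validation."""
    # Normalize: strip whitespace, surrounding quotes/periods, and casing.
    candidate = model_output.strip().strip('."').lower()
    # Safety check: the answer must appear verbatim in the clinical text,
    # guarding against hallucinated concepts.
    if candidate and candidate in source_text.lower():
        return candidate
    return None

note = "Patient denies chest pain but reports shortness of breath."
print(resolve_extraction('"Shortness of breath."', note))  # shortness of breath
print(resolve_extraction('"fever"', note))                 # None
```

In this sketch the resolver generalizes a verbalizer: instead of a fixed token-to-label dictionary, it applies arbitrary lightweight program logic (normalization plus validation) before committing to a label.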
Related papers
- Infusing clinical knowledge into tokenisers for language models [1.9921590146992474]
This study introduces a novel knowledge enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing.
At initialisation stage, K-Tokeniser populates global representations of tokens based on semantic types of domain concepts.
To avoid pretraining using the new tokeniser, an embedding initialisation approach is proposed to generate representations for new tokens.
arXiv Detail & Related papers (2024-06-20T13:43:03Z)
- Probing Representations for Document-level Event Extraction [30.523959637364484]
This work is the first to apply the probing paradigm to representations learned for document-level information extraction.
We designed eight embedding probes to analyze surface, semantic, and event-understanding capabilities relevant to document-level event extraction.
We found that trained encoders from these models yield embeddings that can modestly improve argument detection and labeling but only slightly enhance event-level tasks.
arXiv Detail & Related papers (2023-10-23T19:33:04Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Open-Vocabulary Object Detection using Pseudo Caption Labels [3.260777306556596]
We argue that more fine-grained labels are necessary to extract richer knowledge about novel objects.
Our best model trained on the de-duplicated VisualGenome dataset achieves an AP of 34.5 and an APr of 30.6, comparable to the state-of-the-art performance.
arXiv Detail & Related papers (2023-03-23T05:10:22Z) - SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z) - Finding Dataset Shortcuts with Grammar Induction [85.47127659108637]
We propose to use probabilistic grammars to characterize and discover shortcuts in NLP datasets.
Specifically, we use a context-free grammar to model patterns in sentence classification datasets and use a synchronous context-free grammar to model datasets involving sentence pairs.
The resulting grammars reveal interesting shortcut features in a number of datasets, including both simple and high-level features.
arXiv Detail & Related papers (2022-10-20T19:54:11Z) - Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- BURT: BERT-inspired Universal Representation from Learning Meaningful Segment [46.51685959045527]
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present a universal representation model, BURT, to encode different levels of linguistic unit into the same vector space.
Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate different granular objectives into the pre-training stage.
arXiv Detail & Related papers (2020-12-28T16:02:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.