Slot Filling for Biomedical Information Extraction
- URL: http://arxiv.org/abs/2109.08564v1
- Date: Fri, 17 Sep 2021 14:16:00 GMT
- Title: Slot Filling for Biomedical Information Extraction
- Authors: Yannis Papanikolaou, Francine Bennett
- Abstract summary: We present a slot filling approach to the task of biomedical IE.
We follow the proposed paradigm of coupling a Transformer-based bi-encoder, Dense Passage Retrieval, with a Transformer-based reader model.
- Score: 0.5330240017302619
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Information Extraction (IE) from text refers to the task of extracting
structured knowledge from unstructured text. The task typically consists of a
series of sub-tasks such as Named Entity Recognition and Relation Extraction.
Sourcing entity- and relation-type-specific training data is a major bottleneck
in the above sub-tasks. In this work we present a slot filling approach to the
task of biomedical IE, effectively removing the need for entity- and
relation-specific training data and allowing it to handle zero-shot settings. We
follow the recently proposed paradigm of coupling a Transformer-based
bi-encoder, Dense Passage Retrieval, with a Transformer-based reader model to
extract relations from biomedical text. We assemble a biomedical slot filling
dataset for both retrieval and reading comprehension and conduct a series of
experiments demonstrating that our approach outperforms a number of simpler
baselines. We also evaluate our approach end-to-end for standard as well as
zero-shot settings. Our work provides a fresh perspective on how to solve
biomedical IE tasks, in the absence of relevant training data. Our code, models
and pretrained data are available at
https://github.com/healx/biomed-slot-filling.
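A minimal sketch of the retrieval half of the pipeline described in the abstract: a query pairing a known entity with a relation ("slot") is embedded, and passages are ranked by inner product with it, as in Dense Passage Retrieval. It assumes a toy bag-of-words encoder in place of DPR's Transformer encoders and a hypothetical "<entity> [SEP] <relation>" query format; the helper names are illustrative and this is not the authors' code from the linked repository.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lower-case word tokenizer (a toy stand-in for a subword tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Assign each token seen in the passage collection a fixed dimension."""
    vocab: dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[float]:
    """Toy dense encoder: L2-normalised term counts over the vocabulary.

    DPR instead uses two BERT-style encoders (question and passage);
    one shared function stands in for both here (an assumption).
    """
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # tokens outside the passage vocabulary are dropped
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def slot_query(head_entity: str, relation: str) -> str:
    """Hypothetical query format pairing the known entity with the slot (relation)."""
    return f"{head_entity} [SEP] {relation}"


def retrieve(query: str, passages: list[str], vocab: dict[str, int], k: int = 2) -> list[str]:
    """Return the k passages with the highest inner product against the query."""
    q = encode(query, vocab)
    scored = []
    for idx, passage in enumerate(passages):
        p = encode(passage, vocab)  # a real system precomputes and indexes these
        scored.append((sum(a * b for a, b in zip(q, p)), idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by position
    return [passages[idx] for _, idx in scored[:k]]


passages = [
    "Metformin is used to treat type 2 diabetes.",
    "Aspirin inhibits platelet aggregation.",
    "Imatinib is used to treat chronic myeloid leukemia.",
]
vocab = build_vocab(passages)
top = retrieve(slot_query("Imatinib", "treat"), passages, vocab, k=2)
print(top)  # imatinib passage first, then the metformin passage (shared token "treat")
```

In the approach described above, the retrieved passages would then go to the Transformer reader, which extracts the missing entity as a span; that step is not sketched here. Re-encoding every passage per query is only for brevity: DPR encodes the collection once and searches it with a nearest-neighbour index (the original DPR release uses FAISS).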
Related papers
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
Date: 2024-04-27T05:03:42Z
- Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts [2.2578044590557553]
FlaMBé is a collection of expert-curated datasets that capture procedural knowledge in biomedical texts.
The dataset is inspired by the observation that academic papers describing their methodology are one ubiquitous source of procedural knowledge expressed as unstructured text.
Date: 2023-09-04T21:02:36Z
- BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets [7.7587371896752595]
Biomedical relation extraction (RE) is a central task in biomedical natural language processing (NLP) research.
We present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset.
Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on individual datasets.
Date: 2023-06-19T22:48:18Z
- Iteratively Improving Biomedical Entity Linking and Event Extraction via Hard Expectation-Maximization [9.422435686239538]
Biomedical entity linking and event extraction are two crucial tasks to support text understanding and retrieval in the biomedical domain.
Previous research typically solves these two tasks separately or in a pipeline, leading to error propagation.
We propose joint biomedical entity linking and event extraction by regarding the event structures and entity references in knowledge bases as latent variables.
Date: 2023-05-24T02:30:31Z
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
Date: 2022-12-15T13:57:07Z
- BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing [13.30221348538759]
We introduce BigBIO, a community library of 126+ biomedical NLP datasets.
BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata.
We discuss our process for task schema design, data auditing and contribution guidelines, and outline two illustrative use cases.
Date: 2022-06-30T07:15:45Z
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
Date: 2022-06-06T16:11:58Z
- BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines [49.75878234192369]
We present WEAVER, a simple yet efficient post-processing method that infuses old knowledge into the new model.
We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once.
Date: 2022-02-21T10:34:41Z
- Zero-Shot Information Extraction as a Unified Text-to-Triple Translation [56.01830747416606]
We cast a suite of information extraction tasks into a text-to-triple translation framework.
We formalize the task as a translation between task-specific input text and output triples.
We study the zero-shot performance of this framework on open information extraction.
Date: 2021-09-23T06:54:19Z
- PharmKE: Knowledge Extraction Platform for Pharmaceutical Texts using Transfer Learning [0.0]
PharmKE is a text analysis platform that applies deep learning through several stages for thorough semantic analysis of pharmaceutical articles.
The methodology is used to create accurately labeled training and test datasets, which are then used to train models for custom entity labeling tasks.
The obtained results are compared to the fine-tuned BERT and BioBERT models trained on the same dataset.
Date: 2021-02-25T19:36:35Z
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
Date: 2020-10-22T19:52:49Z