Entity-driven Fact-aware Abstractive Summarization of Biomedical
Literature
- URL: http://arxiv.org/abs/2203.15959v1
- Date: Wed, 30 Mar 2022 00:34:56 GMT
- Authors: Amanuel Alambo, Tanvi Banerjee, Krishnaprasad Thirunarayan, Michael
Raymer
- Abstract summary: We propose an entity-driven fact-aware framework for training end-to-end transformer-based encoder-decoder models for abstractive summarization of biomedical articles.
We conduct experiments using five state-of-the-art transformer-based models.
The proposed approach is evaluated on ICD-11-Summ-1000 and PubMed-50k.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the large number of scientific articles published every
year, the publication rate of biomedical literature continues to increase.
Consequently, there has been considerable effort to harness and summarize the
massive amount of biomedical research articles. While transformer-based
encoder-decoder models in a vanilla source document-to-summary setting have
been extensively studied for abstractive summarization in different domains,
their major limitations continue to be entity hallucination (a phenomenon where
generated summaries contain entities not related to, or not present in, the
source article(s)) and factual inconsistency. This problem is exacerbated in a
biomedical setting where named entities and their semantics (which can be
captured through a knowledge base) constitute the essence of an article. The
use of named entities, together with facts mined from background knowledge
bases about those entities, to guide abstractive summarization has not been
studied in the biomedical summarization literature. In this paper, we
propose an entity-driven fact-aware framework for training end-to-end
transformer-based encoder-decoder models for abstractive summarization of
biomedical articles. We call the proposed approach, whose building block is a
transformer-based model, EFAS (Entity-driven Fact-aware Abstractive
Summarization). We conduct experiments using five state-of-the-art
transformer-based models (two of which are specifically designed for long
document summarization) and demonstrate that injecting knowledge into the
training/inference phase of these models enables the models to achieve
significantly better performance than the standard source document-to-summary
setting in terms of entity-level factual accuracy, N-gram novelty, and semantic
equivalence while performing comparably on ROUGE metrics. The proposed approach
is evaluated on ICD-11-Summ-1000 and PubMed-50k.
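The abstract describes injecting named entities and knowledge-base facts into the training/inference phase of encoder-decoder models. A minimal sketch of one plausible realization, assuming (hypothetically) that the injection works by prepending recognized entities and mined facts to the source document before encoding; the function name, separator token, and input format below are illustrative assumptions, not the authors' actual implementation:

```python
def build_knowledge_augmented_input(source_text, entities, kb_facts,
                                    sep="</s>"):
    """Prepend named entities and knowledge-base facts to a source article.

    entities: list of entity surface forms recognized in the article.
    kb_facts: list of (subject, relation, object) triples mined from a
              background knowledge base for those entities.
    Returns a single string the encoder would consume:
        [entities] SEP [facts] SEP [article]
    """
    entity_span = " ; ".join(entities)
    fact_span = " ; ".join(f"{s} {r} {o}" for s, r, o in kb_facts)
    return f"{entity_span} {sep} {fact_span} {sep} {source_text}"


augmented = build_knowledge_augmented_input(
    "Aspirin reduces fever in adults.",
    entities=["Aspirin", "fever"],
    kb_facts=[("Aspirin", "treats", "fever")],
)
```

The augmented string would then be tokenized and fed to any of the transformer-based encoder-decoder models the paper evaluates, leaving the decoder side unchanged.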
Related papers
- High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models [1.9665865095034865]
We formulate the relation extraction task as binary classifications for large language models.
We designate the main title as the tail entity and explicitly incorporate it into the context.
Longer contents are sliced into text chunks, embedded, and retrieved with additional embedding models.
arXiv Detail & Related papers (2023-12-13T16:43:41Z)
- Controllable Topic-Focused Abstractive Summarization [57.8015120583044]
Controlled abstractive summarization focuses on producing condensed versions of a source article to cover specific aspects.
This paper presents a new Transformer-based architecture capable of producing topic-focused summaries.
arXiv Detail & Related papers (2023-11-12T03:51:38Z)
- The SourceData-NLP dataset: integrating curation into scientific publishing for training large language models [1.0423199374671421]
We present the SourceData-NLP dataset produced through the routine curation of papers during the publication process.
This dataset contains more than 620,000 annotated biomedical entities, curated from 18,689 figures in 3,223 papers in molecular and cell biology.
arXiv Detail & Related papers (2023-10-31T13:22:38Z)
- Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers [24.481854035628434]
Existing language models struggle to generate technical summaries that are on par with those produced by biomedical experts.
We propose a novel attention-based citation aggregation model that integrates domain-specific knowledge from citation papers.
Our model outperforms state-of-the-art approaches and achieves substantial improvements in abstractive biomedical text summarisation.
arXiv Detail & Related papers (2023-10-24T09:56:46Z)
- Readability Controllable Biomedical Document Summarization [17.166794984161964]
We introduce a new task of readability controllable summarization for biomedical documents.
It aims to recognise users' readability demands and generate summaries that better suit their needs.
arXiv Detail & Related papers (2022-10-10T14:03:20Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-Rich Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities, produced without using any labeled information.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
arXiv Detail & Related papers (2021-12-15T05:05:12Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Fast and Effective Biomedical Entity Linking Using a Dual Encoder [48.86736921025866]
We propose a BERT-based dual encoder model that resolves multiple mentions in a document in one shot.
We show that our proposed model is multiple times faster than existing BERT-based models while being competitive in accuracy for biomedical entity linking.
arXiv Detail & Related papers (2021-03-08T19:32:28Z)
- Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization [22.611879349101596]
We evaluate modern neural models for abstractive summarization of relevant article abstracts from systematic reviews.
We find that modern summarization systems yield consistently fluent and relevant synopses, but that they are not always factual.
arXiv Detail & Related papers (2020-08-25T22:22:50Z)
- Enhancing Factual Consistency of Abstractive Summarization [57.67609672082137]
We propose a fact-aware summarization model FASum to extract and integrate factual relations into the summary generation process.
We then design a factual corrector model FC to automatically correct factual errors from summaries generated by existing systems.
arXiv Detail & Related papers (2020-03-19T07:36:10Z)