Entity-driven Fact-aware Abstractive Summarization of Biomedical
Literature
- URL: http://arxiv.org/abs/2203.15959v1
- Date: Wed, 30 Mar 2022 00:34:56 GMT
- Title: Entity-driven Fact-aware Abstractive Summarization of Biomedical
Literature
- Authors: Amanuel Alambo, Tanvi Banerjee, Krishnaprasad Thirunarayan, Michael
Raymer
- Abstract summary: We propose an entity-driven fact-aware framework for training end-to-end transformer-based encoder-decoder models for abstractive summarization of biomedical articles.
We conduct experiments using five state-of-the-art transformer-based models.
The proposed approach is evaluated on ICD-11-Summ-1000 and PubMed-50k.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the large number of scientific articles published every year,
the publication rate of biomedical literature has been rising steadily.
Consequently, there has been considerable effort to harness and summarize the
massive amount of biomedical research articles. While transformer-based
encoder-decoder models in a vanilla source document-to-summary setting have
been extensively studied for abstractive summarization in different domains,
their major limitations continue to be entity hallucination (a phenomenon where
generated summaries contain entities not present in or related to the source
article(s)) and factual inconsistency. This problem is exacerbated in a
biomedical setting where named entities and their semantics (which can be
captured through a knowledge base) constitute the essence of an article. The
use of named entities and facts mined from background knowledge bases
pertaining to the named entities to guide abstractive summarization has not
been studied in biomedical article summarization literature. In this paper, we
propose an entity-driven fact-aware framework for training end-to-end
transformer-based encoder-decoder models for abstractive summarization of
biomedical articles. We call the proposed approach EFAS (Entity-driven
Fact-aware Abstractive Summarization); its building block is a
transformer-based model. We conduct experiments using five state-of-the-art
transformer-based models (two of which are specifically designed for long
document summarization) and demonstrate that injecting knowledge into the
training/inference phase of these models enables the models to achieve
significantly better performance than the standard source document-to-summary
setting in terms of entity-level factual accuracy, N-gram novelty, and semantic
equivalence while performing comparably on ROUGE metrics. The proposed approach
is evaluated on ICD-11-Summ-1000 and PubMed-50k.
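The entity-and-fact injection the abstract describes can be illustrated with a minimal sketch. The function name, separator tokens, and toy knowledge base below are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch: prepend recognized named entities and knowledge-base
# facts to the source document before it is fed to an encoder-decoder
# summarizer. Separator tokens (<ents>, <facts>, <doc>) are assumptions.

def augment_source(source, entities, kb):
    """Build an entity- and fact-augmented input string."""
    facts = []
    for ent in entities:
        for subj, rel, obj in kb.get(ent, []):
            facts.append(f"{subj} {rel} {obj}")
    ent_block = " ; ".join(entities)
    fact_block = " ; ".join(facts)
    return f"<ents> {ent_block} <facts> {fact_block} <doc> {source}"

# Toy example: one sentence, two recognized entities, a tiny hand-written KB.
kb = {
    "metformin": [("metformin", "treats", "type 2 diabetes")],
    "insulin": [("insulin", "regulates", "blood glucose")],
}
augmented = augment_source(
    "Metformin remains first-line therapy for type 2 diabetes.",
    ["metformin", "insulin"],
    kb,
)
```

In a real pipeline the augmented string would be tokenized and passed to the encoder; here it only demonstrates the input format.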
Related papers
- NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering
We introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization.
We employ the distilled MiniLM model, fine-tuned on domain-specific data, for high-precision answer extraction.
arXiv Detail & Related papers (2024-10-29T14:45:12Z) - SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z) - High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models
We formulate the relation extraction task as binary classifications for large language models.
We designate the main title as the tail entity and explicitly incorporate it into the context.
Longer contents are sliced into text chunks, embedded, and retrieved with additional embedding models.
arXiv Detail & Related papers (2023-12-13T16:43:41Z) - Controllable Topic-Focused Abstractive Summarization
Controlled abstractive summarization focuses on producing condensed versions of a source article to cover specific aspects.
This paper presents a new Transformer-based architecture capable of producing topic-focused summaries.
arXiv Detail & Related papers (2023-11-12T03:51:38Z) - Integrating curation into scientific publishing to train AI models
We have embedded multimodal data curation into the academic publishing process to annotate segmented figure panels and captions.
The dataset, SourceData-NLP, contains more than 620,000 annotated biomedical entities.
We evaluate the utility of the dataset to train AI models using named-entity recognition, segmentation of figure captions into their constituent panels, and a novel context-dependent semantic task.
arXiv Detail & Related papers (2023-10-31T13:22:38Z) - Improving Biomedical Abstractive Summarisation with Knowledge
Aggregation from Citation Papers
Existing language models struggle to generate technical summaries that are on par with those produced by biomedical experts.
We propose a novel attention-based citation aggregation model that integrates domain-specific knowledge from citation papers.
Our model outperforms state-of-the-art approaches and achieves substantial improvements in abstractive biomedical text summarisation.
arXiv Detail & Related papers (2023-10-24T09:56:46Z) - Readability Controllable Biomedical Document Summarization
We introduce a new task of readability controllable summarization for biomedical documents.
It aims to recognise users' readability demands and generate summaries that better suit their needs.
arXiv Detail & Related papers (2022-10-10T14:03:20Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - Knowledge-Rich Self-Supervised Entity Linking
Knowledge-Rich Self-Supervision (KRISSBERT) is a universal entity linker for
four million UMLS entities, trained without using any labeled information.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
arXiv Detail & Related papers (2021-12-15T05:05:12Z) - Enhancing Scientific Papers Summarization with Citation Graph
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z) - Enhancing Factual Consistency of Abstractive Summarization
We propose a fact-aware summarization model FASum to extract and integrate factual relations into the summary generation process.
We then design a factual corrector model FC to automatically correct factual errors from summaries generated by existing systems.
arXiv Detail & Related papers (2020-03-19T07:36:10Z)
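The fact-aware checking described in the FASum/FC entry above can be illustrated with a toy consistency check. The triple extractor below is a deliberately naive stand-in (real systems use OpenIE-style extractors), and all names are hypothetical:

```python
# Toy factual-consistency check: extract crude (subject, verb, object)
# triples and flag summary triples that have no support in the source.

def naive_triples(sentences):
    """Treat each three-word sentence as an (s, v, o) triple.
    A deliberately crude stand-in for a real relation extractor."""
    triples = set()
    for sent in sentences:
        words = sent.lower().strip(".").split()
        if len(words) == 3:
            triples.add(tuple(words))
    return triples

def unsupported(summary, source):
    """Return summary triples absent from the source triples."""
    return naive_triples(summary) - naive_triples(source)

errs = unsupported(
    ["Aspirin causes ulcers."],
    ["Aspirin reduces inflammation.", "NSAIDs cause ulcers."],
)
```

A corrector in the spirit of FC would then rewrite or drop the flagged spans rather than merely reporting them.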
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.