Automatic Text Summarization of COVID-19 Medical Research Articles using
BERT and GPT-2
- URL: http://arxiv.org/abs/2006.01997v1
- Date: Wed, 3 Jun 2020 00:54:44 GMT
- Title: Automatic Text Summarization of COVID-19 Medical Research Articles using
BERT and GPT-2
- Authors: Virapat Kieuvongngam, Bowen Tan, Yiming Niu
- Abstract summary: We take advantage of the recent advances in pre-trained NLP models, BERT and OpenAI GPT-2.
Our model provides abstractive and comprehensive information based on keywords extracted from the original articles.
Our work can help the medical community by providing succinct summaries of articles for which abstracts are not already available.
- Score: 8.223517872575712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the COVID-19 pandemic, there is a growing urgency for the medical community
to keep up with the accelerating growth in the new coronavirus-related
literature. As a result, the COVID-19 Open Research Dataset Challenge has
released a corpus of scholarly articles and is calling for machine learning
approaches to help bridge the gap between the researchers and the rapidly
growing publications. Here, we take advantage of the recent advances in
pre-trained NLP models, BERT and OpenAI GPT-2, to solve this challenge by
performing text summarization on this dataset. We evaluate the results using
ROUGE scores and visual inspection. Our model provides abstractive and
comprehensive information based on keywords extracted from the original
articles. Our work can help the medical community by providing succinct
summaries of articles for which abstracts are not already available.
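The abstract states that results are evaluated with ROUGE scores but does not say which variant or implementation was used. As a minimal sketch, assuming ROUGE-1 (unigram overlap) with naive lowercase whitespace tokenization, the metric could be computed as follows. The function name `rouge_1` and the example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 with naive lowercase whitespace tokenization.

    Assumption: no stemming or stopword removal, which real ROUGE
    toolkits may apply.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a unigram counts at most as often as it occurs in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical sentences for illustration only.
reference = "the virus spreads through respiratory droplets"
candidate = "the virus spreads by droplets"
scores = rouge_1(candidate, reference)
print(scores)
```

Here recall measures how much of the reference summary the generated summary covers, and precision measures how much of the generated summary appears in the reference. Abstractive summaries can paraphrase correctly and still score low on such overlap metrics, which is consistent with the abstract pairing ROUGE with visual inspection.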
Related papers
- Exploring the evolution of research topics during the COVID-19 pandemic [3.234641429290768]
We present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts.
Our method is based upon a careful selection of up-to-date technologies (including large language models) and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series.
arXiv Detail & Related papers (2023-10-05T22:16:41Z)
- covLLM: Large Language Models for COVID-19 Biomedical Literature [0.0]
The COVID-19 pandemic led to 1.1 million deaths in the United States, despite the explosion of coronavirus research.
One reason is that clinicians, overwhelmed by patients, struggle to keep pace with the rate of new coronavirus literature.
A potential solution is developing a tool for evaluating coronavirus literature using large language models.
arXiv Detail & Related papers (2023-06-08T04:08:32Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- COVID-19 Multidimensional Kaggle Literature Organization [3.201839066679614]
We show that factorization is a powerful unsupervised learning method capable of discovering hidden patterns in a document corpus.
We show that a higher-order representation of the corpus allows for the simultaneous grouping of similar articles, relevant journals, authors with similar research interests, and topic keywords.
arXiv Detail & Related papers (2021-07-17T06:16:36Z)
- An Analysis of a BERT Deep Learning Strategy on a Technology Assisted Review Task [91.3755431537592]
Document screening is a central task within Evidence-Based Medicine.
I propose a DL document classification approach with BERT or PubMedBERT embeddings and a DL similarity search path.
I test and evaluate the retrieval effectiveness of my DL strategy on the 2017 and 2018 CLEF eHealth collections.
arXiv Detail & Related papers (2021-04-16T19:45:27Z)
- FLOP: Federated Learning on Medical Datasets using Partial Networks [84.54663831520853]
COVID-19 Disease due to the novel coronavirus has caused a shortage of medical resources.
Different data-driven deep learning models have been developed to mitigate the diagnosis of COVID-19.
The data itself is still scarce due to patient privacy concerns.
We propose a simple yet effective algorithm, named Federated Learning on Medical Datasets using Partial Networks (FLOP).
arXiv Detail & Related papers (2021-02-10T01:56:58Z)
- COVID-19 therapy target discovery with context-aware literature mining [5.839799877302573]
We propose a system for contextualization of empirical expression data by approximating relations between entities.
In order to exploit a larger scientific context by transfer learning, we propose a novel embedding generation technique.
arXiv Detail & Related papers (2020-07-30T18:37:36Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization [53.67205506042232]
CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
We evaluate our system on the data of the TREC-COVID information retrieval challenge.
arXiv Detail & Related papers (2020-06-17T01:32:48Z)
- Visualising COVID-19 Research [4.664989082015335]
We develop a novel automated theme-based visualisation method.
It combines advanced data modelling of large corpora, information mapping and trend analysis.
It provides a top-down and bottom-up browsing and search interface for quick discovery of topics and research resources.
arXiv Detail & Related papers (2020-05-13T15:45:14Z)
- Mapping the Landscape of Artificial Intelligence Applications against COVID-19 [59.30734371401316]
COVID-19, the disease caused by the SARS-CoV-2 virus, has been declared a pandemic by the World Health Organization.
We present an overview of recent studies using Machine Learning and, more broadly, Artificial Intelligence to tackle many aspects of the COVID-19 crisis.
arXiv Detail & Related papers (2020-03-25T12:30:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.