Exploring the evolution of research topics during the COVID-19 pandemic
- URL: http://arxiv.org/abs/2310.03928v1
- Date: Thu, 5 Oct 2023 22:16:41 GMT
- Title: Exploring the evolution of research topics during the COVID-19 pandemic
- Authors: Francesco Invernici, Anna Bernasconi, Stefano Ceri
- Abstract summary: We present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts.
Our method is based upon a careful selection of up-to-date technologies (including large language models) and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series.
- Score: 3.234641429290768
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The COVID-19 pandemic has changed the research agendas of most scientific
communities, resulting in an overwhelming production of research articles in a
variety of domains, including medicine, virology, epidemiology, economy,
psychology, and so on. Several open-access corpora and literature hubs were
established; among them, the COVID-19 Open Research Dataset (CORD-19) has
systematically gathered scientific contributions for 2.5 years, by collecting
and indexing over one million articles. Here, we present the CORD-19 Topic
Visualizer (CORToViz), a method and associated visualization tool for
inspecting the CORD-19 textual corpus of scientific abstracts. Our method is
based upon a careful selection of up-to-date technologies (including large
language models), resulting in an architecture for clustering articles along
orthogonal dimensions and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast,
one-click visualization of topic contents as word clouds and topic trends as
time series, equipped with easy-to-drive statistical testing for analyzing the
significance of topic emergence along arbitrarily selected time windows. The
processes of data preparation and results visualization are completely general
and virtually applicable to any corpus of textual documents - thus suited for
effective adaptation to other contexts.
Related papers
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z) - Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized
Visual Class Discovery [69.91441987063307]
Generalized Category Discovery (GCD) aims to cluster unlabeled data from both known and unknown categories.
Current GCD methods rely on only visual cues, which neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories.
We propose a two-phase TextGCD framework to accomplish multi-modality GCD by exploiting powerful Visual-Language Models.
arXiv Detail & Related papers (2024-03-12T07:06:50Z) - SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap.
We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections.
This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
arXiv Detail & Related papers (2024-01-24T14:23:12Z) - An Information Retrieval and Extraction Tool for Covid-19 Related Papers [0.0]
The main focus of this paper is to provide researchers with a better search tool for COVID-19 related papers.
Our tool has shown the potential to assist researchers by automating a topic-based search of CORD-19 papers.
arXiv Detail & Related papers (2024-01-20T01:34:50Z) - Neural Content Extraction for Poster Generation of Scientific Papers [84.30128728027375]
The problem of poster generation for scientific papers is under-investigated.
Previous studies focus mainly on poster layout and panel composition, while neglecting the importance of content extraction.
To get both textual and visual elements of a poster panel, a neural extractive model is proposed to extract text, figures and tables of a paper section simultaneously.
arXiv Detail & Related papers (2021-12-16T01:19:37Z) - COVID-19 Multidimensional Kaggle Literature Organization [3.201839066679614]
We show that factorization is a powerful unsupervised learning method capable of discovering hidden patterns in a document corpus.
We show that a higher-order representation of the corpus allows for the simultaneous grouping of similar articles, relevant journals, authors with similar research interests, and topic keywords.
arXiv Detail & Related papers (2021-07-17T06:16:36Z) - CovidExplorer: A Multi-faceted AI-based Search and Visualization Engine
for COVID-19 Information [0.0]
We present a multi-faceted AI-based search and visualization engine, CovidExplorer.
Our system aims to help researchers understand current state-of-the-art COVID-19 research, identify research articles relevant to their domain, and visualize real-time trends and statistics of COVID-19 cases.
In contrast to other existing systems, CovidExplorer also brings in India-specific topical discussions on social media to study different aspects of COVID-19.
arXiv Detail & Related papers (2020-11-30T08:42:13Z) - Navigating the landscape of COVID-19 research through literature
analysis: A bird's eye view [11.362549790802483]
We analyze the LitCovid collection, 13,369 COVID-19 related articles found in PubMed as of May 15th, 2020.
We do that by applying state-of-the-art named entity recognition, classification, clustering and other NLP techniques.
Our clustering algorithm identifies topics represented by groups of related terms, and computes clusters corresponding to documents associated with the topic terms.
arXiv Detail & Related papers (2020-08-07T23:39:29Z) - A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z) - Automatic Text Summarization of COVID-19 Medical Research Articles using
BERT and GPT-2 [8.223517872575712]
We take advantage of the recent advances in pre-trained NLP models, BERT and OpenAI GPT-2.
Our model provides abstractive and comprehensive information based on keywords extracted from the original articles.
Our work can help the the medical community, by providing succinct summaries of articles for which the abstract are not already available.
arXiv Detail & Related papers (2020-06-03T00:54:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.