Exploring the evolution of research topics during the COVID-19 pandemic
- URL: http://arxiv.org/abs/2310.03928v1
- Date: Thu, 5 Oct 2023 22:16:41 GMT
- Title: Exploring the evolution of research topics during the COVID-19 pandemic
- Authors: Francesco Invernici, Anna Bernasconi, Stefano Ceri
- Abstract summary: We present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts.
Our method is based upon a careful selection of up-to-date technologies (including large language models) and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series.
- Score: 3.234641429290768
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The COVID-19 pandemic has changed the research agendas of most scientific
communities, resulting in an overwhelming production of research articles in a
variety of domains, including medicine, virology, epidemiology, economy,
psychology, and so on. Several open-access corpora and literature hubs were
established; among them, the COVID-19 Open Research Dataset (CORD-19) has
systematically gathered scientific contributions for 2.5 years, by collecting
and indexing over one million articles. Here, we present the CORD-19 Topic
Visualizer (CORToViz), a method and associated visualization tool for
inspecting the CORD-19 textual corpus of scientific abstracts. Our method is
based upon a careful selection of up-to-date technologies (including large
language models), resulting in an architecture for clustering articles along
orthogonal dimensions and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast,
one-click visualization of topic contents as word clouds and topic trends as
time series, equipped with easy-to-drive statistical testing for analyzing the
significance of topic emergence along arbitrarily selected time windows. The
processes of data preparation and results visualization are completely general
and virtually applicable to any corpus of textual documents - thus suited for
effective adaptation to other contexts.
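As an illustration of the "topic trends as time series" and "statistical testing ... along arbitrarily selected time windows" ideas in the abstract, the following is a minimal sketch. It is not CORToViz's implementation. It assumes every article already carries a publication date and a single topic label, and it uses a two-proportion z-test as a stand-in, since the abstract does not say which test the tool applies. The function names and the toy data are hypothetical.

```python
from collections import Counter
from datetime import date
from math import erf, sqrt


def monthly_topic_share(docs, topic):
    """Return {(year, month): share of that month's documents labelled `topic`}.

    `docs` is an iterable of (publication_date, topic_label) pairs; this input
    shape is an assumption made for the sketch.
    """
    totals, hits = Counter(), Counter()
    for day, label in docs:
        key = (day.year, day.month)
        totals[key] += 1
        if label == topic:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in sorted(totals)}


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a change in topic share between windows A and B.

    Assumed stand-in test, not necessarily the one used by the tool.
    Returns (z, p_value); z > 0 means the share is higher in window B.
    """
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # identical, degenerate windows: no evidence of change
    z = (hits_b / n_b - hits_a / n_a) / std_err
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)


# Toy data (hypothetical): four articles per month, labelled by topic.
sample = [
    (date(2021, 1, 5), "vaccines"), (date(2021, 1, 9), "masks"),
    (date(2021, 1, 17), "masks"), (date(2021, 1, 28), "testing"),
    (date(2021, 2, 2), "vaccines"), (date(2021, 2, 11), "vaccines"),
    (date(2021, 2, 19), "vaccines"), (date(2021, 2, 25), "masks"),
]
print(monthly_topic_share(sample, "vaccines"))

# Compare two larger hypothetical windows: 10/100 articles vs 30/100 articles.
print(two_proportion_z_test(10, 100, 30, 100))
```

The monthly shares give the time series for one topic, and the test compares the topic's share in two user-chosen windows. A real corpus would need windows large enough for the normal approximation to hold.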
Related papers
- BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature [73.39593644054865]
BIOMEDICA is a scalable, open-source framework to extract, annotate, and serialize the entirety of the PubMed Central Open Access subset into an easy-to-use, publicly accessible dataset.
Our framework produces a comprehensive archive with over 24 million unique image-text pairs from over 6 million articles.
BMCA-CLIP is a suite of CLIP-style models continuously pretrained on the BIOMEDICA dataset via streaming, eliminating the need to download 27 TB of data locally.
arXiv Detail & Related papers (2025-01-13T09:58:03Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields.
We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation.
Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery [65.16724941038052]
Generalized Category Discovery (GCD) aims to cluster unlabeled data from both known and unknown categories.
Current GCD methods rely on only visual cues, which neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories.
We propose a two-phase TextGCD framework to accomplish multi-modality GCD by exploiting powerful Visual-Language Models.
arXiv Detail & Related papers (2024-03-12T07:06:50Z)
- SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap.
We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections.
This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
arXiv Detail & Related papers (2024-01-24T14:23:12Z)
- An Information Retrieval and Extraction Tool for Covid-19 Related Papers [0.0]
The main focus of this paper is to provide researchers with a better search tool for COVID-19 related papers.
Our tool has shown the potential to assist researchers by automating a topic-based search of CORD-19 papers.
arXiv Detail & Related papers (2024-01-20T01:34:50Z)
- Neural Content Extraction for Poster Generation of Scientific Papers [84.30128728027375]
The problem of poster generation for scientific papers is under-investigated.
Previous studies focus mainly on poster layout and panel composition, while neglecting the importance of content extraction.
To get both textual and visual elements of a poster panel, a neural extractive model is proposed to extract text, figures and tables of a paper section simultaneously.
arXiv Detail & Related papers (2021-12-16T01:19:37Z)
- COVID-19 Multidimensional Kaggle Literature Organization [3.201839066679614]
We show that factorization is a powerful unsupervised learning method capable of discovering hidden patterns in a document corpus.
We show that a higher-order representation of the corpus allows for the simultaneous grouping of similar articles, relevant journals, authors with similar research interests, and topic keywords.
arXiv Detail & Related papers (2021-07-17T06:16:36Z)
- CovidExplorer: A Multi-faceted AI-based Search and Visualization Engine for COVID-19 Information [0.0]
We present a multi-faceted AI-based search and visualization engine, CovidExplorer.
Our system aims to help researchers understand current state-of-the-art COVID-19 research, identify research articles relevant to their domain, and visualize real-time trends and statistics of COVID-19 cases.
In contrast to other existing systems, CovidExplorer also brings in India-specific topical discussions on social media to study different aspects of COVID-19.
arXiv Detail & Related papers (2020-11-30T08:42:13Z)
- Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view [11.362549790802483]
We analyze the LitCovid collection, 13,369 COVID-19 related articles found in PubMed as of May 15th, 2020.
We do that by applying state-of-the-art named entity recognition, classification, clustering and other NLP techniques.
Our clustering algorithm identifies topics represented by groups of related terms, and computes clusters corresponding to documents associated with the topic terms.
arXiv Detail & Related papers (2020-08-07T23:39:29Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2 [8.223517872575712]
We take advantage of the recent advances in pre-trained NLP models, BERT and OpenAI GPT-2.
Our model provides abstractive and comprehensive information based on keywords extracted from the original articles.
Our work can help the medical community by providing succinct summaries of articles for which abstracts are not already available.
arXiv Detail & Related papers (2020-06-03T00:54:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.