Related papers: Exploring the evolution of research topics during the COVID-19 pandemic

Exploring the evolution of research topics during the COVID-19 pandemic

URL: http://arxiv.org/abs/2310.03928v1
Date: Thu, 5 Oct 2023 22:16:41 GMT
Title: Exploring the evolution of research topics during the COVID-19 pandemic
Authors: Francesco Invernici, Anna Bernasconi, Stefano Ceri
Abstract summary: We present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts. Our method is based upon a careful selection of up-to-date technologies (including large language models) and extraction techniques for temporal topic mining. Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series.
Score: 3.234641429290768
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The COVID-19 pandemic has changed the research agendas of most scientific communities, resulting in an overwhelming production of research articles in a variety of domains, including medicine, virology, epidemiology, economy, psychology, and so on. Several open-access corpora and literature hubs were established; among them, the COVID-19 Open Research Dataset (CORD-19) has systematically gathered scientific contributions for 2.5 years, by collecting and indexing over one million articles. Here, we present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts. Our method is based upon a careful selection of up-to-date technologies (including large language models), resulting in an architecture for clustering articles along orthogonal dimensions and extraction techniques for temporal topic mining. Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series, equipped with easy-to-drive statistical testing for analyzing the significance of topic emergence along arbitrarily selected time windows. The processes of data preparation and results visualization are completely general and virtually applicable to any corpus of textual documents - thus suited for effective adaptation to other contexts.

Related papers

Exploring Topic Trends in COVID-19 Research Literature using Non-Negative Matrix Factorization [2.8777530051393314]
We apply topic modeling using Non-Negative Matrix Factorization (NMF) on the COVID-19 Open Research dataset. NMF factorizes the document-term matrix into two non-negative matrices, effectively representing the topics and their distribution across the documents. Our findings contribute to the understanding of the knowledge structure of the COVID-19 research landscape.
arXiv Detail & Related papers (2025-03-23T19:37:52Z)
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature [73.39593644054865]
BIOMEDICA is a scalable, open-source framework to extract, annotate, and serialize the entirety of the PubMed Central Open Access subset into an easy-to-use, publicly accessible dataset. Our framework produces a comprehensive archive with over 24 million unique image-text pairs from over 6 million articles. BMCA-CLIP is a suite of CLIP-style models continuously pretrained on the BIOMEDICA dataset via streaming, eliminating the need to download 27 TB of data locally.
arXiv Detail & Related papers (2025-01-13T09:58:03Z)
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields. We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation. Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery [69.91441987063307]
Generalized Category Discovery (GCD) aims to cluster unlabeled data from both known and unknown categories. Current GCD methods rely on only visual cues, which neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories. We propose a two-phase TextGCD framework to accomplish multi-modality GCD by exploiting powerful Visual-Language Models.
arXiv Detail & Related papers (2024-03-12T07:06:50Z)
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap. We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections. This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
arXiv Detail & Related papers (2024-01-24T14:23:12Z)
An Information Retrieval and Extraction Tool for Covid-19 Related Papers [0.0]
The main focus of this paper is to provide researchers with a better search tool for COVID-19 related papers. Our tool has shown the potential to assist researchers by automating a topic-based search of CORD-19 papers.
arXiv Detail & Related papers (2024-01-20T01:34:50Z)
Neural Content Extraction for Poster Generation of Scientific Papers [84.30128728027375]
The problem of poster generation for scientific papers is under-investigated. Previous studies focus mainly on poster layout and panel composition, while neglecting the importance of content extraction. To get both textual and visual elements of a poster panel, a neural extractive model is proposed to extract text, figures and tables of a paper section simultaneously.
arXiv Detail & Related papers (2021-12-16T01:19:37Z)
COVID-19 Multidimensional Kaggle Literature Organization [3.201839066679614]
We show that factorization is a powerful unsupervised learning method capable of discovering hidden patterns in a document corpus. We show that a higher-order representation of the corpus allows for the simultaneous grouping of similar articles, relevant journals, authors with similar research interests, and topic keywords.
arXiv Detail & Related papers (2021-07-17T06:16:36Z)
CovidExplorer: A Multi-faceted AI-based Search and Visualization Engine for COVID-19 Information [0.0]
We present a multi-faceted AI-based search and visualization engine, CovidExplorer. Our system aims to help researchers understand current state-of-the-art COVID-19 research, identify research articles relevant to their domain, and visualize real-time trends and statistics of COVID-19 cases. In contrast to other existing systems, CovidExplorer also brings in India-specific topical discussions on social media to study different aspects of COVID-19.
arXiv Detail & Related papers (2020-11-30T08:42:13Z)
Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view [11.362549790802483]
We analyze the LitCovid collection, 13,369 COVID-19 related articles found in PubMed as of May 15th, 2020. We do that by applying state-of-the-art named entity recognition, classification, clustering and other NLP techniques. Our clustering algorithm identifies topics represented by groups of related terms, and computes clusters corresponding to documents associated with the topic terms.
arXiv Detail & Related papers (2020-08-07T23:39:29Z)
A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning. This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021. We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise. We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2 [8.223517872575712]
We take advantage of the recent advances in pre-trained NLP models, BERT and OpenAI GPT-2. Our model provides abstractive and comprehensive information based on keywords extracted from the original articles. Our work can help the the medical community, by providing succinct summaries of articles for which the abstract are not already available.
arXiv Detail & Related papers (2020-06-03T00:54:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.