Topic Modeling and Progression of American Digital News Media During the
Onset of the COVID-19 Pandemic
- URL: http://arxiv.org/abs/2106.09572v1
- Date: Tue, 25 May 2021 14:27:47 GMT
- Title: Topic Modeling and Progression of American Digital News Media During the
Onset of the COVID-19 Pandemic
- Authors: Xiangpeng Wan, Michael C. Lucic, Hakim Ghazzai, Yehia Massoud
- Abstract summary: Currently, the world is in the midst of a severe global pandemic, which has affected all aspects of people's lives.
There is a deluge of COVID-related digital media articles published in the United States, due to the disparate effects of the pandemic.
We develop a Natural Language Processing pipeline that is capable of automatically distilling various digital articles into manageable pieces of information.
- Score: 2.798697306330988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, the world is in the midst of a severe global pandemic, which has
affected all aspects of people's lives. As a result, there is a deluge of
COVID-related digital media articles published in the United States, due to the
disparate effects of the pandemic. This large volume of information is
difficult for readers to consume in a reasonable amount of time. In this
paper, we develop a Natural Language Processing (NLP) pipeline that is capable
of automatically distilling various digital articles into manageable pieces of
information, while also modelling the progression of topics discussed over time in
order to aid readers in rapidly gaining holistic perspectives on pressing
issues (i.e., the COVID-19 pandemic) from a diverse array of sources. We
achieve these goals by first collecting a large corpus of COVID-related
articles during the onset of the pandemic. Afterwards, we apply unsupervised and
semi-supervised learning procedures to summarize articles, then cluster them
based on their similarities using community detection methods. Next, we
identify the topic of each cluster of articles using the BART algorithm.
Finally, we provide a detailed digital media analysis based on the NLP-pipeline
outputs and show how the conversation surrounding COVID-19 evolved over time.
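The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the specific community detection method, so everything below is an assumption for illustration: word-count cosine similarity stands in for whatever representation the authors use, a simple label propagation routine stands in for their community detection method, and the function names, the `threshold` value, and the example summaries are hypothetical. The summarization and BART topic-identification steps are not shown.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a learned representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_graph(texts, threshold=0.3):
    """Weighted adjacency dict linking texts whose similarity >= threshold."""
    vectors = [bag_of_words(t) for t in texts]
    graph = defaultdict(dict)
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def label_propagation(graph, n_nodes, max_iters=20):
    """Each node repeatedly adopts the label carrying the largest total
    edge weight among its neighbours; ties go to the smallest label."""
    labels = list(range(n_nodes))
    for _ in range(max_iters):
        changed = False
        for node in range(n_nodes):
            weights = defaultdict(float)
            for neighbour, w in graph.get(node, {}).items():
                weights[labels[neighbour]] += w
            if not weights:
                continue  # isolated node keeps its own label
            best = min(weights, key=lambda lab: (-weights[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical article summaries, invented for this example.
summaries = [
    "hospitals report shortage of masks and ventilators",
    "shortage of masks and ventilators strains hospitals",
    "stock markets fall as investors fear recession",
    "investors fear recession as stock markets fall again",
]
labels = label_propagation(similarity_graph(summaries), len(summaries))
```

In this toy input the two hospital summaries end up sharing one label and the two market summaries share another. A real pipeline would likely swap in stronger text representations and an established community detection implementation.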
Related papers
- Automating the Analysis of Public Saliency and Attitudes towards Biodiversity from Digital Media [0.5175667614430115]
Measuring public attitudes toward wildlife provides crucial insights into our relationship with nature and helps monitor progress toward Global Biodiversity Framework targets.
We aim to overcome these challenges by leveraging modern Natural Language Processing (NLP) tools.
We introduce a folk taxonomy approach for improved search term generation and employ cosine similarity on Term Frequency-Inverse Document Frequency vectors to filter syndicated articles.
We also introduce a relevance filtering pipeline which uses unsupervised learning to reveal common topics, followed by an open-source Large Language Model (LLM) to assign topics to news article titles.
arXiv Detail & Related papers (2024-05-02T08:28:25Z)
- Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs).
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z)
- GPT-4V(ision) as A Social Media Analysis Engine [77.23394183063238]
This paper explores GPT-4V's capabilities for social multimedia analysis.
We select five representative tasks, including sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection.
GPT-4V demonstrates remarkable efficacy in these tasks, showcasing strengths such as joint understanding of image-text pairs, contextual and cultural awareness, and extensive commonsense knowledge.
arXiv Detail & Related papers (2023-11-13T18:36:50Z)
- Exploring the evolution of research topics during the COVID-19 pandemic [3.234641429290768]
We present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts.
Our method is based upon a careful selection of up-to-date technologies (including large language models) and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series.
arXiv Detail & Related papers (2023-10-05T22:16:41Z)
- Measuring COVID-19 Related Media Consumption on Twitter [2.746705315038595]
Social media platforms have provided essential updates regarding the pandemic.
Online communications with media outlets remain unexplored on an international scale.
This thesis presents the first-of-its-kind study on media consumption on COVID-19 across countries.
arXiv Detail & Related papers (2023-09-16T04:01:45Z)
- Computational Assessment of Hyperpartisanship in News Titles [55.92100606666497]
We first adopt a human-guided machine learning framework to develop a new dataset for hyperpartisan news title detection.
Overall, the Right media tends to use proportionally more hyperpartisan titles.
We identify three major topics including foreign issues, political systems, and societal issues that are suggestive of hyperpartisanship in news titles.
arXiv Detail & Related papers (2023-01-16T05:56:58Z)
- COVID-19 and Big Data: Multi-faceted Analysis for Spatio-temporal
Understanding of the Pandemic with Social Media Conversations [4.07452542897703]
Social media platforms have served as a vehicle for the global conversation about COVID-19.
We present a framework for analysis, mining, and tracking the critical content and characteristics of social media conversations around the pandemic.
arXiv Detail & Related papers (2021-04-22T00:45:50Z)
- VMSMO: Learning to Generate Multimodal Summary for Video-based News
Articles [63.32111010686954]
We propose the task of Video-based Multimodal Summarization with Multimodal Output (VMSMO).
The main challenge in this task is to jointly model the temporal dependency of video with semantic meaning of article.
We propose a Dual-Interaction-based Multimodal Summarizer (DIMS), consisting of a dual interaction module and multimodal generator.
arXiv Detail & Related papers (2020-10-12T02:19:16Z)
- Understanding the Spatio-temporal Topic Dynamics of Covid-19 using
Nonnegative Tensor Factorization: A Case Study [1.6328866317851185]
This paper proposes a representation of social media data and Non-negative Tensor Factorization (NTF) to identify the topics discussed in social media data.
A case study on the Australia Twittersphere is presented to identify and visualize the topic dynamics on and off the Covid-19.
arXiv Detail & Related papers (2020-09-19T15:16:28Z)
- A System for Worldwide COVID-19 Information Aggregation [92.60866520230803]
We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics.
A neural machine translation module translates articles in other languages into Japanese and English.
A BERT-based topic classifier trained on our article-topic pair dataset helps users find information of interest efficiently.
arXiv Detail & Related papers (2020-07-28T01:33:54Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.