Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case
Study using Latent Dirichlet Allocation Method
- URL: http://arxiv.org/abs/2301.03029v6
- Date: Tue, 18 Apr 2023 16:43:11 GMT
- Authors: Bernadeta Grici\=ut\.e and Lifeng Han and Goran Nenadic
- Abstract summary: Topic Modelling (TM) is a research branch of natural language understanding (NLU) and natural language processing (NLP).
In this study, we apply popular Latent Dirichlet Allocation (LDA) methods to model the topic changes in Swedish newspaper articles about Coronavirus.
We describe the corpus we created, comprising 6515 articles, the methods applied, and statistics on topic changes over approximately 14 months, from 17 January 2020 to 13 March 2021.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Topic Modelling (TM) is a research branch of natural language
understanding (NLU) and natural language processing (NLP) that facilitates
insightful analysis of large documents and datasets, such as summarisation
of main topics and topic changes. This kind of discovery is becoming more
popular in real-life applications due to its impact on big data analytics. In
this study, from the social-media and healthcare domain, we apply popular
Latent Dirichlet Allocation (LDA) methods to model the topic changes in Swedish
newspaper articles about Coronavirus. We describe the corpus we created
including 6515 articles, the methods applied, and statistics on topic changes
over approximately 14 months, from 17 January 2020 to
13 March 2021. We hope this work can be an asset for grounding applications
of topic modelling and can be inspiring for similar case studies in an era with
pandemics, to support socio-economic impact research as well as clinical and
healthcare analytics. Our data and source code are openly available at
https://github.com/poethan/Swed_Covid_TM
Keywords: Latent Dirichlet Allocation (LDA); Topic Modelling; Coronavirus;
Pandemics; Natural Language Understanding; BERTopic
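The study applies LDA to track topic changes in a news corpus. As a point of reference for readers unfamiliar with the method, here is a minimal collapsed Gibbs sampler for LDA in plain Python. This is a generic illustrative sketch, not the authors' implementation; the function name, hyperparameters (`alpha`, `beta`), and toy corpus are all assumptions for the example.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenised documents.

    docs: list of token lists. Returns (doc_topic, topic_word):
    per-document topic counts and per-topic word counts, from which
    the document-topic and topic-word distributions can be derived.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    # z[d][i]: topic currently assigned to token i of document d
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    # initialise count tables from the random assignments
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # remove the token's current assignment from the counts
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # resample its topic from the collapsed conditional
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + beta * V)
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return doc_topic, topic_word
```

In practice a library implementation such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` would be used on a corpus of this size; the sketch above only shows the mechanics of the sampler.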
Related papers
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name supervised topic modeling.
EdTM frames topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z) - Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs)
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z) - Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research [139.69207791947738]
Dolma is a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials.
We document Dolma, including its design principles, details about its construction, and a summary of its contents.
We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices.
arXiv Detail & Related papers (2024-01-31T20:29:50Z) - Discovering Mental Health Research Topics with Topic Modeling [13.651763262606782]
This study aims to identify general trends in the field and pinpoint high-impact research topics by analyzing a large dataset of mental health research papers.
Our dataset comprises 96,676 research papers pertaining to mental health, enabling us to examine the relationships between different topics using their abstracts.
To enhance our analysis, we also generated word clouds to provide a comprehensive overview of the machine learning models applied in mental health research.
arXiv Detail & Related papers (2023-08-25T05:25:05Z) - A Data-driven Latent Semantic Analysis for Automatic Text Summarization
using LDA Topic Modelling [0.0]
This study presents the Latent Dirichlet Allocation (LDA) approach used to perform topic modelling.
The visualisation provides an overarching view of the main topics while attributing deeper meaning to the prevalence of individual topics.
Terms are ranked purely by their probability within each topic, given the topic's prevalence in the processed document.
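Ranking a topic's terms by probability, as the summary above describes, amounts to sorting the topic's (smoothed) word distribution. A minimal sketch, assuming a simple word-count mapping per topic (the function name and smoothing parameter are illustrative, not from the paper):

```python
def top_terms(topic_word_counts, beta=0.01, k=5):
    """Rank a topic's terms by smoothed probability p(word | topic).

    topic_word_counts: dict mapping word -> count for one topic.
    Returns the k highest-probability words, most probable first.
    """
    vocab_size = len(topic_word_counts)
    total = sum(topic_word_counts.values()) + beta * vocab_size
    # add-beta smoothing keeps unseen-word mass nonzero
    scored = {w: (c + beta) / total for w, c in topic_word_counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```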
arXiv Detail & Related papers (2022-07-23T11:04:03Z) - COVID-19 Literature Mining and Retrieval using Text Mining Approaches [0.0]
The novel coronavirus disease (COVID-19) began in Wuhan, China, in late 2019 and to date has infected over 148M people worldwide.
Many academics and researchers have started to publish papers describing the latest discoveries on COVID-19.
The proposed model attempts to extract relevant titles from the large corpus of research publications.
arXiv Detail & Related papers (2022-05-29T22:34:19Z) - Neural language models for text classification in evidence-based
medicine [3.5770353345663044]
Evidence-based medicine (EBM) is being challenged as never before due to the high volume of research articles published and pre-prints posted daily.
In this article, we report the results of an applied research project to classify scientific articles to support Epistemonikos.
We test several methods, and the best one, based on the XLNet neural language model, improves on the current approach by 93% in average F1-score.
arXiv Detail & Related papers (2020-12-01T15:53:44Z) - Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z) - A Cross-lingual Natural Language Processing Framework for Infodemic
Management [0.6606016007748989]
The COVID-19 pandemic has put immense pressure on health systems which are further strained due to misinformation surrounding it.
We have exploited the potential of Natural Language Processing for identifying relevant information that needs to be disseminated amongst the masses.
We present a novel Cross-lingual Natural Language Processing framework to provide relevant information by matching daily news with trusted guidelines from the World Health Organization.
arXiv Detail & Related papers (2020-10-30T16:26:35Z) - Text Mining to Identify and Extract Novel Disease Treatments From
Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z) - A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in text classification due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.