Topic modelling discourse dynamics in historical newspapers
- URL: http://arxiv.org/abs/2011.10428v1
- Date: Fri, 20 Nov 2020 14:51:07 GMT
- Title: Topic modelling discourse dynamics in historical newspapers
- Authors: Jani Marjanen, Elaine Zosa, Simon Hengchen, Lidia Pivovarova, Mikko Tolonen
- Abstract summary: We apply two families of topic models (LDA and DTM) on a relatively large set of historical newspapers in Finland.
Our case study focuses on newspapers and periodicals published in Finland between 1854 and 1917, but our method can easily be transposed to any diachronic data.
- Score: 2.978993130750125
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper addresses methodological issues in diachronic data analysis for
historical research. We apply two families of topic models (LDA and DTM) on a
relatively large set of historical newspapers, with the aim of capturing and
understanding discourse dynamics. Our case study focuses on newspapers and
periodicals published in Finland between 1854 and 1917, but our method can
easily be transposed to any diachronic data. Our main contributions are a) a
combined sampling, training and inference procedure for applying topic models
to huge and imbalanced diachronic text collections; b) a discussion on the
differences between two topic models for this type of data; c) quantifying
topic prominence for a period and thus a generalization of document-wise topic
assignment to a discourse level; and d) a discussion of the role of humanistic
interpretation with regard to analysing discourse dynamics through topic
models.
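Contributions a) and c) can be illustrated with a small sketch. The abstract does not give the paper's exact sampling scheme or prominence measure, so both choices here are assumptions: sampling draws an equal number of documents per period, and prominence is the mean topic proportion over a period's documents. The function names `sample_per_period` and `topic_prominence` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_period(docs, per_period, seed=0):
    """Draw at most `per_period` document ids from each period.

    docs: iterable of (period, doc_id) pairs.
    Assumption: equal-size sampling per period is one simple way to
    offset an imbalanced collection; the paper may do this differently.
    """
    rng = random.Random(seed)
    by_period = defaultdict(list)
    for period, doc_id in docs:
        by_period[period].append(doc_id)
    sample = {}
    for period in sorted(by_period):
        ids = by_period[period]
        sample[period] = rng.sample(ids, min(per_period, len(ids)))
    return sample


def topic_prominence(doc_topics, doc_period):
    """Average document-topic proportions within each period.

    doc_topics: {doc_id: [proportion of topic 0, topic 1, ...]}
    doc_period: {doc_id: period}
    Assumption: the mean proportion stands in for "prominence".
    """
    sums = {}
    counts = defaultdict(int)
    for doc_id, dist in doc_topics.items():
        period = doc_period[doc_id]
        if period not in sums:
            sums[period] = [0.0] * len(dist)
        for k, p in enumerate(dist):
            sums[period][k] += p
        counts[period] += 1
    return {
        period: [s / counts[period] for s in totals]
        for period, totals in sums.items()
    }


# Toy document-topic distributions, as a trained topic model might infer them.
doc_topics = {"a": [0.5, 0.5], "b": [1.0, 0.0], "c": [0.25, 0.75]}
doc_period = {"a": 1854, "b": 1854, "c": 1917}
prominence = topic_prominence(doc_topics, doc_period)
print(prominence)
```

In this toy data, topic 0 dominates the 1854 documents and topic 1 dominates the 1917 document, which is the kind of period-level shift the aggregation is meant to surface.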
Related papers
- Automating Historical Insight Extraction from Large-Scale Newspaper Archives via Neural Topic Modeling [1.4322802933929257]
Our study focuses on articles published between 1955 and 2018, specifically examining discourse on nuclear power and nuclear safety.
We analyze various topic distributions across the corpus and trace their temporal evolution to uncover long-term trends and shifts in public discourse.
This enables us to more accurately explore patterns in public discourse, including the co-occurrence of themes related to nuclear power and nuclear weapons and their shifts in topic importance over time.
arXiv Detail & Related papers (2025-12-12T15:15:02Z)
- Improving Topic Modeling of Social Media Short Texts with Rephrasing: A Case Study of COVID-19 Related Tweets [2.073927793507761]
Shortness, informality, and noise of social media short texts often hinder the effectiveness of traditional topic modeling.
We have developed TM-Rephrase, a model-agnostic framework that rephrases raw tweets into more standardized and formal language prior to topic modeling.
This study contributes to a model-agnostic approach to enhancing topic modeling in public health related social media analysis.
arXiv Detail & Related papers (2025-10-21T03:29:38Z)
- Embedded Topic Models Enhanced by Wikification [3.082729239227955]
We incorporate the Wikipedia knowledge into a neural topic model to make it aware of named entities.
Our experiments show that our method improves the performance of neural topic models in generalizability.
arXiv Detail & Related papers (2024-10-03T12:39:14Z)
- Investigating the Impact of Text Summarization on Topic Modeling [13.581341206178525]
In this paper, an approach is proposed that further enhances topic modeling performance by utilizing a pre-trained large language model (LLM).
Few shot prompting is used to generate summaries of different lengths to compare their impact on topic modeling.
The proposed method yields better topic diversity and comparable coherence values compared to previous models.
arXiv Detail & Related papers (2024-09-28T19:45:45Z)
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label name supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- A Large Language Model Guided Topic Refinement Mechanism for Short Text Modeling [10.589126787499973]
Existing topic models often struggle to accurately capture the underlying semantic patterns of short texts.
This paper introduces a novel model-agnostic mechanism, termed Topic Refinement.
We show that Topic Refinement boosts the topic quality and improves the performance in topic-related text classification tasks.
arXiv Detail & Related papers (2024-03-26T13:50:34Z)
- Decoding Multilingual Topic Dynamics and Trend Identification through ARIMA Time Series Analysis on Social Networks: A Novel Data Translation Framework Enhanced by LDA/HDP Models [0.08246494848934444]
We focus on dialogues within Tunisian social networks during the Coronavirus Pandemic and other notable themes like sports and politics.
We start by aggregating a varied multilingual corpus of comments relevant to these subjects.
We then introduce our No-English-to-English Machine Translation approach to handle linguistic differences.
arXiv Detail & Related papers (2024-03-18T00:01:10Z)
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z)
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- A Topic Coverage Approach to Evaluation of Topic Models [0.0]
We investigate an approach to topic model evaluation based on measuring topic coverage.
We demonstrate the benefits of the approach by evaluating, in a series of experiments, different types of topic models.
The contributions of the paper include the measures of coverage and the recommendations for the use of topic models for topic discovery.
arXiv Detail & Related papers (2020-12-11T12:08:27Z)
- Modeling Topical Relevance for Multi-Turn Dialogue Generation [61.87165077442267]
We propose a new model, named STAR-BTM, to tackle the problem of topic drift in multi-turn dialogue.
The Biterm Topic Model is pre-trained on the whole training dataset. Then, the topic level attention weights are computed based on the topic representation of each context.
Experimental results on both Chinese customer services data and English Ubuntu dialogue data show that STAR-BTM significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-09-27T03:33:22Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.