Bidirectional Topic Matching: Quantifying Thematic Overlap Between Corpora Through Topic Modelling
- URL: http://arxiv.org/abs/2412.18376v1
- Date: Tue, 24 Dec 2024 12:02:43 GMT
- Title: Bidirectional Topic Matching: Quantifying Thematic Overlap Between Corpora Through Topic Modelling
- Authors: Raven Adam, Marie Lisa Kogler
- Abstract summary: Bidirectional Topic Matching (BTM) is a novel method for cross-corpus topic modeling that quantifies thematic overlap and divergence between corpora.
BTM employs a dual-model approach, training separate topic models for each corpus and applying them reciprocally to enable comprehensive cross-corpus comparisons.
- Score: 0.0
- Abstract: This study introduces Bidirectional Topic Matching (BTM), a novel method for cross-corpus topic modeling that quantifies thematic overlap and divergence between corpora. BTM is a flexible framework that can incorporate various topic modeling approaches, including BERTopic, Top2Vec, and Latent Dirichlet Allocation (LDA). BTM employs a dual-model approach, training separate topic models for each corpus and applying them reciprocally to enable comprehensive cross-corpus comparisons. This methodology facilitates the identification of shared themes and unique topics, providing nuanced insights into thematic relationships. Validation against cosine similarity-based methods demonstrates the robustness of BTM, with strong agreement metrics and distinct advantages in handling outlier topics. A case study on climate news articles showcases BTM's utility, revealing significant thematic overlaps and distinctions between corpora focused on climate change and climate action. BTM's flexibility and precision make it a valuable tool for diverse applications, from political discourse analysis to interdisciplinary studies. By integrating shared and unique topic analyses, BTM offers a comprehensive framework for exploring thematic relationships, with potential extensions to multilingual and dynamic datasets. This work highlights BTM's methodological contributions and its capacity to advance discourse analysis across various domains.
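The dual-model idea in the abstract (one topic model per corpus, each applied to the other corpus) can be illustrated with a minimal sketch. This is only one plausible reading of the abstract, not the authors' implementation: the keyword-set "models", the `assign_topic` rule, and all topic labels and documents below are hypothetical stand-ins for models that would really be trained with BERTopic, Top2Vec, or LDA.

```python
from collections import Counter

# Hypothetical stand-ins for two trained topic models: topic label -> keyword set.
# A real pipeline would fit BERTopic, Top2Vec, or LDA on each corpus instead.
MODEL_A = {
    "a_warming": {"temperature", "warming", "emissions"},
    "a_sea": {"sea", "ice", "ocean"},
}
MODEL_B = {
    "b_policy": {"policy", "emissions", "tax"},
    "b_ocean": {"ocean", "sea", "coast"},
}

# Toy corpora (assumed data, no punctuation so str.split() is enough).
CORPUS_A = [
    "Global warming raises temperature",
    "Sea ice is melting in the ocean",
    "Emissions drive warming",
]
CORPUS_B = [
    "Ocean and sea levels threaten the coast",
    "Carbon tax policy debate",
    "New tax policy",
]


def assign_topic(model, doc):
    """Return the topic with the largest keyword overlap, or None for an outlier."""
    tokens = set(doc.lower().split())
    best, best_overlap = None, 0
    for topic, keywords in model.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best, best_overlap = topic, overlap
    return best


def cross_counts(own_model, other_model, corpus):
    """Count (own-model topic, other-model topic) pairs over one corpus."""
    counts = Counter()
    for doc in corpus:
        counts[(assign_topic(own_model, doc), assign_topic(other_model, doc))] += 1
    return counts


def bidirectional_matches(model_a, model_b, corpus_a, corpus_b):
    """Topic pairs (a, b) that co-occur when the models are applied in both directions."""
    a_to_b = {(a, b) for (a, b) in cross_counts(model_a, model_b, corpus_a) if a and b}
    b_to_a = {(a, b) for (b, a) in cross_counts(model_b, model_a, corpus_b) if a and b}
    return a_to_b & b_to_a


shared = bidirectional_matches(MODEL_A, MODEL_B, CORPUS_A, CORPUS_B)
print(shared)
```

In this toy setup, a topic pair counts as shared only if it appears when model B is applied to corpus A and also when model A is applied to corpus B. Pairs seen in one direction only, and documents the other model leaves unassigned (`None`), are the kind of unique or outlier topics the abstract refers to.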
Related papers
- Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation [0.0]
This study presents a framework for automated evaluation of dynamically evolving topics in scientific literature using Large Language Models (LLMs).
The proposed approach harnesses LLMs to measure key quality dimensions, such as coherence, repetitiveness, diversity, and topic-document alignment, without heavy reliance on expert annotators or narrow statistical metrics.
arXiv Detail & Related papers (2025-02-11T08:23:56Z)
- Enhancing Topic Interpretability for Neural Topic Modeling through Topic-wise Contrastive Learning [23.816433328623397]
Overemphasizing likelihood without incorporating topic regularization can lead to an overly expansive latent space for topic modeling.
We propose a novel NTM framework, named ContraTopic, that integrates a differentiable regularizer capable of evaluating multiple facets of topic interpretability.
Our approach consistently produces topics with superior interpretability compared to state-of-the-art NTMs.
arXiv Detail & Related papers (2024-12-23T07:07:06Z)
- BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks [2.9873893715462176]
This work introduces a novel framework named BiosERC, which investigates speaker characteristics in a conversation.
By employing Large Language Models (LLMs), we extract the "biographical information" of the speaker within a conversation.
Our proposed method achieved state-of-the-art (SOTA) results on three famous benchmark datasets.
arXiv Detail & Related papers (2024-07-05T06:25:34Z)
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM frames topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- MoME: Mixture of Multimodal Experts for Cancer Survival Prediction [46.520971457396726]
Survival analysis, as a challenging task, requires integrating Whole Slide Images (WSIs) and genomic data for comprehensive decision-making.
Previous approaches utilize co-attention methods, which fuse features from both modalities only once after separate encoding.
We propose a Biased Progressive Encoding (BPE) paradigm, performing encoding and fusion simultaneously.
arXiv Detail & Related papers (2024-06-14T03:44:33Z)
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition [63.07844685982738]
This paper presents a new model named Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states.
We empirically show that the attention-aligned representations significantly outperform the last hidden states of the LSTM.
The proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
arXiv Detail & Related papers (2022-01-17T09:46:59Z)
- Transformer-based Multi-Aspect Modeling for Multi-Aspect Multi-Sentiment Analysis [56.893393134328996]
We propose a novel Transformer-based Multi-aspect Modeling scheme (TMM), which can capture potential relations between multiple aspects and simultaneously detect the sentiment of all aspects in a sentence.
Our method achieves noticeable improvements compared with strong baselines such as BERT and RoBERTa.
arXiv Detail & Related papers (2020-11-01T11:06:31Z)
- Modeling Topical Relevance for Multi-Turn Dialogue Generation [61.87165077442267]
We propose a new model, named STAR-BTM, to tackle the problem of topic drift in multi-turn dialogue.
The Biterm Topic Model is pre-trained on the whole training dataset, and topic-level attention weights are then computed based on the topic representation of each context.
Experimental results on both Chinese customer services data and English Ubuntu dialogue data show that STAR-BTM significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-09-27T03:33:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.