GloCTM: Cross-Lingual Topic Modeling via a Global Context Space
- URL: http://arxiv.org/abs/2601.11872v1
- Date: Sat, 17 Jan 2026 01:45:31 GMT
- Title: GloCTM: Cross-Lingual Topic Modeling via a Global Context Space
- Authors: Nguyen Tien Phat, Ngo Vu Minh, Linh Van Ngo, Nguyen Thi Ngoc Diep, Thien Huu Nguyen,
- Abstract summary: GloCTM is a novel framework that enforces cross-lingual topic alignment through a unified semantic space spanning the entire model pipeline. At the output level, the global topic-word distribution, defined over the combined vocabulary, structurally synchronizes topic meanings across languages.
- Score: 28.89996742581612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-lingual topic modeling seeks to uncover coherent and semantically aligned topics across languages - a task central to multilingual understanding. Yet most existing models learn topics in disjoint, language-specific spaces and rely on alignment mechanisms (e.g., bilingual dictionaries) that often fail to capture deep cross-lingual semantics, resulting in loosely connected topic spaces. Moreover, these approaches often overlook the rich semantic signals embedded in multilingual pretrained representations, further limiting their ability to capture fine-grained alignment. We introduce GloCTM (Global Context Space for Cross-Lingual Topic Model), a novel framework that enforces cross-lingual topic alignment through a unified semantic space spanning the entire model pipeline. GloCTM constructs enriched input representations by expanding bag-of-words with cross-lingual lexical neighborhoods, and infers topic proportions using both local and global encoders, with their latent representations aligned through internal regularization. At the output level, the global topic-word distribution, defined over the combined vocabulary, structurally synchronizes topic meanings across languages. To further ground topics in deep semantic space, GloCTM incorporates a Centered Kernel Alignment (CKA) loss that aligns the latent topic space with multilingual contextual embeddings. Experiments across multiple benchmarks demonstrate that GloCTM significantly improves topic coherence and cross-lingual alignment, outperforming strong baselines.
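The CKA alignment mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which kernel or loss form GloCTM uses, so the linear kernel and the `1 - CKA` loss below are assumptions, and the names `linear_cka`, `cka_loss`, `theta`, and `emb` are hypothetical. The sketch only shows how Centered Kernel Alignment scores the similarity between a document-topic matrix and a matrix of contextual document embeddings computed over the same documents.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (n_samples, dim) matrices over the same samples."""
    # Center each feature column so the implied Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


def cka_loss(theta, emb):
    """Assumed loss form: 1 - CKA, so higher alignment gives lower loss."""
    return 1.0 - linear_cka(theta, emb)


# Hypothetical data: 64 documents, 10 topics, 32-dimensional embeddings.
rng = np.random.default_rng(0)
theta = rng.random((64, 10))  # stand-in for document-topic proportions
emb = rng.random((64, 32))    # stand-in for multilingual contextual embeddings
print(cka_loss(theta, emb))
```

Linear CKA lies between 0 and 1 and is unchanged by orthogonal transformations and isotropic scaling of either input, which is why it can compare spaces of different dimensionality, such as topic proportions and embedding vectors.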
Related papers
- Multilingual Text-to-Image Person Retrieval via Bidirectional Relation Reasoning and Aligning [81.43257201833154]
We propose Bi-IRRA: a Bidirectional Implicit Relation Reasoning and Aligning framework to learn alignment across languages and modalities. Within Bi-IRRA, a bidirectional implicit relation reasoning module enables bidirectional prediction of masked image and text. The proposed method achieves new state-of-the-art results on all multilingual TIPR datasets.
arXiv Detail & Related papers (2025-10-20T16:01:11Z)
- XTRA: Cross-Lingual Topic Modeling with Topic and Representation Alignments [16.831512837465123]
Cross-lingual topic modeling aims to uncover shared semantic themes across languages. We propose XTRA, a framework that unifies Bag-of-Words modeling with multilingual embeddings. XTRA learns topics that are interpretable (coherent and diverse) and well-aligned across languages.
arXiv Detail & Related papers (2025-10-03T07:46:23Z)
- High-Dimensional Interlingual Representations of Large Language Models [65.77317753001954]
Large language models (LLMs) trained on massive multilingual datasets hint at the formation of interlingual constructs. We explore 31 diverse languages varying on their resource-levels, typologies, and geographical regions. We find that multilingual LLMs exhibit inconsistent cross-lingual alignments.
arXiv Detail & Related papers (2025-03-14T10:39:27Z)
- Exploring Alignment in Shared Cross-lingual Spaces [15.98134426166435]
We employ clustering to uncover latent concepts within multilingual models.
Our analysis focuses on quantifying the alignment and overlap of these concepts across various languages.
Our study encompasses three multilingual models (mT5, mBERT, and XLM-R) and three downstream tasks (Machine Translation, Named Entity Recognition, and Sentiment Analysis).
arXiv Detail & Related papers (2024-05-23T13:20:24Z)
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-part working strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling [40.54497836775837]
Cross-lingual topic models have been prevalent for cross-lingual text analysis by revealing aligned latent topics.
Most existing methods suffer from producing repetitive topics that hinder further analysis and performance decline caused by low-coverage dictionaries.
We propose the Cross-lingual Topic Modeling with Mutual Information (InfoCTM) to produce more coherent, diverse, and well-aligned topics.
arXiv Detail & Related papers (2023-04-07T08:49:43Z)
- Enhancing Dialogue Summarization with Topic-Aware Global- and Local-Level Centrality [24.838387172698543]
We propose a novel topic-aware Global-Local Centrality (GLC) model to help select the salient context from all sub-topics.
The global one aims to identify vital sub-topics in the dialogue and the local one aims to select the most important context in each sub-topic.
Experimental results show that our model outperforms strong baselines on three public dialogue summarization datasets.
arXiv Detail & Related papers (2023-01-29T06:41:55Z)
- Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment [63.0407314271459]
Experiments show that the proposed Cross-Align achieves state-of-the-art (SOTA) performance on four out of five language pairs.
arXiv Detail & Related papers (2022-10-09T02:24:35Z)
- GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding [74.39024160277809]
We present the Global-Local Contrastive Learning Framework (GL-CLeF) to address this shortcoming.
Specifically, we employ contrastive learning, leveraging bilingual dictionaries to construct multilingual views of the same utterance.
GL-CLeF achieves the best performance and successfully pulls representations of similar sentences across languages closer.
arXiv Detail & Related papers (2022-04-18T13:56:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.