Fugu-MT Paper Translation (Summary): GloCTM: Cross-Lingual Topic Modeling via a Global Context Space

Paper summary: GloCTM: Cross-Lingual Topic Modeling via a Global Context Space

arxiv url: http://arxiv.org/abs/2601.11872v1
Date: Sat, 17 Jan 2026 01:45:31 GMT
Status: Translation completed
Last updated in system: 2026-01-21 22:47:22.350727
Title: GloCTM: Cross-Lingual Topic Modeling via a Global Context Space
Title (reference translation): GloCTM: Cross-Lingual Topic Modeling via a Global Context Space
Authors: Nguyen Tien Phat, Ngo Vu Minh, Linh Van Ngo, Nguyen Thi Ngoc Diep, Thien Huu Nguyen
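Abstract summary: GloCTM is a novel framework that enforces cross-lingual topic alignment through a unified semantic space spanning the entire model pipeline. At the output level, a global topic-word distribution defined over the combined vocabulary structurally synchronizes topic meanings across languages.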
Reference score (attention score computed independently by this site): 28.89996742581612
License: http://creativecommons.org/licenses/by/4.0/
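The summary above says that a single global topic-word distribution, defined over the combined vocabulary, synchronizes topic meanings across languages. The sketch below shows one way such a structure could behave. It assumes (this is not taken from the paper) that each topic is one softmax over the concatenated vocabularies of all languages, and that a language-specific view is obtained by restricting a topic row to that language's word indices and renormalizing. The vocabulary, the logits, and the helper names `global_topic_word` and `language_view` are hypothetical, and the Chinese tokens are romanized stand-ins.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def global_topic_word(logits: Matrix) -> Matrix:
    """One distribution per topic over the COMBINED vocabulary (all languages)."""
    return [softmax(row) for row in logits]


def language_view(beta: Matrix, lang_indices: Sequence[int]) -> Matrix:
    """Hypothetical per-language view: keep one language's columns, renormalize.

    Every language's view is cut from the same topic row, so the views cannot
    drift apart independently; this is the structural coupling being illustrated.
    """
    views = []
    for row in beta:
        sub = [row[j] for j in lang_indices]
        total = sum(sub)
        views.append([v / total for v in sub])
    return views


# Hypothetical combined vocabulary: English words first, then romanized Chinese.
vocab = ["market", "price", "game", "shichang", "jiage", "bisai"]
en_idx, zh_idx = [0, 1, 2], [3, 4, 5]

# Hypothetical logits for K = 2 topics (economy, sports) over the 6 words.
logits = [
    [2.0, 1.5, -1.0, 2.0, 1.5, -1.0],
    [-1.0, -0.5, 2.5, -1.0, -0.5, 2.5],
]

beta = global_topic_word(logits)
beta_en = language_view(beta, en_idx)
beta_zh = language_view(beta, zh_idx)

for k in range(len(beta)):
    top_en = vocab[en_idx[max(range(3), key=lambda j: beta_en[k][j])]]
    top_zh = vocab[zh_idx[max(range(3), key=lambda j: beta_zh[k][j])]]
    print(f"topic {k}: top EN word = {top_en}, top ZH word = {top_zh}")
```

In this toy setup the two language views of each topic share their top word pair because they come from one row; a learned model would only approximate that.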
Abstract: Cross-lingual topic modeling seeks to uncover coherent and semantically aligned topics across languages - a task central to multilingual understanding. Yet most existing models learn topics in disjoint, language-specific spaces and rely on alignment mechanisms (e.g., bilingual dictionaries) that often fail to capture deep cross-lingual semantics, resulting in loosely connected topic spaces. Moreover, these approaches often overlook the rich semantic signals embedded in multilingual pretrained representations, further limiting their ability to capture fine-grained alignment. We introduce GloCTM (Global Context Space for Cross-Lingual Topic Model), a novel framework that enforces cross-lingual topic alignment through a unified semantic space spanning the entire model pipeline. GloCTM constructs enriched input representations by expanding bag-of-words with cross-lingual lexical neighborhoods, and infers topic proportions using both local and global encoders, with their latent representations aligned through internal regularization. At the output level, the global topic-word distribution, defined over the combined vocabulary, structurally synchronizes topic meanings across languages. To further ground topics in deep semantic space, GloCTM incorporates a Centered Kernel Alignment (CKA) loss that aligns the latent topic space with multilingual contextual embeddings. Experiments across multiple benchmarks demonstrate that GloCTM significantly improves topic coherence and cross-lingual alignment, outperforming strong baselines.
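The abstract's first mechanism expands a document's bag-of-words with cross-lingual lexical neighborhoods. Its exact weighting is not stated here, so the following is a minimal sketch under stated assumptions: each source word has a precomputed list of (neighbor word, similarity) pairs in the other language, for example nearest neighbors in a shared multilingual embedding space, and each neighbor receives `alpha * similarity * count` extra mass. The function `expand_bow`, the `alpha` parameter, and the toy neighbor table are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# word -> [(cross-lingual neighbor, similarity in [0, 1])]; hypothetical table
NeighborTable = Dict[str, List[Tuple[str, float]]]


def expand_bow(bow: Dict[str, int], neighbors: NeighborTable,
               alpha: float = 0.5) -> Dict[str, float]:
    """Return a bag-of-words over the combined vocabulary.

    Original counts are kept unchanged; each cross-lingual neighbor of a word
    receives alpha * similarity * count additional (fractional) mass.
    """
    if alpha < 0.0:
        raise ValueError("alpha must be non-negative")
    expanded: Counter = Counter()
    for word, count in bow.items():
        expanded[word] += count
        for neighbor, sim in neighbors.get(word, []):
            expanded[neighbor] += alpha * sim * count
    return dict(expanded)


# Toy English document and a toy English -> Chinese neighbor table
# (romanized stand-ins for Chinese tokens).
doc = {"market": 2, "price": 1}
table: NeighborTable = {
    "market": [("shichang", 0.8)],
    "price": [("jiage", 0.9), ("jiazhi", 0.4)],
}

print(expand_bow(doc, table))
```

Because the expansion only adds mass, a document written in one language now also activates words in the other language's part of the vocabulary; a real system might additionally normalize the expanded vector or learn the weights.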
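Abstract (reference translation): Cross-lingual topic modeling aims to uncover topics that are coherent and semantically aligned across languages. However, most existing models learn topics in disjoint, language-specific spaces and rely on alignment mechanisms (e.g., bilingual dictionaries) that often fail to capture deep cross-lingual semantics, which yields loosely connected topic spaces. Moreover, these approaches overlook the rich semantic signals embedded in multilingual pretrained representations, further limiting their ability to capture fine-grained alignment. This paper introduces GloCTM (Global Context Space for Cross-Lingual Topic Model). GloCTM builds enriched input representations by expanding the bag-of-words with cross-lingual lexical neighborhoods, infers topic proportions with both local and global encoders, and aligns their latent representations through internal regularization. At the output level, a global topic-word distribution defined over the combined vocabulary structurally synchronizes topic meanings across languages. To further ground topics in a deep semantic space, GloCTM incorporates a Centered Kernel Alignment (CKA) loss that aligns the latent topic space with multilingual contextual embeddings. Experiments on multiple benchmarks show that GloCTM substantially improves topic coherence and cross-lingual alignment, outperforming strong baselines.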
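The abstract also mentions a Centered Kernel Alignment (CKA) loss that aligns the latent topic space with multilingual contextual embeddings. It does not say which kernel is used, so this sketch assumes the common linear-kernel form, CKA(X, Y) = ||Ycᵀ Xc||_F² / (||Xcᵀ Xc||_F · ||Ycᵀ Yc||_F), where Xc and Yc are the column-centered matrices, and takes the loss to be 1 − CKA. Here X stands for a batch of topic proportions (n × K) and Y for the matching contextual embeddings (n × d). The function names and toy data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(x: Matrix) -> Matrix:
    """Subtract each column's mean, i.e. center over the n examples."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[v - means[j] for j, v in enumerate(row)] for row in x]


def cross_gram(a: Matrix, b: Matrix) -> Matrix:
    """Compute a^T b for a (n x p) and b (n x q); the result is p x q."""
    n = len(a)
    return [[sum(a[i][p] * b[i][q] for i in range(n)) for q in range(len(b[0]))]
            for p in range(len(a[0]))]


def frobenius_sq(m: Matrix) -> float:
    """Squared Frobenius norm."""
    return sum(v * v for row in m for v in row)


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two representations of the same n examples."""
    if len(x) != len(y):
        raise ValueError("x and y must describe the same number of examples")
    xc, yc = center_columns(x), center_columns(y)
    numerator = frobenius_sq(cross_gram(yc, xc))  # ||Yc^T Xc||_F^2
    denominator = (math.sqrt(frobenius_sq(cross_gram(xc, xc)))
                   * math.sqrt(frobenius_sq(cross_gram(yc, yc))))
    if denominator == 0.0:
        raise ValueError("a representation is constant across examples")
    return numerator / denominator


def cka_loss(theta: Matrix, embeddings: Matrix) -> float:
    """Hypothetical alignment loss: 0 when the two spaces are perfectly aligned."""
    return 1.0 - linear_cka(theta, embeddings)


# Toy batch: 4 documents, K = 2 topic proportions, d = 3 embedding dims.
theta = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
emb_aligned = [[1.8, 0.2, 0.0], [1.6, 0.4, 0.0], [0.4, 1.6, 0.0], [0.2, 1.8, 0.0]]
emb_random = [[0.3, -1.2, 0.5], [-0.7, 0.4, 1.1], [1.0, 0.9, -0.6], [-0.2, 0.1, 0.8]]

print(cka_loss(theta, emb_aligned))  # near 0: 2 * theta padded with a zero column
print(cka_loss(theta, emb_random))
```

Because linear CKA compares the similarity structure among the n documents rather than individual coordinates, the topic space (K dimensions) and the embedding space (d dimensions) need not have the same size, and the measure is unchanged by isotropic scaling.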

List of related papers