Fugu-MT Paper Translation (Abstract): Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

Paper overview: Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

arxiv url: http://arxiv.org/abs/2604.07562v1
Date: Wed, 08 Apr 2026 20:02:48 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.539607
Title: Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs
Title (reference translation): Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs
Authors: Tunazzina Islam
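Abstract summary: This paper proposes a reasoning-based refinement framework that uses large language models (LLMs) not as embedding generators but as semantic judges. The framework introduces three reasoning stages: (i) coherence verification, (ii) redundancy adjudication, and (iii) label grounding. It is evaluated on real-world social media corpora from two platforms with distinct interaction models.
Reference score (independently computed attention score): 8.06425428468097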
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unsupervised methods are widely used to induce latent semantic structure from large text collections, yet their outputs often contain incoherent, redundant, or poorly grounded clusters that are difficult to validate without labeled data. We propose a reasoning-based refinement framework that leverages large language models (LLMs) not as embedding generators, but as semantic judges that validate and restructure the outputs of arbitrary unsupervised clustering algorithms. Our framework introduces three reasoning stages: (i) coherence verification, where LLMs assess whether cluster summaries are supported by their member texts; (ii) redundancy adjudication, where candidate clusters are merged or rejected based on semantic overlap; and (iii) label grounding, where clusters are assigned interpretable labels in a fully unsupervised manner. This design decouples representation learning from structural validation and mitigates common failure modes of embedding-only approaches. We evaluate the framework on real-world social media corpora from two platforms with distinct interaction models, demonstrating consistent improvements in cluster coherence and human-aligned labeling quality over classical topic models and recent representation-based baselines. Human evaluation shows strong agreement with LLM-generated labels, despite the absence of gold-standard annotations. We further conduct robustness analyses under matched temporal and volume conditions to assess cross-platform stability. Beyond empirical gains, our results suggest that LLM-based reasoning can serve as a general mechanism for validating and refining unsupervised semantic structure, enabling more reliable and interpretable analyses of large text collections without supervision.
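Abstract (reference translation): Unsupervised methods are widely used to induce latent semantic structure from large text collections, but their outputs often contain incoherent, redundant, or poorly grounded clusters that are difficult to validate without labeled data. We propose a reasoning-based refinement framework that uses large language models (LLMs) not as embedding generators but as semantic judges that validate and restructure the outputs of arbitrary unsupervised clustering algorithms. The framework has three reasoning stages: (i) coherence verification, in which LLMs assess whether cluster summaries are supported by their member texts; (ii) redundancy adjudication, in which candidate clusters are merged or rejected based on semantic overlap; and (iii) label grounding, in which clusters are assigned interpretable labels in a fully unsupervised manner. This design decouples representation learning from structural validation and mitigates common failure modes of embedding-only approaches. We evaluate the framework on real-world social media corpora from two platforms with distinct interaction models and show consistent improvements in cluster coherence and human-aligned labeling quality over classical topic models and recent representation-based baselines. Human evaluation shows strong agreement with the LLM-generated labels despite the absence of gold-standard annotations. We also conduct robustness analyses under matched temporal and volume conditions to assess cross-platform stability. The results suggest that LLM-based reasoning can serve as a general mechanism for validating and refining unsupervised semantic structure, enabling more reliable and interpretable analyses of large text collections without supervision.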
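
The three reasoning stages named in the abstract can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the paper's implementation: the `Cluster` type, the `Judge` interface, and the `refine` function are hypothetical names introduced here, and the abstract does not specify the prompts, the thresholds, or what happens to clusters that fail coherence verification (this sketch simply drops them). A word-overlap `ToyJudge` stands in for the LLM so the example runs offline.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Cluster:
    summary: str       # short description from the upstream clustering step
    texts: List[str]   # member documents
    label: str = ""    # filled in by stage (iii)


class Judge(Protocol):
    """Hypothetical interface standing in for LLM calls (an assumption)."""
    def is_coherent(self, summary: str, texts: List[str]) -> bool: ...
    def is_redundant(self, a: Cluster, b: Cluster) -> bool: ...
    def make_label(self, cluster: Cluster) -> str: ...


def refine(clusters: List[Cluster], judge: Judge) -> List[Cluster]:
    # Stage (i): coherence verification. Dropping failures is an assumption.
    kept = [c for c in clusters if judge.is_coherent(c.summary, c.texts)]

    # Stage (ii): redundancy adjudication. Merge a cluster into the first
    # already-accepted cluster the judge considers semantically overlapping.
    merged: List[Cluster] = []
    for c in kept:
        for m in merged:
            if judge.is_redundant(m, c):
                m.texts.extend(c.texts)
                break
        else:
            merged.append(Cluster(c.summary, list(c.texts)))

    # Stage (iii): label grounding, with no gold labels involved.
    for m in merged:
        m.label = judge.make_label(m)
    return merged


class ToyJudge:
    """Word-overlap stand-in; a real judge would prompt an LLM instead."""

    @staticmethod
    def _words(s: str) -> set:
        return set(s.lower().split())

    def is_coherent(self, summary: str, texts: List[str]) -> bool:
        key = self._words(summary)
        supported = sum(1 for t in texts if key & self._words(t))
        # Coherent if at least half of the members share a summary word.
        return bool(texts) and supported * 2 >= len(texts)

    def is_redundant(self, a: Cluster, b: Cluster) -> bool:
        wa, wb = self._words(a.summary), self._words(b.summary)
        union = wa | wb
        # Jaccard overlap of summary words; 0.5 is an arbitrary threshold.
        return bool(union) and len(wa & wb) / len(union) >= 0.5

    def make_label(self, cluster: Cluster) -> str:
        return cluster.summary.title()


if __name__ == "__main__":
    clusters = [
        Cluster("vaccine safety", ["the vaccine is safe", "vaccine side effects worry me"]),
        Cluster("vaccine safety concerns", ["is the vaccine safe for kids"]),
        Cluster("sports scores", ["my cat sleeps all day", "new phone released"]),
    ]
    for c in refine(clusters, ToyJudge()):
        print(c.label, len(c.texts))
```

Keeping the judge behind an interface mirrors the decoupling the abstract describes: the upstream clustering algorithm can be arbitrary, and only the validation step depends on the reasoning model.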


Related papers