Fugu-MT Paper Translation (Abstract): Enhancing Clustering: An Explainable Approach via Filtered Patterns

Paper abstract: Enhancing Clustering: An Explainable Approach via Filtered Patterns

arxiv url: http://arxiv.org/abs/2604.12460v1
Date: Tue, 14 Apr 2026 08:45:38 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.349026
Title: Enhancing Clustering: An Explainable Approach via Filtered Patterns
Title (reference translation): Enhancing Clustering: An Explainable Approach via Filtered Patterns
Authors: Motaz Ben Hassine, Saïd Jabbour
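Abstract summary: Explainable clustering is a knowledge-driven unsupervised learning paradigm that partitions data into $θ$ disjoint clusters. Recent work improved clustering quality by introducing k-relaxed frequent patterns (k-RFPs), but multiple distinct k-RFPs may induce identical k-covers, leading to redundant symbolic representations. This work proposes an optimization strategy that removes redundant patterns by retaining a single representative pattern for each distinct k-cover.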
Reference score (independently computed attention score): 4.576379639081977
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning has become a central research area, with increasing attention devoted to explainable clustering, also known as conceptual clustering. This is a knowledge-driven unsupervised learning paradigm that partitions data into $θ$ disjoint clusters, where each cluster is described by an explicit symbolic representation, typically expressed as a closed pattern or itemset. By providing human-interpretable cluster descriptions, explainable clustering plays an important role in explainable artificial intelligence and knowledge discovery. Recent work improved clustering quality by introducing k-relaxed frequent patterns (k-RFPs), a pattern model that relaxes strict coverage constraints through a generalized k-cover definition. This framework integrates constraint-based reasoning, using SAT solvers for pattern generation, with combinatorial optimization, using Integer Linear Programming (ILP) for cluster selection. Despite its effectiveness, this approach suffers from a critical limitation: multiple distinct k-RFPs may induce identical k-covers, leading to redundant symbolic representations that unnecessarily enlarge the search space and increase computational complexity during cluster construction. In this paper, we address this redundancy through a pattern reduction framework. Our contributions are threefold. First, we formally characterize the conditions under which distinct k-RFPs induce identical k-covers, providing theoretical foundations for redundancy detection. Second, we propose an optimization strategy that removes redundant patterns by retaining a single representative pattern for each distinct k-cover. Third, we investigate the interpretability and representativeness of the patterns selected by the ILP model by analyzing their robustness with respect to their induced clusters. Extensive experiments conducted on several real-world datasets demonstrate that the proposed approach significantly reduces the pattern search space, improves computational efficiency, and preserves, and in some cases enhances, the quality of the resulting clusters.
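Abstract (reference translation): Machine learning has become a central research area, and explainable clustering (also called conceptual clustering) is attracting growing attention. It is a knowledge-driven unsupervised learning paradigm that partitions data into $θ$ disjoint clusters, each described by an explicit symbolic representation, typically a closed pattern or itemset. By providing human-interpretable cluster descriptions, explainable clustering plays an important role in explainable artificial intelligence and knowledge discovery. Recent work improved clustering quality by introducing k-relaxed frequent patterns (k-RFPs). That framework integrates constraint-based reasoning, using SAT solvers for pattern generation, with combinatorial optimization, using Integer Linear Programming (ILP) for cluster selection. However, multiple distinct k-RFPs may induce identical k-covers, producing redundant symbolic representations that unnecessarily enlarge the search space and increase computational complexity during cluster construction. This paper addresses the redundancy with a pattern reduction framework. The contributions are threefold. First, it formally characterizes the conditions under which distinct k-RFPs induce identical k-covers, providing theoretical foundations for redundancy detection. Second, it proposes an optimization strategy that removes redundant patterns by retaining a single representative pattern for each distinct k-cover. Third, it examines the interpretability and representativeness of the patterns selected by the ILP model by analyzing their robustness with respect to their induced clusters. Extensive experiments on several real-world datasets show that the proposed approach significantly reduces the pattern search space, improves computational efficiency, and preserves, and in some cases enhances, the quality of the resulting clusters.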
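
The redundancy-removal idea in the abstract (keep one representative pattern per distinct k-cover) can be illustrated with a small Python sketch. This is not the authors' implementation. The abstract does not give the formal k-cover definition, so the sketch assumes that a transaction belongs to the k-cover of a pattern when it misses at most k of the pattern's items. The function names `k_cover` and `filter_patterns`, the toy data, and the rule of keeping the first pattern seen for each cover are all hypothetical choices made for illustration.

```python
def k_cover(pattern, transactions, k):
    """Return the indices of transactions in the k-cover of `pattern`.

    ASSUMPTION: a transaction is covered when it misses at most k items
    of the pattern. The paper's formal definition may differ.
    """
    return frozenset(
        i for i, t in enumerate(transactions) if len(pattern - t) <= k
    )


def filter_patterns(patterns, transactions, k):
    """Keep a single representative pattern for each distinct k-cover.

    ASSUMPTION: the first pattern encountered for a cover is kept; the
    paper may choose its representative differently.
    """
    representatives = {}
    for p in patterns:
        cover = k_cover(p, transactions, k)
        # setdefault stores p only if this cover has not been seen yet
        representatives.setdefault(cover, p)
    return representatives


# Toy data (hypothetical): four transactions over the items a, b, c, d.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("cd"), frozenset("a")]
patterns = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]

kept = filter_patterns(patterns, transactions, k=1)
print(len(patterns), "patterns reduce to", len(kept))  # 4 patterns reduce to 3
```

With k = 1, the patterns {a, b, c} and {a, b, d} both cover exactly transactions 0 and 1 under the assumed definition, so only the first is kept. A later cluster-selection step would then search over three candidates instead of four.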
