Fugu-MT paper translation (abstract): RCProb: Probabilistic Rule Extraction for Efficient Simplification of Tree Ensembles

Paper abstract: RCProb: Probabilistic Rule Extraction for Efficient Simplification of Tree Ensembles

arxiv url: http://arxiv.org/abs/2604.25304v1
Date: Tue, 28 Apr 2026 07:12:15 GMT
Status: translation complete
Last updated in system: 2026-04-29 16:49:17.751322
Title: RCProb: Probabilistic Rule Extraction for Efficient Simplification of Tree Ensembles
Authors: Josue Obregon
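Abstract summary: Tree ensembles are widely used in industrial machine learning due to their strong predictive performance and efficient training procedures. One approach to making them interpretable is to extract decision rules from the tree ensemble while preserving the predictive performance of the original model. This paper proposes RCProb, a probabilistic reformulation of RuleCOSI+ designed to reduce the computational cost of rule extraction.
Reference score (independently computed attention score): 0.0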
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tree ensembles are widely used in industrial machine learning due to their strong predictive performance and efficient training procedures. However, as the number of trees in an ensemble grows, the resulting models become increasingly difficult for humans to interpret. To address this limitation, explainable artificial intelligence (XAI) studies methods that generate interpretable models capable of explaining complex predictors. One approach consists of extracting decision rules from tree ensembles while attempting to preserve the predictive performance of the original model. In previous work, we introduced RuleCOSI+, a greedy heuristic algorithm for extracting compact rule-based models from tree ensembles. Although RuleCOSI+ produces accurate and interpretable rule sets, it relies on repeated empirical frequency counting over the training data to estimate rule confidence, which becomes computationally expensive for large datasets. In this paper, we propose RCProb, a probabilistic reformulation of RuleCOSI+ designed to reduce the computational cost of rule extraction. RCProb estimates rule statistics using Dirichlet-smoothed class priors and Beta-smoothed condition likelihoods combined through a Naive Bayes formulation, avoiding repeated dataset scans. Experiments on 33 benchmark datasets show that RCProb maintains competitive predictive performance while reducing runtime by approximately $22\times$ compared with RuleCOSI+ and producing more compact rule sets on average.
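The abstract names the ingredients of the estimator (Dirichlet-smoothed class priors, Beta-smoothed condition likelihoods, a Naive Bayes combination) but gives no formulas. The Python sketch below is one plausible reading, not the authors' implementation: it assumes the usual posterior-mean forms for symmetric Dirichlet and Beta priors, and every function name, parameter (`alpha`, `a`, `b`), and the toy data are hypothetical. It illustrates how counts gathered in a single pass could be reused to score any candidate rule without rescanning the data.

```python
import math
from collections import Counter


def class_priors(labels, classes, alpha=1.0):
    """Assumed form: (n_c + alpha) / (n + alpha * K), a symmetric Dirichlet prior."""
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    return {c: (counts[c] + alpha) / (n + alpha * k) for c in classes}


def condition_likelihoods(rows, labels, classes, conditions, a=1.0, b=1.0):
    """Assumed form: P(cond | c) = (hits + a) / (n_c + a + b), a Beta(a, b) prior.

    The data is scanned once here; rule scoring later only reads this table.
    """
    n_class = Counter(labels)
    hits = Counter()
    for row, y in zip(rows, labels):
        for name, test in conditions.items():
            if test(row):
                hits[(name, y)] += 1
    return {
        (name, c): (hits[(name, c)] + a) / (n_class[c] + a + b)
        for name in conditions
        for c in classes
    }


def rule_posterior(rule, priors, likelihoods):
    """Naive Bayes: P(c | rule) is proportional to P(c) * product of P(cond | c).

    Conditions are treated as independent given the class, which is an
    approximation. Log space is used to avoid underflow on long rules.
    """
    scores = {
        c: math.log(p) + sum(math.log(likelihoods[(name, c)]) for name in rule)
        for c, p in priors.items()
    }
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


# Toy data (hypothetical): two features, two classes.
rows = [
    {"x": 1.0, "y": 5.0},
    {"x": 2.0, "y": 1.0},
    {"x": 3.0, "y": 4.0},
    {"x": 4.0, "y": 0.5},
]
labels = [0, 0, 1, 1]
classes = [0, 1]
conditions = {
    "x>2.5": lambda r: r["x"] > 2.5,
    "y>3": lambda r: r["y"] > 3.0,
}

priors = class_priors(labels, classes)
likelihoods = condition_likelihoods(rows, labels, classes, conditions)

# Score the rule "x > 2.5 AND y > 3" from the cached tables only.
posterior = rule_posterior(["x>2.5", "y>3"], priors, likelihoods)
print(posterior)  # class 1 gets roughly 0.75 of the mass on this toy data
```

Under these assumptions the cost of scoring a rule depends on the number of conditions in it rather than on the number of training rows, which is the kind of saving the abstract attributes to avoiding repeated dataset scans. The trade-off is the conditional-independence approximation: the estimate can differ from the empirical frequency that a direct count over the data would give.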


Related papers