Fugu-MT 論文翻訳(概要): Regularized impurity reduction: Accurate decision trees with complexity guarantees

論文の概要: Regularized impurity reduction: Accurate decision trees with complexity guarantees

arxiv url: http://arxiv.org/abs/2208.10949v1
Date: Tue, 23 Aug 2022 13:15:19 GMT
ステータス: 翻訳完了
システム内更新日: 2022-08-24 14:00:36.910181
Title: Regularized impurity reduction: Accurate decision trees with complexity guarantees
Title（参考訳）: 正規化不純物低減:複雑さを保証する正確な決定木
Authors: Guangyi Zhang and Aristides Gionis
Abstract要約: 本稿では,木複雑性の対数近似を保証する木推論アルゴリズムを提案する。改良されたアルゴリズムは予測精度と木の複雑さのバランスが良好である。
参考スコア（独自算出の注目度）: 20.170305081348328
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Decision trees are popular classification models, providing high accuracy and intuitive explanations. However, as the tree size grows the model interpretability deteriorates. Traditional tree-induction algorithms, such as C4.5 and CART, rely on impurity-reduction functions that promote the discriminative power of each split. Thus, although these traditional methods are accurate in practice, there has been no theoretical guarantee that they will produce small trees. In this paper, we justify the use of a general family of impurity functions, including the popular functions of entropy and Gini-index, in scenarios where small trees are desirable, by showing that a simple enhancement can equip them with complexity guarantees. We consider a general setting, where objects to be classified are drawn from an arbitrary probability distribution, classification can be binary or multi-class, and splitting tests are associated with non-uniform costs. As a measure of tree complexity, we adopt the expected cost to classify an object drawn from the input distribution, which, in the uniform-cost case, is the expected number of tests. We propose a tree-induction algorithm that gives a logarithmic approximation guarantee on the tree complexity. This approximation factor is tight up to a constant factor under mild assumptions. The algorithm recursively selects a test that maximizes a greedy criterion defined as a weighted sum of three components. The first two components encourage the selection of tests that improve the balance and the cost-efficiency of the tree, respectively, while the third impurity-reduction component encourages the selection of more discriminative tests. As shown in our empirical evaluation, compared to the original heuristics, the enhanced algorithms strike an excellent balance between predictive accuracy and tree complexity.
Abstract（参考訳）: 決定木は一般的な分類モデルであり、高精度で直感的な説明を提供する。しかし、木の大きさが大きくなるとモデル解釈性が低下する。 C4.5やCARTのような伝統的な木誘導アルゴリズムは、各分割の識別力を促進する不純物還元関数に依存している。したがって、これらの手法は実際は正確であるが、小木を生産するという理論的な保証はない。本稿では,小さな木が望ましい場合に,エントロピーやジニインデックスなどの一般的な機能を含む不純物関数の一般ファミリーの使用を,単純な拡張によって複雑さの保証を付与できることを示すことにより正当化する。分類対象が任意の確率分布から引き出され、分類はバイナリまたはマルチクラスとなり、分割テストは一様でないコストと関連付けられる一般的な設定を考える。木の複雑さの尺度として、入力分布から引き出されたオブジェクトを分類するための期待コストを採用し、一様コストの場合、期待されるテスト数である。本稿では,木複雑性の対数近似を保証する木推論アルゴリズムを提案する。この近似因子は、穏やかな仮定の下で一定の因子に密接である。アルゴリズムは、3つの成分の重み付け和として定義される欲求基準を最大化するテストを再帰的に選択する。最初の2つのコンポーネントは、ツリーのバランスとコスト効率を改善するテストの選択を奨励し、第3の不純物還元コンポーネントは、より識別的なテストの選択を奨励する。我々の経験的評価に示すように、元のヒューリスティックスと比較して、拡張アルゴリズムは予測精度と木の複雑さのバランスが良い。

関連論文リスト

CART-ELC: Oblique Decision Tree Induction via Exhaustive Search [0.0]
斜めの分割を見つけるために徹底的な探索に依存する手法は、計算上の課題に直面している。斜め決定木を誘導する新しいアルゴリズム, 分類・回帰木-排他線形結合(CART-ELC)を導入する。以上の結果から,CART-ELCは小データセット上での競合性能を一貫して達成できることが示唆された。
論文参考訳（メタデータ） (2025-05-08T16:42:13Z)
Separation and Collapse of Equilibria Inequalities on AND-OR Trees without Shape Constraints [0.0]
本稿では,AND-OR木計算の最悪の入力に対して最小のコストであるゼロエラーランダム化複雑性について検討する。指向性アルゴリズムは、ランダム化された複雑さを達成することが知られている。任意のAND-OR木に対して、ランダム化深度優先アルゴリズムは、方向性アルゴリズムと同じ平衡を持つことを示す。
論文参考訳（メタデータ） (2024-05-30T15:13:46Z)
Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
メタ学習によってトレーニングされたトランスフォーマーベースのモデルであるMetaTreeを導入し、強力な決定木を直接生成する。我々は、多くのデータセットに欲求決定木とグローバルに最適化された決定木の両方を適合させ、MetaTreeを訓練して、強力な一般化性能を実現する木のみを生成する。
論文参考訳（メタデータ） (2024-02-06T07:40:53Z)
Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
統計学で広く普及している偏りと分散還元に対する現在の高次二分法は、木のアンサンブルを理解するには不十分である、と我々は主張する。森林は、通常暗黙的に絡み合っている3つの異なるメカニズムによって、樹木を改良できることを示す。
論文参考訳（メタデータ） (2024-02-02T15:36:43Z)
Multi-rules mining algorithm for combinatorially exploded decision trees with modified Aitchison-Aitken function-based Bayesian optimization [0.0]
推定性能の高い木を戦略的に構築する"MAABO-MT"と"GS-MRM"アルゴリズム。提案手法の有効性を解析するために,複数のオープンデータセットを用いて実験を行った。
論文参考訳（メタデータ） (2023-10-04T07:55:51Z)
Permutation Decision Trees [3.089408984959925]
Effort-To-Compress (ETC) は、新しい不純物尺度として初めて複雑性尺度である。置換決定木と様々な実世界のデータセットにおける古典的決定木の性能比較を行う。
論文参考訳（メタデータ） (2023-06-05T06:31:14Z)
Optimal randomized classification trees [0.0]
分類と回帰木(英: Classification and Regression Trees、CART)は、現代の統計学と機械学習における既成の技術である。 CARTはgreedyプロシージャによって構築され、分割予測変数と関連するしきい値を逐次決定する。この強欲なアプローチは、木を非常に高速に木に分類するが、その性質上、それらの分類精度は他の最先端の手順と競合しないかもしれない。
論文参考訳（メタデータ） (2021-10-19T11:41:12Z)
A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
本研究では,異なる回帰モデルに対する決定木の一般化性能について検討する。これにより、アルゴリズムが新しいデータに一般化するために(あるいは作らない)仮定する帰納的バイアスが引き起こされる。スパース加法モデルに適合する大規模な決定木アルゴリズムに対して、シャープな2乗誤差一般化を低い境界で証明する。
論文参考訳（メタデータ） (2021-10-18T21:22:40Z)
Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
Recursive Grouping (RG) と Chow-Liu Recursive Grouping (CLRG) のサンプル複雑性について述べる。 RG,CLRG,Neighbor Joining (NJ) およびSpectral NJ (SNJ) をトラッピングした内積を用いて強化する。我々は、潜在木の構造学習において、最初の既知のインスタンス依存の不合理性の結果を導出する。
論文参考訳（メタデータ） (2021-06-02T01:37:52Z)
SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
ノードからの観測が独立しているが非識別的に分散ノイズによって破損した場合、Ising Treeモデルの学習を検討する。 Katiyarら。 (2020) は, 正確な木構造は復元できないが, 部分木構造を復元できることを示した。統計的に堅牢な部分木回復アルゴリズムであるSymmetrized Geometric Averaging(SGA)を提案する。
論文参考訳（メタデータ） (2021-01-22T01:57:35Z)
Convex Polytope Trees [57.56078843831244]
コンベックスポリトープ木(CPT)は、決定境界の解釈可能な一般化によって決定木の系統を拡張するために提案される。木構造が与えられたとき,木パラメータに対するCPTおよび拡張性のあるエンドツーエンドトレーニングアルゴリズムを効率的に構築する。
論文参考訳（メタデータ） (2020-10-21T19:38:57Z)
MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
動的プログラミングと探索に基づいて最適な分類木を学習するための新しいアルゴリズムを提案する。当社のアプローチでは,最先端技術が必要とする時間のごく一部しか使用せず,数万のインスタンスでデータセットを処理することが可能です。
論文参考訳（メタデータ） (2020-07-24T17:06:55Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。