Fugu-MT paper translation (abstract): How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration

Paper abstract: How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration

arxiv url: http://arxiv.org/abs/2606.03549v1
Date: Tue, 02 Jun 2026 12:10:43 GMT
Status: translation complete
Last updated in system: 2026-06-03 22:00:04.985996
Title: How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration
Title (reference translation): Number of Trees in a Random Forest: A Re-examination via Plateau Search and Optuna Integration
Authors: Vadim Porvatov, Andrey Dukhovny, Andrey Lange
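Abstract summary: We propose an integrated triplet-based plateau-search algorithm for hyperparameter optimization (HPO) of Random Forest. The method adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag (OOB) score. Experiments show that the selected number of trees can differ substantially from the common heuristic.
Reference score (independently computed attention measure): 0.30586855806896046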
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the estimate toward its right boundary. Early-stopping strategies avoid fixing such a range, but can be sensitive to score noise and prone to premature stopping. To address this, we propose an integrated triplet-based plateau-search algorithm that removes the number of trees from the direct TPE search space and still exploits information accumulated across HPO trials. The method adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag (OOB) score across a triplet of forest sizes and shifting this triplet accordingly. This yields an automated and user-interpretable procedure based on a tolerance parameter. We also provide a theoretical analysis: we relate the proposed relative OOB-score criterion to the gap between the current and limiting scores, and derive an asymptotic variance estimate for the corresponding OOB-based absolute relative difference. Experiments show that the selected number of trees can differ substantially from the common heuristic: for most classical benchmark datasets it is smaller, whereas for some high-dimensional bioinformatics datasets, such as Arcene and Dorothea, it is larger. The source code and reproducible experiments are available at https://github.com/lange-am/rf_plateau_hpo.
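Abstract (reference translation): Because the predictive score typically improves monotonically with ensemble size, standard methods such as the Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the estimate toward its right boundary. Early-stopping strategies avoid fixing such a range but are sensitive to score noise and prone to premature stopping. We therefore propose a triplet-based plateau-search algorithm that removes the number of trees from the direct TPE search space while still exploiting information accumulated across HPO trials. The method adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag (OOB) score across a triplet of forest sizes and shifting the triplet accordingly. This yields an automated, user-interpretable procedure based on a tolerance parameter. We also relate the proposed relative OOB-score criterion to the gap between the current and limiting scores, and derive an asymptotic variance estimate for the corresponding OOB-based absolute relative difference. Experiments show that the selected number of trees can differ substantially from the common heuristic: it is smaller for most classical benchmark datasets, but larger for high-dimensional bioinformatics datasets such as Arcene and Dorothea. The source code and reproducible experiments are available at https://github.com/lange-am/rf_plateau_hpo.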
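
The triplet idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the linked repository) and it omits the Optuna/TPE integration entirely. The function names, the geometric triplet (mid / factor, mid, mid * factor), the shift rules, and the synthetic score curve are all assumptions made for illustration; the only idea taken from the abstract is comparing relative score changes across three forest sizes against a tolerance and shifting the triplet.

```python
from typing import Callable, Dict


def relative_change(prev: float, curr: float) -> float:
    """Absolute relative difference between two scores."""
    return abs(curr - prev) / max(abs(curr), 1e-12)


def plateau_search(
    score_fn: Callable[[int], float],  # hypothetical: maps n_trees -> OOB score
    start: int = 64,
    factor: int = 2,
    tol: float = 1e-3,
    min_trees: int = 8,
    max_trees: int = 4096,
    max_steps: int = 50,
) -> int:
    """Shift a triplet (mid // factor, mid, mid * factor) until mid sits at
    the onset of the score plateau, then return mid."""
    cache: Dict[int, float] = {}

    def score(n: int) -> float:
        # Each forest size is evaluated at most once.
        if n not in cache:
            cache[n] = score_fn(n)
        return cache[n]

    mid = start
    for _ in range(max_steps):  # guard against oscillation under noisy scores
        low, high = mid // factor, mid * factor
        d_low = relative_change(score(low), score(mid))
        d_high = relative_change(score(mid), score(high))
        if d_high > tol and high * factor <= max_trees:
            mid = high  # still improving beyond mid: shift the triplet right
        elif d_low <= tol and d_high <= tol and low // factor >= min_trees:
            mid = low   # already flat below mid: shift the triplet left
        else:
            return mid  # improving up to mid, flat after it (or at a bound)
    return mid


def toy_oob_curve(n_trees: int) -> float:
    """Synthetic, noise-free stand-in for an OOB score that saturates at 0.9."""
    return 0.9 - 0.5 / n_trees


if __name__ == "__main__":
    print(plateau_search(toy_oob_curve, start=64))
```

In practice `score_fn` could wrap a scikit-learn `RandomForestClassifier(n_estimators=n, oob_score=True)` and return its `oob_score_` attribute. Real OOB scores are noisy, which is why the sketch caps the number of shifts with `max_steps`.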


Related papers