Fugu-MT Paper Translation (Abstract): Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance

Paper overview: Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance

arxiv url: http://arxiv.org/abs/2305.10696v1
Date: Thu, 18 May 2023 04:17:46 GMT
Status: Translation complete
Last updated in system: 2023-05-19 17:01:19.017309
Title: Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance
Authors: Zheyu Zhang, Tianping Zhang, Jian Li
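Abstract summary: The split finding algorithm of Gradient Boosting Decision Tree (GBDT) has been criticized for its bias towards features with a large number of potential splits. We provide a fine-grained analysis of bias in GBDT and show that it originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm that results from using the same data to evaluate the split improvement and to determine the best split. We propose unbiased gain, a new unbiased measurement of gain importance that uses out-of-bag samples.
Reference score (independently computed attention score): 6.700461065769045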
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split. Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we incorporate the unbiased property into the split finding algorithm and develop UnbiasedGBM to solve the overfitting issue of GBDT. We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and Catboost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods. The codes are available at https://github.com/ZheyuAqaZhang/UnbiasedGBM.
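The two sources of bias named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator or the UnbiasedGBM code. It is a simplified illustration under stated assumptions: squared loss, pure-noise features, and a plain train/held-out partition standing in for out-of-bag samples. The function names `best_split`, `heldout_gain`, and `simulate` are hypothetical. It compares the gain measured on the same data that chose the split with the loss reduction measured on held-out data, for a continuous noise feature (many candidate splits) and a binary noise feature (one candidate split).

```python
import numpy as np


def best_split(x, g):
    """Return (threshold, in-sample gain) for the split that most reduces
    the squared error of the targets g when each side predicts its mean."""
    order = np.argsort(x)
    xs, gs = x[order], g[order]
    n = len(gs)
    csum = np.cumsum(gs)
    total = csum[-1]
    best_gain, best_thr = 0.0, None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no valid threshold between equal feature values
        gl, gr = csum[i - 1], total - csum[i - 1]
        gain = gl**2 / i + gr**2 / (n - i) - total**2 / n
        if gain > best_gain:
            best_gain, best_thr = gain, (xs[i - 1] + xs[i]) / 2
    return best_thr, best_gain


def heldout_gain(thr, x_tr, g_tr, x_oob, g_oob):
    """Squared-error reduction on held-out data for a split and leaf values
    that were both fitted on the training part only."""
    if thr is None:
        return 0.0
    left = x_tr <= thr
    parent = g_tr.mean()
    pred = np.where(x_oob <= thr, g_tr[left].mean(), g_tr[~left].mean())
    return float(np.sum((g_oob - parent) ** 2) - np.sum((g_oob - pred) ** 2))


def simulate(n_trials=200, n=200, seed=0):
    """Average (in-sample gain, held-out gain) for two pure-noise features."""
    rng = np.random.default_rng(seed)
    out = {"continuous": [], "binary": []}
    tr = np.arange(n) < n // 2  # first half picks the split, second half evaluates it
    for _ in range(n_trials):
        g = rng.normal(size=n)  # targets are independent of every feature
        feats = {
            "continuous": rng.normal(size=n),
            "binary": rng.integers(0, 2, size=n).astype(float),
        }
        for name, x in feats.items():
            thr, gain_in = best_split(x[tr], g[tr])
            gain_oob = heldout_gain(thr, x[tr], g[tr], x[~tr], g[~tr])
            out[name].append((gain_in, gain_oob))
    return {k: np.mean(v, axis=0) for k, v in out.items()}


if __name__ == "__main__":
    for name, (gain_in, gain_oob) in simulate().items():
        print(f"{name:>10}: in-sample gain {gain_in:.2f}, held-out gain {gain_oob:.2f}")
```

In this setup the in-sample gain is never negative and grows with the number of candidate splits, even though neither feature carries any signal, while the held-out measurement does not reward the feature with more splits. The held-out quantity here is only a rough proxy and is not claimed to be unbiased; the paper defines its own estimator.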

Related papers

Rethinking Relation Extraction: Beyond Shortcuts to Generalization with a Debiased Benchmark [53.876493664396506]
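Benchmarks are essential for evaluating the performance of machine learning algorithms, facilitating comparison, and identifying superior solutions. This paper addresses the problem of entity bias in relation extraction tasks. We propose DREB, a debiased relation extraction benchmark that breaks the pseudo-correlation between entity mentions and relation types through entity replacement. To establish a new baseline on DREB, we introduce MixDebias, a debiasing method that combines data-level and model-training-level techniques.
Reference translation of paper (metadata) (2025-01-02T17:01:06Z)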
Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models [1.787433808079955]
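Large language models (LLMs) perpetuate undesirable biases present in their training data. This paper mitigates bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal. Experiments on mitigating gender, race, and religion biases show a reduction in bias on several local and global bias metrics.
Reference translation of paper (metadata) (2024-12-02T16:56:08Z)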
Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems [74.47680026838128]
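Two typical biases in user interaction data with recommender systems (RSs) are popularity bias and positivity bias. We study multifactorial selection bias, which is affected by both the item and the rating value. To reduce variance and improve the robustness of optimization, we propose a smoothing and alternating gradient descent technique.
Reference translation of paper (metadata) (2024-04-29T12:18:21Z)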
Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
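We study the "dataset bias" problem from a statistical perspective. The main cause of the problem is the strong correlation between a class attribute u and a non-class attribute b. We propose to mitigate dataset bias either by weighting the objective of each sample n by 1/p(u_n|b_n) or by sampling that sample with a weight proportional to 1/p(u_n|b_n).
Reference translation of paper (metadata) (2024-02-05T22:58:06Z)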
Causality and Independence Enhancement for Biased Node Classification [56.38828085943763]
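We propose a novel Causality and Independence Enhancement (CIE) framework applicable to various graph neural networks (GNNs). The approach estimates causal and spurious features at the node representation level and mitigates the influence of spurious correlations. CIE not only significantly improves the performance of GNNs but also outperforms state-of-the-art debiased node classification methods.
Reference translation of paper (metadata) (2023-10-14T13:56:24Z)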
General Debiasing for Multimodal Sentiment Analysis [47.05329012210878]
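This paper proposes a general debiasing multimodal sentiment analysis (MSA) task, which aims to enhance the out-of-distribution (OOD) generalization ability of MSA models. We use IPW to reduce the effect of strongly biased samples, facilitating robust feature learning for sentiment prediction. Experimental results demonstrate the superior generalization ability of the proposed framework.
Reference translation of paper (metadata) (2023-07-20T00:36:41Z)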
SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases [27.56143777363971]
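We propose the Sparse Mixture of Adapters (SMoA), which can mitigate multiple dataset biases effectively and efficiently. Experiments on natural language inference and paraphrase identification tasks show that SMoA outperforms full fine-tuning, adapter-tuning baselines, and prior strong debiasing methods.
Reference translation of paper (metadata) (2023-02-28T08:47:20Z)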
Feature-Level Debiased Natural Language Understanding [86.8751772146264]
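Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets. We propose debiasing contrastive learning (DCT) to address both the biased latent features and the neglect of the dynamic nature of bias. DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
Reference translation of paper (metadata) (2022-12-11T06:16:14Z)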
Learning to Split for Automatic Bias Detection [39.353850990332525]
Learning to Split (ls) is an algorithm for automatic bias detection. We evaluate the approach on Beer Review, CelebA, and MNLI.
Reference translation of paper (metadata) (2022-04-28T19:41:08Z)
General Greedy De-bias Learning [163.65789778416172]
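We propose the General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, like gradient descent in functional space. GGD can learn a more robust base model under both settings: task-specific biased models with prior knowledge and a self-ensemble biased model without prior knowledge.
Reference translation of paper (metadata) (2021-12-20T14:47:32Z)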
Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection [11.295032417617454]
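We study the effect of biased base learners on feature importance (FI) measures in gradient boosting machines (GBMs). By using cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the proposed method in a variety of synthetic and real-world settings, showing a significant improvement in all GBM FI measures while maintaining relatively comparable predictive accuracy.
Reference translation of paper (metadata) (2021-09-12T09:32:43Z)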
Greedy Gradient Ensemble for Robust Visual Question Answering [163.65789778416172]
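In Visual Question Answering (VQA), we stress the language bias that arises from two aspects: distribution bias and shortcut bias. We propose a new debiasing framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning. GGE forces the biased models to over-fit the biased data distribution first, so that the base model pays more attention to examples that are hard for the biased models to solve.
Reference translation of paper (metadata) (2021-07-27T08:02:49Z)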

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
