Fugu-MT paper translation (summary): A replica analysis of under-bagging

Paper summary: A replica analysis of under-bagging

arxiv url: http://arxiv.org/abs/2404.09779v2
Date: Thu, 25 Apr 2024 09:39:16 GMT
Status: Translation complete
Last updated in system: 2024-04-26 20:28:54.236318
Title: A replica analysis of under-bagging
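Title (reference translation): A replica analysis of under-bagging
Authors: Takashi Takahashi
Abstract summary: Under-bagging (UB) is a popular ensemble learning method for training classifiers on imbalanced data. The performance of UB is shown to improve as the majority class grows while the size of the minority class is kept fixed. This contrasts with under-sampling (US), whose performance does not change as the majority class grows, and simple weighting (SW), whose performance degrades as the imbalance increases.
Reference score (independently computed attention score): 3.1274367448459253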
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Under-bagging (UB), which combines under sampling and bagging, is a popular ensemble learning method for training classifiers on an imbalanced data. Using bagging to reduce the increased variance caused by the reduction in sample size due to under sampling is a natural approach. However, it has recently been pointed out that in generalized linear models, naive bagging, which does not consider the class imbalance structure, and ridge regularization can produce the same results. Therefore, it is not obvious whether it is better to use UB, which requires an increased computational cost proportional to the number of under-sampled data sets, when training linear models. Given such a situation, in this study, we heuristically derive a sharp asymptotics of UB and use it to compare with several other standard methods for learning from imbalanced data, in the scenario where a linear classifier is trained from a two-component mixture data. The methods compared include the under-sampling (US) method, which trains a model using a single realization of the subsampled data, and the simple weighting (SW) method, which trains a model with a weighted loss on the entire data. It is shown that the performance of UB is improved by increasing the size of the majority class while keeping the size of the minority fixed, even though the class imbalance can be large, especially when the size of the minority class is small. This is in contrast to US, whose performance does not change as the size of the majority class increases, and SW, whose performance decreases as the imbalance increases. These results are different from the case of the naive bagging when training generalized linear models without considering the structure of the class imbalance, indicating the intrinsic difference between the ensembling and the direct regularization on the parameters.
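Abstract (reference translation): Under-bagging (UB) is an ensemble learning method that combines under-sampling and bagging. Using bagging to reduce the increased variance caused by the smaller sample size after under-sampling is a natural approach. However, it has recently been pointed out that, in generalized linear models, naive bagging that does not consider the class imbalance structure and ridge regularization can produce the same results. It is therefore not obvious whether UB, which incurs a computational cost proportional to the number of under-sampled data sets, is the better choice when training linear models. Given this situation, this study heuristically derives sharp asymptotics of UB and uses them to compare UB with other standard methods for learning from imbalanced data, in a scenario where a linear classifier is trained on two-component mixture data. The methods compared include the under-sampling (US) method, which trains a model on a single realization of the subsampled data, and the simple weighting (SW) method, which trains a model with a weighted loss on the entire data. The performance of UB is shown to improve as the majority class grows while the size of the minority class is kept fixed, even when the class imbalance is large, especially when the minority class is small. This contrasts with US, whose performance does not change as the majority class grows, and SW, whose performance degrades as the imbalance increases. These results differ from the case of naive bagging when training generalized linear models without considering the class imbalance structure.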
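The three methods compared in the abstract (US, UB, and SW) can be illustrated with a small NumPy sketch. This is not the paper's code or its experimental setup: the helper names (`make_mixture`, `fit_logistic`, `under_sample`, `fit_us`, `fit_ub`, `fit_sw`, `balanced_accuracy`), the logistic loss, the ridge strength, drawing the majority subsample without replacement, and the inverse-class-size weights in SW are all assumptions made for illustration. UB is implemented here by averaging the fitted parameters over the under-sampled fits, which for a linear model is the same as averaging the pre-threshold scores.

```python
import numpy as np

def make_mixture(n_pos, n_neg, dim, rng, sep=1.0):
    """Two-component Gaussian mixture: class +1 centred at +mu, class -1 at -mu."""
    mu = np.full(dim, sep / np.sqrt(dim))
    X = np.vstack([rng.normal(size=(n_pos, dim)) + mu,
                   rng.normal(size=(n_neg, dim)) - mu])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])
    return X, y

def fit_logistic(X, y, sample_weight=None, ridge=1e-2, lr=0.1, steps=500):
    """Ridge-regularised logistic regression fitted by plain gradient descent."""
    n, dim = X.shape
    s = np.ones(n) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    w, b = np.zeros(dim), 0.0
    for _ in range(steps):
        margin = y * (X @ w + b)
        # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)), written stably with tanh
        g = -0.5 * (1.0 - np.tanh(0.5 * margin)) * s * y
        w -= lr * (X.T @ g / n + ridge * w)
        b -= lr * g.mean()
    return w, b

def under_sample(y, rng):
    """Indices of a balanced subset: every minority point plus an equally
    sized random draw (without replacement, an assumption) from the majority."""
    pos, neg = np.flatnonzero(y > 0), np.flatnonzero(y < 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    picked = rng.choice(majority, size=len(minority), replace=False)
    return np.concatenate([minority, picked])

def fit_us(X, y, rng):
    """US: train on a single realization of the under-sampled data."""
    idx = under_sample(y, rng)
    return fit_logistic(X[idx], y[idx])

def fit_ub(X, y, rng, n_bags=20):
    """UB: average the parameters of n_bags independent US fits."""
    fits = [fit_us(X, y, rng) for _ in range(n_bags)]
    w = np.mean([f[0] for f in fits], axis=0)
    b = float(np.mean([f[1] for f in fits]))
    return w, b

def fit_sw(X, y):
    """SW: train on all data, weighting each class by the inverse of its size."""
    n_pos, n_neg = np.sum(y > 0), np.sum(y < 0)
    weights = np.where(y > 0, len(y) / (2 * n_pos), len(y) / (2 * n_neg))
    return fit_logistic(X, y, sample_weight=weights)

def balanced_accuracy(w, b, X, y):
    """Mean of the per-class accuracies of the linear classifier sign(Xw + b)."""
    pred = np.sign(X @ w + b)
    return 0.5 * (np.mean(pred[y > 0] == 1) + np.mean(pred[y < 0] == -1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = make_mixture(n_pos=20, n_neg=400, dim=10, rng=rng)
    X_test, y_test = make_mixture(2000, 2000, 10, rng)
    fits = {"US": fit_us(X, y, rng), "UB": fit_ub(X, y, rng), "SW": fit_sw(X, y)}
    for name, (w, b) in fits.items():
        print(name, round(balanced_accuracy(w, b, X_test, y_test), 3))
```

Re-running the script with a larger `n_neg` and a fixed `n_pos` gives an informal feel for the comparison described in the abstract. A single finite-size run is noisy and is no substitute for the paper's asymptotic analysis.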

Related papers

On Improving the Algorithm-, Model-, and Data- Efficiency of Self-Supervised Learning [18.318758111829386]
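We propose an efficient single-branch SSL method based on non-parametric instance discrimination. We also propose a novel self-distillation loss that minimizes the KL divergence between a probability distribution and its square-root version.
Paper reference translation (metadata) (2024-04-30T06:39:04Z)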
Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
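Long-tailed distributions, in which minority classes contain only a limited number of samples, frequently appear in real-world data. Recent studies have shown that supervised contrastive learning has promising potential for alleviating data imbalance. This paper proposes a probabilistic contrastive learning algorithm that estimates the data distribution of samples from each class in the feature space.
Paper reference translation (metadata) (2024-03-11T13:44:49Z)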
Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
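We develop a simple logits retargeting approach (LORT) that requires no prior knowledge of the number of samples per class. The proposed method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
Paper reference translation (metadata) (2024-03-01T03:27:08Z)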
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
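Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. This paper proposes a model-agnostic framework, the Reweighted Score function (ReScore), which improves causal discovery performance by dynamically learning adaptive weights.
Paper reference translation (metadata) (2023-03-06T14:49:59Z)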
Generative Oversampling for Imbalanced Data via Majority-Guided VAE [15.93867386081279]
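This paper proposes a novel oversampling model, Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of the majority. In this way, the newly generated minority samples can inherit the diversity and richness of the majority samples, which mitigates overfitting in downstream tasks.
Paper reference translation (metadata) (2023-02-14T06:35:23Z)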
Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
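Imbalanced data pose challenges for deep-learning-based classification models. One of the most widely used approaches for handling imbalanced data is re-weighting. This paper proposes a novel re-weighting method based on optimal transport (OT) from a distributional perspective.
Paper reference translation (metadata) (2022-08-05T01:23:54Z)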
Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
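We propose a provable method (ABSGD) for addressing data imbalance and label noise problems in deep learning. The method is a simple modification of momentum SGD that assigns an individual importance weight to each sample. ABSGD is flexible enough to be combined with other robust losses at no additional cost.
Paper reference translation (metadata) (2020-12-13T03:41:52Z)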
Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
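We consider the computational issues caused by large sample sizes within the discriminant analysis framework. We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
Paper reference translation (metadata) (2020-05-08T05:09:08Z)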
Minority Class Oversampling for Tabular Data with Deep Generative Models [4.976007156860967]
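We study the ability of deep generative models to provide realistic samples that improve performance on imbalanced classification tasks via oversampling. Experiments show that the sampling method does not affect quality, but that runtime varies. In terms of performance metrics, the improvements are significant, but often small in absolute terms.
Paper reference translation (metadata) (2020-05-07T21:35:57Z)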

The related papers list is generated automatically from the titles and abstracts of papers on this site.
