Fugu-MT Paper Translation (Abstract): Prevalence Threshold and bounds in the Accuracy of Binary Classification Systems

Paper overview: Prevalence Threshold and bounds in the Accuracy of Binary Classification Systems

arxiv url: http://arxiv.org/abs/2112.13289v1
Date: Sat, 25 Dec 2021 21:22:32 GMT
Status: Translation complete
Last updated in system: 2021-12-29 05:54:41.244901
Title: Prevalence Threshold and bounds in the Accuracy of Binary Classification Systems
Title (reference translation): Prevalence thresholds and bounds on the accuracy of binary classification systems
Authors: Jacques Balayla
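Abstract summary: We show that, relative to a perfect accuracy of 1, the positive prevalence threshold ($\phi_e$) is a critical point of maximum curvature in the precision-prevalence curve. Though applications are numerous, the ideas discussed here may be used in computational complexity theory, artificial intelligence, and medical screening, among others.
Reference score (attention score computed independently by this site): 0.0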
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The accuracy of binary classification systems is defined as the proportion of correct predictions, both positive and negative, made by a classification model or computational algorithm. A value between 0 (no accuracy) and 1 (perfect accuracy), the accuracy of a classification model is dependent on several factors, notably: the classification rule or algorithm used, the intrinsic characteristics of the tool used to do the classification, and the relative frequency of the elements being classified. Several accuracy metrics exist, each with its own advantages in different classification scenarios. In this manuscript, we show that relative to a perfect accuracy of 1, the positive prevalence threshold ($\phi_e$), a critical point of maximum curvature in the precision-prevalence curve, bounds the $F_{\beta}$ score between 1 and 1.8/1.5/1.2 for $\beta$ values of 0.5/1.0/2.0, respectively; the $F_1$ score between 1 and 1.5, and the Fowlkes-Mallows Index (FM) between 1 and $\sqrt{2} \approx 1.414$. We likewise describe a novel negative prevalence threshold ($\phi_n$), the level of sharpest curvature for the negative predictive value-prevalence curve, such that $\phi_n > \phi_e$. The area between both these thresholds bounds the Matthews Correlation Coefficient (MCC) between $\sqrt{2}/2$ and $\sqrt{2}$. Conversely, the ratio of the maximum possible accuracy to that at any point below the prevalence threshold, $\phi_e$, goes to infinity with decreasing prevalence. Though applications are numerous, the ideas herein discussed may be used in computational complexity theory, artificial intelligence, and medical screening, amongst others. Where computational time is a limiting resource, attaining the prevalence threshold in binary classification systems may be sufficient to yield levels of accuracy comparable to that under maximum prevalence.
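The metrics named in the abstract (accuracy, $F_{\beta}$, FM, MCC) all depend on prevalence once sensitivity and specificity are fixed. The following is a minimal sketch of that dependence using the standard textbook definitions of each metric. The function names and the parameterization by sensitivity, specificity, and prevalence are illustrative assumptions, not code from the paper, and inputs are assumed to lie strictly between 0 and 1 so that no denominator vanishes.

```python
import math


def confusion_rates(sensitivity, specificity, prevalence):
    """Expected confusion-matrix proportions (they sum to 1).

    Assumes all three inputs lie strictly between 0 and 1.
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Proportion of correct predictions, positive and negative."""
    return (tp + tn) / (tp + fp + tn + fn)


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


def fowlkes_mallows(tp, fp, fn):
    """Fowlkes-Mallows index: geometric mean of precision and recall."""
    return tp / math.sqrt((tp + fp) * (tp + fn))


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical test with 90% sensitivity and 90% specificity,
    # evaluated at several prevalence levels.
    for prevalence in (0.01, 0.1, 0.25, 0.5):
        tp, fp, tn, fn = confusion_rates(0.9, 0.9, prevalence)
        print(
            prevalence,
            round(accuracy(tp, fp, tn, fn), 3),
            round(f_beta(tp, fp, fn), 3),
            round(fowlkes_mallows(tp, fp, fn), 3),
            round(mcc(tp, fp, tn, fn), 3),
        )
```

Working with proportions rather than raw counts keeps prevalence as an explicit input, which is the quantity the abstract varies.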
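Abstract (reference translation): The accuracy of a binary classification system is defined as the proportion of correct predictions, both positive and negative, made by a classification model or computational algorithm. It takes a value between 0 (no accuracy) and 1 (perfect accuracy) and depends on several factors, in particular the classification rule or algorithm, the intrinsic characteristics of the tool used for classification, and the relative frequency of the elements being classified. Several accuracy metrics exist, each with its own advantages in different classification scenarios. This manuscript shows that, relative to a perfect accuracy of 1, the positive prevalence threshold ($\phi_e$), a critical point of maximum curvature in the precision-prevalence curve, bounds the $F_{\beta}$ score between 1 and 1.8/1.5/1.2 for $\beta$ values of 0.5/1.0/2.0, respectively, the $F_1$ score between 1 and 1.5, and the Fowlkes-Mallows Index (FM) between 1 and $\sqrt{2} \approx 1.414$. It likewise describes a novel negative prevalence threshold ($\phi_n$), the level of sharpest curvature of the negative predictive value-prevalence curve, such that $\phi_n > \phi_e$. The area between these two thresholds bounds the Matthews Correlation Coefficient (MCC) between $\sqrt{2}/2$ and $\sqrt{2}$. Conversely, the ratio of the maximum possible accuracy to the accuracy at any point below the prevalence threshold $\phi_e$ goes to infinity as prevalence decreases. Though applications are numerous, the ideas discussed here may be used in computational complexity theory, artificial intelligence, and medical screening, among others. Where computational time is a limiting resource, attaining the prevalence threshold in a binary classification system may be sufficient to yield accuracy comparable to that under maximum prevalence.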
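The behaviour of precision around the prevalence threshold can be illustrated numerically. The sketch below computes the positive predictive value (precision) from Bayes' theorem and evaluates a closed form for the threshold, $\phi_e = \sqrt{1-b} / (\sqrt{a} + \sqrt{1-b})$, with sensitivity $a$ and specificity $b$. That closed form is not stated in this abstract; it is taken here as an assumption from the author's earlier work on screening curves, and the function names are hypothetical.

```python
import math


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value (precision) via Bayes' theorem."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)


def prevalence_threshold(sensitivity, specificity):
    """Assumed closed form for the positive prevalence threshold phi_e.

    phi_e = sqrt(1 - b) / (sqrt(a) + sqrt(1 - b)), with a = sensitivity
    and b = specificity. This formula is an assumption; it does not
    appear in the abstract above.
    """
    fpr = 1 - specificity
    return math.sqrt(fpr) / (math.sqrt(sensitivity) + math.sqrt(fpr))


if __name__ == "__main__":
    a, b = 0.9, 0.9  # hypothetical sensitivity and specificity
    phi_e = prevalence_threshold(a, b)
    print("phi_e:", phi_e)
    print("PPV at phi_e:", ppv(a, b, phi_e))
    # Ratio of perfect precision (1) to the precision actually achieved:
    # it grows without bound as prevalence falls below phi_e.
    for prevalence in (phi_e, 0.1, 0.01, 0.001):
        print(prevalence, 1 / ppv(a, b, prevalence))
```

For a test with 90% sensitivity and 90% specificity this gives a threshold of 0.25 and a precision of 0.75 at that prevalence, while the ratio of perfect precision to achieved precision grows sharply as prevalence falls further.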

Related papers

Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination [35.67742880001828]
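We present the first computationally efficient algorithm for high-dimensional robust mean estimation under mean-shift contamination. The algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy.
Reference translation (metadata) (2025-02-20T17:53:13Z)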
Semiparametric conformal prediction [79.6147286161434]
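Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables. We treat the scores as random vectors and aim to construct prediction sets that account for their joint correlation structure. We report the desired coverage and competitive efficiency on real-world regression problems.
Reference translation (metadata) (2024-11-04T14:29:02Z)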
Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
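We propose an adaptive $k$-nearest neighbor ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size. Results on many real-world datasets indicate that the new $kK$-NN algorithm achieves better balanced accuracy than the established $k$-NN method.
Reference translation (metadata) (2024-09-08T13:08:45Z)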
Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
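We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation. We show that, under suitable causal assumptions, the influence of $X$ on the prediction score $S$ equals the influence of $X$ on the true outcome $Y$.
Reference translation (metadata) (2024-05-24T11:22:19Z)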
The Interplay between Distribution Parameters and the Accuracy-Robustness Tradeoff in Classification [0.0]
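Adversarial training tends to produce models that are less accurate on natural (unperturbed) examples than standard models. This can be attributed either to a shortcoming of the algorithm or to a fundamental property of the training data distribution. In this work, we focus on the latter case under a binary Gaussian mixture classification problem.
Reference translation (metadata) (2021-07-01T06:57:50Z)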
Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions [79.35722941720734]
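Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. We characterise the accuracy of the high-dimensional estimator obtained by empirical risk minimisation. We discuss how our theory can be applied beyond the scope of synthetic data.
Reference translation (metadata) (2021-06-07T16:53:56Z)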
Accuracy and precision of the estimation of the number of missing levels in chaotic spectra using long-range correlations [0.0]
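We study the accuracy and precision of estimating the fraction of observed levels $\varphi$ in quantum chaotic spectra by means of long-range correlations. We use Monte Carlo simulations of spectra obtained by diagonalizing Gaussian Orthogonal Ensemble matrices, with levels removed at random, to fit the formulas. The precision is generally better for estimates based on the power spectrum of $\delta_n$ than for estimates based on the $\Delta_3$ statistic.
Reference translation (metadata) (2020-11-03T12:42:07Z)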
Developing and Improving Risk Models using Machine-learning Based Algorithms [6.245537312562826]
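The objective of this study is to develop a good risk model for classifying business delinquency. The rationale of the analysis is first to obtain good base binary classifiers via regularization. Two model ensembling algorithms, bagging and boosting, are then run on the good base classifiers for further model improvement.
Reference translation (metadata) (2020-09-09T20:38:00Z)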
Consistent Structured Prediction with Max-Min Margin Markov Networks [84.60515484036239]
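Max-margin methods for binary classification have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$). We overcome such limitations by defining the learning problem in terms of a "max-min" margin formulation, naming the resulting method max-min margin Markov networks ($M^4N$). Experiments on multi-class classification, ordinal regression, sequence prediction, and ranking demonstrate the effectiveness of the proposed method.
Reference translation (metadata) (2020-07-02T10:48:42Z)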
Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification [54.22421582955454]
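We provide the first result on optimal minimax guarantees for the excess risk of adversarially robust classification. The results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion from standard linear classification to the adversarial setting.
Reference translation (metadata) (2020-06-29T21:06:52Z)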
A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
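This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data. For a class of statistical models, we provide an exact analysis of the generalization error of boosting. We also make explicit the relation between the boosting test error and the optimal Bayes error.
Reference translation (metadata) (2020-02-05T00:24:53Z)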

The related papers list is generated automatically from the titles and abstracts of papers on this site.
