Fugu-MT Paper Translation (Abstract): Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

Paper summary: Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

arxiv url: http://arxiv.org/abs/2005.05117v2
Date: Tue, 12 May 2020 10:46:33 GMT
Status: Translation complete
Last updated in system: 2022-12-04 19:52:36.948873
Title: Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions
Title (reference translation): Nearest neighbor classifiers based on incomplete information: from certain answers to certain predictions
Authors: Bojan Karlaš, Peng Li, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang
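Abstract summary: Inconsistency and incomplete information are ubiquitous in real-world datasets. We propose the notion of "Certain Predictions" (CP). CPClean often significantly outperforms existing techniques in classification accuracy with mild manual cleaning effort.
Reference score (independently computed attention score): 25.249892204626583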
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML applications remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose the notion of "Certain Predictions" (CP) -- a test data example can be certainly predicted (CP'ed) if all possible classifiers trained on top of all possible worlds induced by the incompleteness of data would yield the same prediction. We study two fundamental CP queries: (Q1) checking query that determines whether a data example can be CP'ed; and (Q2) counting query that computes the number of classifiers that support a particular prediction (i.e., label). Given that general solutions to CP queries are, not surprisingly, hard without assumption over the type of classifier, we further present a case study in the context of nearest neighbor (NN) classifiers, where efficient solutions to CP queries can be developed -- we show that it is possible to answer both queries in linear or polynomial time over exponentially many possible worlds. We demonstrate one example use case of CP in the important application of "data cleaning for machine learning (DC for ML)." We show that our proposed CPClean approach built based on CP can often significantly outperform existing techniques in terms of classification accuracy with mild manual cleaning effort.
Abstract (reference translation): Machine learning (ML) applications have been growing in recent years, mainly because of the increasing availability of data. However, inconsistency and incomplete information are everywhere in real-world datasets. In this paper, we conduct a formal study of their impact by extending the notion of Certain Answers for Codd tables, which the database research community has studied for decades, to the field of machine learning. Specifically, we focus on classification problems and propose the notion of "Certain Predictions" (CP) -- a test data example can be certainly predicted (CP'ed) if all possible classifiers trained on top of all possible worlds induced by the incompleteness of data would yield the same prediction. We study two fundamental CP queries: (Q1) checking query that determines whether a data example can be CP'ed; and (Q2) counting query that computes the number of classifiers that support a particular prediction (i.e., label). Given that general solutions to CP queries are, not surprisingly, hard without assumption over the type of classifier, we further present a case study in the context of nearest neighbor (NN) classifiers, where efficient solutions to CP queries can be developed -- we show that it is possible to answer both queries in linear or polynomial time over exponentially many possible worlds. We present one example use of CP in the important application of "data cleaning for machine learning (DC for ML)." We show that the CPClean approach built on CP can often significantly outperform existing techniques in classification accuracy with mild manual cleaning effort.
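
The two CP queries in the abstract can be illustrated with a small brute-force sketch. It is not the paper's efficient algorithm: it enumerates every possible world explicitly, so its cost grows exponentially with the number of incomplete rows. It assumes that each incomplete training row comes with a finite list of candidate repairs, and that ties are broken by row order and then by label. The function names (`knn_predict`, `count_predictions`, `certain_prediction`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
from math import dist


def knn_predict(train_x, train_y, test_x, k=1):
    """Plain k-NN vote on one complete training set (one possible world)."""
    # Sort by (distance, index) so distance ties are broken by row order.
    order = sorted(range(len(train_x)),
                   key=lambda i: (dist(train_x[i], test_x), i))
    votes = Counter(train_y[i] for i in order[:k])
    # Highest vote count wins; vote ties go to the smallest label.
    return min(votes, key=lambda label: (-votes[label], label))


def count_predictions(candidates, train_y, test_x, k=1):
    """Q2 (counting): number of possible worlds that support each label."""
    counts = Counter()
    # A possible world picks exactly one candidate value for every row.
    for world in product(*candidates):
        counts[knn_predict(world, train_y, test_x, k)] += 1
    return counts


def certain_prediction(candidates, train_y, test_x, k=1):
    """Q1 (checking): the label if all worlds agree, otherwise None."""
    counts = count_predictions(candidates, train_y, test_x, k)
    return next(iter(counts)) if len(counts) == 1 else None


# Toy data (assumed): row 0 is complete, rows 1 and 2 each have two
# candidate repairs, so there are 1 * 2 * 2 = 4 possible worlds.
candidates = [
    [(0.0, 0.0)],
    [(1.0, 0.0), (5.0, 0.0)],
    [(4.0, 0.0), (6.0, 0.0)],
]
labels = ["a", "a", "b"]

print(certain_prediction(candidates, labels, (0.5, 0.0)))  # all worlds agree
print(count_predictions(candidates, labels, (4.8, 0.0)))   # worlds disagree
```

In this toy setup, the test point `(0.5, 0.0)` is closest to the complete row in every world, so it can be certainly predicted, while `(4.8, 0.0)` receives different labels depending on which repairs are chosen. The counting query returns the full breakdown, and the checking query is derived from it here only for brevity.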

Related papers

Probably Approximately Precision and Recall Learning [60.00180898830079]
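A key challenge in machine learning is the prevalence of one-sided feedback. This paper introduces a Probably Approximately Correct (PAC) framework in which hypotheses map each input to a set of labels. We develop new algorithms that learn from positive data alone and achieve optimal sample complexity in the realizable case.
Paper reference translation (metadata) (2024-11-20T04:21:07Z)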
Just Leaf It: Accelerating Diffusion Classifiers with Hierarchical Class Pruning [8.209660505275872]
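This paper proposes the Hierarchical Diffusion Classifier (HDC), which exploits the hierarchical label structure inherent in a dataset. HDC accelerates inference by up to 60% while maintaining, and in some cases improving, classification accuracy. Our work enables a new control mechanism for the trade-off between speed and accuracy, making diffusion-based classification more viable for real-world applications.
Paper reference translation (metadata) (2024-11-18T21:34:05Z)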
Probabilistically robust conformal prediction [9.401004747930974]
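Conformal prediction (CP) is a framework for quantifying the uncertainty of machine learning classifiers, including deep neural networks. Almost all existing work on CP assumes clean test data, and little is known about the robustness of CP algorithms. This paper studies the problem of probabilistically robust conformal prediction (PRCP), which guarantees robustness to most perturbations.
Paper reference translation (metadata) (2023-07-31T01:32:06Z)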
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench [52.11481619456093]
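We study the performance prediction problem on BIG-bench experiment records. An $R^2$ score above 95% indicates that learnable patterns exist within the experiment records. We find a subset that is as informative as BIG-bench Hard for evaluating new model families while being about 3x smaller.
Paper reference translation (metadata) (2023-05-24T09:35:34Z)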
Interpretable by Design: Learning Predictors by Composing Interpretable Queries [8.054701719767293]
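Machine learning algorithms should be interpretable by design. The approach minimizes the number of queries needed for an accurate prediction. Experiments on vision and NLP tasks demonstrate the effectiveness of our approach.
Paper reference translation (metadata) (2022-07-03T02:40:34Z)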
Certifiable Robustness for Nearest Neighbor Classifiers [6.487663563916903]
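We study the complexity of certifying robustness for a simple, widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN). Our focus is on inconsistent datasets where the constraints are functional dependencies (FDs). In that setting, the goal is to count the number of possible worlds that predict a given label.
Paper reference translation (metadata) (2022-01-13T02:55:10Z)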
Transformers Can Do Bayesian Inference [56.99390658880008]
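We propose Prior-Data Fitted Networks (PFNs). PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors. We show that PFNs can mimic Gaussian processes nearly perfectly and enable efficient Bayesian inference for intractable problems.
Paper reference translation (metadata) (2021-12-20T13:07:39Z)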
CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
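We propose CvS, a cost-effective classifier for small datasets that derives classification labels from predicted segmentation maps. We evaluate the effectiveness of the framework on a variety of problems, showing that CvS achieves much higher classification results than previous methods.
Paper reference translation (metadata) (2021-10-29T18:41:15Z)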
Revisiting Deep Local Descriptor for Improved Few-Shot Classification [56.74552164206737]
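We show how to improve the quality of embeddings by leveraging Dense Classification and Attentive Pooling. Instead of the widely used global average pooling (GAP), we propose applying attentive pooling to pool the feature maps.
Paper reference translation (metadata) (2021-03-30T00:48:28Z)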
Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
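We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round. Our algorithm can be used for online prediction tasks in both adversarial and stream settings.
Paper reference translation (metadata) (2020-10-19T19:53:15Z)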
Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
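Long-tailed recognition tackles the naturally non-uniformly distributed data found in real-world scenarios. Modern classifiers perform well on populated classes, but their performance degrades significantly on tail classes. Deep-RTC is proposed as a new solution to the long-tail problem that combines realism with hierarchical predictions.
Paper reference translation (metadata) (2020-07-20T05:57:42Z)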
Estimating g-Leakage via Machine Learning [34.102705643128004]
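This paper considers the problem of estimating the information leakage of a system in the black-box scenario. The system's internals are assumed to be unknown to the learner, or too complicated to analyze. We propose a method for black-box estimation of g-vulnerability using machine learning (ML) algorithms.
Paper reference translation (metadata) (2020-05-09T09:26:36Z)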

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

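This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).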