Fugu-MT Paper Translation (Abstract): PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

Paper abstract: PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

arxiv url: http://arxiv.org/abs/2605.25678v2
Date: Tue, 26 May 2026 08:08:44 GMT
Status: Translation complete
Last updated in system: 2026-05-27 17:51:41.164812
Title: PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting
Title (reference translation): PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting
Authors: Steve Hanneke, Qinglin Meng, Shay Moran, Amirreza Shaeiri
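Abstract summary: We study the problem of multiclass PAC learning with bandit feedback. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$. We provide a general characterization of the optimal sample complexity of this problem, sharp for every concept class up to logarithmic factors.
Reference score (independently computed attention score): 46.089126569274676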
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical multiclass PAC learning, but the learner does not observe the labels of the i.i.d. training examples. Instead, in each round, it receives an unlabeled instance, predicts its label, and receives bandit feedback indicating only whether the prediction is correct. Despite this restriction, the goal remains the same as in classical PAC learning. We provide a general characterization of the optimal sample complexity of this problem, sharp for every concept class up to logarithmic factors. Our characterization is based on a new combinatorial dimension, termed the bandit $\mathrm{DS}$ dimension, defined via generalized combinatorial structures we call pseudo-boxes. These extend the pseudo-cubes underlying the $\mathrm{DS}$ dimension by allowing a different number of neighbors in each coordinate. In contrast to the $\mathrm{DS}$ dimension, which governs the full-information setting by counting the number of coordinates in the pseudo-cube, the bandit $\mathrm{DS}$ dimension aggregates the number of neighbors across coordinates, leading to a characterization in which the sample complexity scales with the total number of neighbors. We also propose a general learning algorithm achieving the upper bound, based on an algorithmic principle called ListCascade, which connects bandit learning to list learning and may be of independent interest.
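Abstract (reference translation): We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, as in classical multiclass PAC learning, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, but the learner does not observe the labels of the i.i.d. training examples. Instead, in each round it receives an unlabeled instance, predicts its label, and receives bandit feedback indicating only whether the prediction is correct. Despite this restriction, the goal is the same as in classical PAC learning. We give a general characterization of the optimal sample complexity of this problem that is sharp for every concept class up to logarithmic factors. The characterization is based on a new combinatorial dimension, called the bandit $\mathrm{DS}$ dimension, defined through generalized combinatorial structures called pseudo-boxes. These extend the pseudo-cubes underlying the $\mathrm{DS}$ dimension by allowing a different number of neighbors in each coordinate. In contrast to the $\mathrm{DS}$ dimension, which governs the full-information setting by counting the number of coordinates in the pseudo-cube, the bandit $\mathrm{DS}$ dimension aggregates the number of neighbors across coordinates, so that the sample complexity scales with the total number of neighbors. We also propose a general learning algorithm that achieves the upper bound, based on an algorithmic principle called ListCascade, which connects bandit learning to list learning.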
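The interaction protocol described in the abstract (receive an unlabeled instance, predict, observe only whether the prediction was correct) can be illustrated with a small sketch. This is not the paper's ListCascade algorithm and says nothing about the bandit $\mathrm{DS}$ dimension; it is a naive version-space elimination learner for a finite, realizable toy class. The function names, the tuple encoding of hypotheses, and the plurality prediction rule are all assumptions made for illustration.

```python
import itertools
import random


def predict_plurality(version_space, x):
    """Return the label that the most surviving hypotheses assign to x."""
    votes = {}
    for h in version_space:
        votes[h[x]] = votes.get(h[x], 0) + 1
    return max(votes, key=votes.get)


def learn_with_bandit_feedback(hypotheses, target, num_instances, rounds, rng):
    """Naive elimination learner (illustrative only, not ListCascade).

    Hypotheses are tuples: h[x] is the label h assigns to instance x.
    The true label target[x] is never shown to the learner; only the
    single bit `correct` is used to update the version space.
    """
    version_space = list(hypotheses)
    for _ in range(rounds):
        x = rng.randrange(num_instances)            # i.i.d. unlabeled instance
        prediction = predict_plurality(version_space, x)
        correct = prediction == target[x]           # bandit feedback: one bit
        # Keep hypotheses consistent with the feedback. In the realizable
        # setting the target is always kept, so the list never empties.
        version_space = [
            h for h in version_space if (h[x] == prediction) == correct
        ]
    return version_space


# Toy class: every function from 3 instances to 3 labels (27 hypotheses).
hypotheses = list(itertools.product(range(3), repeat=3))
target = (2, 0, 1)
survivors = learn_with_bandit_feedback(
    hypotheses, target, num_instances=3, rounds=200, rng=random.Random(0)
)
print(survivors)
```

A wrong prediction rules out only one label for that instance, whereas full-information feedback would reveal the label outright. This gives some intuition for why the number of candidate labels per coordinate matters in the bandit setting.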

Related papers