Fugu-MT 論文翻訳(概要): On Randomized Algorithms in Online Strategic Classification

論文の概要: On Randomized Algorithms in Online Strategic Classification

arxiv url: http://arxiv.org/abs/2602.06257v1
Date: Thu, 05 Feb 2026 23:17:33 GMT
ステータス: 翻訳完了
システム内更新日: 2026-02-09 22:18:26.153833
Title: On Randomized Algorithms in Online Strategic Classification
Title（参考訳）: オンライン戦略分類におけるランダム化アルゴリズムについて
Authors: Chase Hutton, Adam Melrod, Han Shao,
Abstract要約: オンライン戦略分類のための洗練された境界を2つの設定で提供する。実現可能な設定では、ランダム化アルゴリズムでは下界は知られていない。また、既知の(決定論的)上限である$O(sqrtTlog|mathcal H|)$を改善する最初のランダム化学習者も提供する。
参考スコア（独自算出の注目度）: 8.952242119335638
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such strategic behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they have been largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}T|\mathcal H|)$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal H|})$. In this work, we provide refined bounds for online strategic classification in both settings. In the realizable setting, we extend, for $T > \mathrm{Ldim}(\mathcal{H}) Δ^2$, the existing lower bound $Ω(\mathrm{Ldim}(\mathcal{H}) Δ)$ for deterministic learners to all learners. This yields the first lower bound that applies to randomized learners. We also provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal H) \cdot Δ\log Δ)$. In the agnostic setting, we give a proper learner using convex optimization techniques to improve the regret upper bound to $O(\sqrt{T \log |\mathcal{H}|} + |\mathcal{H}| \log(T|\mathcal{H}|))$. We show a matching lower bound up to logarithmic factors for all proper learning rules, demonstrating the optimality of our learner among proper learners. As such, improper learning is necessary to further improve regret guarantees.
Abstract（参考訳）: オンライン戦略分類は、エージェントが有利な予測を得るためにその特徴を戦略的に修正する設定である。例えば、信用スコアに基づいてローン承認を決定する分類器が与えられた場合、申請者はクレジットカードや銀行口座を開設または閉鎖して、肯定的な予測を得ることができる。学習目標は、このような戦略的行動にもかかわらず、低いミスや後悔の限界を達成することです。ランダム化されたアルゴリズムは、戦略的な設定で学習者に利点を提供する可能性があるが、それらは主に過小評価されている。実現可能な設定では、ランダム化アルゴリズムでは下位境界は知られておらず、決定論的学習者のための既存の下位境界構造はランダム化によって回避できる。不可知的な設定では、最もよく知られた後悔の上界は$O(T^{3/4}\log^{1/4}T|\mathcal H|)$であり、これは標準的なオンライン学習率$O(\sqrt{T\log|\mathcal H|})$とはかけ離れている。本研究は,両設定のオンライン戦略分類のための洗練された境界を提供する。実現可能な設定では、$T > \mathrm{Ldim}(\mathcal{H}) Δ^2$ に対して、既存の下界 $Ω(\mathrm{Ldim}(\mathcal{H}) Δ)$ をすべての学習者に対して決定論的学習者に対して拡張する。これにより、ランダム化された学習者に適用される最初の下界が得られる。また、既知の(決定論的)上限である$O(\mathrm{Ldim}(\mathcal H) \cdot Δ\log Δ)$を改善する最初のランダム化学習者も提供する。不可解な設定では、凸最適化手法を用いて、後悔の上界を$O(\sqrt{T \log |\mathcal{H}|} + |\mathcal{H}| \log(T|\mathcal{H}|))$に改良する適切な学習者を与える。適切な学習規則のすべてに対して、対数的要素の一致を低く示し、適切な学習者間の学習者の最適性を実証する。そのため、後悔の保証をさらに改善するためには不適切な学習が必要である。

関連論文リスト

Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings [13.129167404736137]
我々は、学習者がT$のデータポイントのシーケンスを受け取り、仮説の段階ごとに応答しなければならない、プライベートオンライン学習の問題を再考する。出力仮説全体のストリームは、差分プライバシーを満たすべきである。一般的なLittlestoneクラスに対して$tildeO_d(sqrtT)$のサブ線形後悔を実現するアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-10-01T06:53:57Z)
Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
我々は,個別かつ不可逆な意思決定を対象とするオンライン学習と最適化の問題を定義した。各期間において、意思決定者は、オープンする施設を選択し、それぞれの成功に関する情報を受け取り、将来の決定を導くために分類モデルを更新する。目的は,多数の施設を対象とする地平線を特徴とし,カバー対象を反映するチャンス制約の下で施設開口を最小化することである。
論文参考訳（メタデータ） (2024-06-20T23:00:25Z)
Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery [45.601316850669406]
本稿では,政策評価,最良政策識別,後悔の最小化のための効率的なアルゴリズムを提案する。政策評価と最良の政策識別のために,我々のアルゴリズムは最小限に最適であることを示す。提案アルゴリズムは、まずスペクトル法を利用して、低ランク報酬行列の左特異部分空間と右特異部分空間を推定する。
論文参考訳（メタデータ） (2024-02-24T06:36:08Z)
Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
我々のアルゴリズムが $tildeObig((d+log (|mathcalS|2 |mathcalA|))sqrtKbig)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping is linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, $|mathcalS|$ and $|mathcalA|$ is the standardities of the state and action space。
論文参考訳（メタデータ） (2023-05-15T05:37:32Z)
Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
専門家によるオンライン予測は機械学習の基本的な問題であり、いくつかの研究がプライバシーの制約の下でこの問題を研究している。本研究では,非適応的敵に対する最良な既存アルゴリズムの残差を克服する新たなアルゴリズムを提案し,解析する。
論文参考訳（メタデータ） (2022-10-24T18:40:19Z)
Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games [36.969021834291745]
本稿では,仮説クラスの逐次的脂肪散乱次元の観点から,ほぼ最適誤差を導出する固有学習アルゴリズムを提案する。この結果は、適切な学習者が準最適誤り境界を達成できるかどうかという疑問に答える。実数値(回帰)設定では、最適誤り境界は不適切な学習者にさえ知られていなかった。
論文参考訳（メタデータ） (2021-11-17T05:24:21Z)
Littlestone Classes are Privately Online Learnable [28.04975353867202]
プライバシー制約下でのオンライン分類の問題点を考察する。この設定では、学習者はラベル付き例のストリームを$(x_t, y_t)$, for $1 leq t leq T$で順次観察し、各イテレーションで新しい例のラベルを予測するために使用される仮説$h_t$を返す。学習者のパフォーマンスは、既知の仮説クラスである$mathcalH$に対する後悔によって測定される。
論文参考訳（メタデータ） (2021-06-25T09:08:33Z)
Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
本稿では,ランダム化最小二乗値反復(RLSVI)アルゴリズムに着想を得たモデルレス強化学習アルゴリズムを提案する。提案アルゴリズムは,スカラーノイズを用いたトレーニングデータを簡易に摂動させることにより,探索を促進する。我々はこの理論を、既知の困難な探査課題にまたがる実証的な評価で補完する。
論文参考訳（メタデータ） (2021-06-15T02:23:07Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
我々は,損失予測の助けを借りて,文脈的包帯における学習を研究する。最適な後悔は$mathcalO(minsqrtT, sqrtmathcalETfrac13)$である。
論文参考訳（メタデータ） (2020-03-04T07:36:38Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。