Fugu-MT 論文翻訳(概要): Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment

論文の概要: Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment

arxiv url: http://arxiv.org/abs/2102.07929v3
Date: Thu, 30 May 2024 07:17:20 GMT
ステータス: 翻訳完了
システム内更新日: 2024-06-01 00:29:19.904819
Title: Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
Title（参考訳）: 確率環境下での微分プライベートオンライン学習のための準最適アルゴリズム
Authors: Bingshan Hu, Zhiming Huang, Nishant A. Mehta, Nidhi Hegde,
Abstract要約: 本研究では,バンディットとフルインフォメーションの両方のフィードバックの下で,オンライン学習環境における個人差分問題について検討する。差分的な私的盗賊に対しては、UTBとトンプソンサンプリングに基づくアルゴリズムを同時に提案し、最適な$O左(sum_j: Delta_j>0 fracln(T)min leftDelta_j, epsilon right)$ minimax lower boundとする。同じ差分プライベートなフル情報設定に対しては、$epsilon$-differentially も提示する。
参考スコア（独自算出の注目度）: 7.4288915613206505
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O \left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min \left\{\Delta_j, \epsilon \right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega \left(\frac{\ln(K)}{\min \left\{\Delta_{\min}, \epsilon \right\}} \right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T\ln(K)} + \frac{\ln(K)}{\epsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor.
Abstract（参考訳）: 本稿では,バンディットとフルインフォメーションの両方のフィードバックの下で,確率的環境下での個人的オンライン学習問題について検討する。差分的にプライベートな確率的包帯に対して、UTBとトンプソンサンプリングに基づくアルゴリズムは、いつでも最適な$O \left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min\{\Delta_j, \epsilon \right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$は最適なアームとサブ最適アームの差分である$j$, $\epsilon$は必要なプライバシーパラメータである。確率的報酬を持つ微分プライベートな完全な情報設定については、$\Omega \left(\frac{\ln(K)}{\min\{\Delta_{\min}, \epsilon \right \right)$ instance-dependent regret lower bound と $\Omega\left(\sqrt{T\ln(K)} + \frac{\ln(K)}{\epsilon}\right)$ minimax lower bound を示す。同じ差分プライベートなフル情報設定に対して、インスタンス依存の後悔と最悪の後悔が各下位境界にマッチする$\epsilon$-differentially privateアルゴリズムを、追加の$\log(T)$ factorまで提示する。

関連論文リスト

New Rates in Stochastic Decision-Theoretic Online Learning under Differential Privacy [17.711455925206298]
HuとMehta(2024年)は、オープンな問題を提起した:$varepsilon$-differential privacyの下で、決定論的オンライン学習($K$アクションと$T$ラウンドを含む)の最適なインスタンス依存率は何ですか? 本稿では,2つの新しい結果を得ることで,この問題に部分的に対処する。まず,$Oleft(fraclog KDelta_min + fraclog2Kvarepsilonright)$。第二に
論文参考訳（メタデータ） (2025-02-16T05:13:51Z)
Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation [11.345141417475956]
近似最小スパンニングツリー(MST)のプライベートリリース問題について検討する。既存のプライベートMSTアルゴリズムはトレードオフに直面し、計算効率または精度を犠牲にする。重みベクトルをプライベートにするのに十分でない入力のランダムな摂動により、結果は非公開となり、最先端のエラー保証が達成される。
論文参考訳（メタデータ） (2024-12-13T13:22:39Z)
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy [32.58099924135157]
ユーザ好みでアクティブな学習を行うための,最初の微分プライベートなダリング帯域幅アルゴリズムを提案する。我々のアルゴリズムは、ほぼ最適性能で計算効率が良い。我々は結果を任意の一般的な決定空間に拡張し、潜在的に無限の腕を持つ$d$次元を持つ。
論文参考訳（メタデータ） (2024-03-22T09:02:12Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
線形混合遷移カーネルを用いた最短経路(SSP)問題について検討する。エージェントは繰り返し環境と対話し、累積コストを最小化しながら特定の目標状態に到達する。既存の作業は、イテレーションコスト関数の厳密な下限や、最適ポリシーに対する期待長の上限を仮定することが多い。
論文参考訳（メタデータ） (2024-02-14T07:52:00Z)
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
本研究は,リスク許容度が$tau$のCVaR(Conditional Value at Risk)の目的に着目し,リスクに敏感な強化学習(RL)について検討する。ミニマックスCVaRの後悔率は$Omega(sqrttau-1AK)$で、$A$はアクションの数、$K$はエピソード数である。我々は,このアルゴリズムが連続性仮定の下で$widetilde O(tau-1sqrtSAK)$の最適後悔を達成し,一般に近似することを示す。
論文参考訳（メタデータ） (2023-02-07T02:22:31Z)
Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
シャッフルモデルにおけるマルチアームバンディット(MAB)問題に対して,$(varepsilon,delta)$-differentially privateアルゴリズムを提案する。我々の上限は、集中モデルにおいて最もよく知られたアルゴリズムの後悔とほぼ一致し、局所モデルにおいて最もよく知られたアルゴリズムを著しく上回っている。
論文参考訳（メタデータ） (2021-06-05T14:11:01Z)
Bandits with many optimal arms [68.17472536610859]
最適アームの割合は$p*$、最適アームとサブ最適化アームの間の最小平均ギャップは$Delta$と書きます。我々は,累積的後悔設定と最良腕識別設定の両方において最適な学習率を特徴付ける。
論文参考訳（メタデータ） (2021-03-23T11:02:31Z)
Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
対数要因まで $(varepsilon,delta)$-differently private の最適過剰人口損失は $sqrtlog(d)/n + sqrtd/varepsilon n.$ です。損失関数がさらなる滑らかさの仮定を満たすとき、余剰損失は$sqrtlog(d)/n + (log(d)/varepsilon n)2/3で上界(対数因子まで)であることが示される。
論文参考訳（メタデータ） (2021-03-02T06:53:44Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
楽観的な$Q$は$mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of state, $A$ is the number of action, $H$ is the planning horizon, $T$ is the total number of steps, $Delta_min$ is the least sub-Optitimality gap。
論文参考訳（メタデータ） (2020-06-16T13:01:33Z)
(Locally) Differentially Private Combinatorial Semi-Bandits [26.462686161971803]
我々は,従来のマルチアーマッド・バンド(MAB)の拡張であるコンビニアル・セミバンド(CSB)と,より強力なローカル・ディファレンシャル・プライバシ(LDP)設定について検討した。
論文参考訳（メタデータ） (2020-06-01T04:23:27Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。