Fugu-MT 論文翻訳(概要): Near-Optimal Algorithms for Private Online Learning in a Stochastic Environment

論文の概要: Near-Optimal Algorithms for Private Online Learning in a Stochastic Environment

arxiv url: http://arxiv.org/abs/2102.07929v2
Date: Thu, 8 Jun 2023 21:34:21 GMT
ステータス: 翻訳完了
システム内更新日: 2023-06-12 18:50:58.379776
Title: Near-Optimal Algorithms for Private Online Learning in a Stochastic Environment
Title（参考訳）: 確率環境におけるプライベートオンライン学習のための近似最適アルゴリズム
Authors: Bingshan Hu and Zhiming Huang and Nishant A. Mehta
Abstract要約: 最適な後悔境界を達成できる任意の UCB ベースのアルゴリズムを提案する。また、$Omega left (maxleft fraclog KDelta_min, fraclog Kepsilon right right)$ problem-dependent lower bound を示す。
参考スコア（独自算出の注目度）: 10.731932533903509
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider two variants of private stochastic online learning. The first variant is differentially private stochastic bandits. Previously, Sajed and Sheffet (2019) devised the DP Successive Elimination (DP-SE) algorithm that achieves the optimal $ O \biggl(\sum\limits_{1\le j \le K: \Delta_j >0} \frac{ \log T}{ \Delta_j} + \frac{ K\log T}{\epsilon} \biggr)$ problem-dependent regret bound, where $K$ is the number of arms, $\Delta_j$ is the mean reward gap of arm $j$, $T$ is the time horizon, and $\epsilon$ is the required privacy parameter. However, like other elimination style algorithms, it is not an anytime algorithm. Until now, it was not known whether UCB-based algorithms could achieve this optimal regret bound. We present an anytime, UCB-based algorithm that achieves optimality. Our experiments show that the UCB-based algorithm is competitive with DP-SE. The second variant is the full information version of private stochastic online learning. Specifically, for the problem of decision-theoretic online learning with stochastic rewards, we present the first algorithm that achieves an $ O \left( \frac{ \log K}{ \Delta_{\min}} + \frac{\log(K) \min\{\log (\frac{1}{\Delta_{\min}}), \log(T)\}}{\epsilon} \right)$ regret bound, where $\Delta_{\min}$ is the minimum mean reward gap. In addition, we also show an $\Omega \left( \max\left\{ \frac{\log K}{\Delta_{\min}}, \frac{\log K}{\epsilon} \right\} \right)$ problem-dependent lower bound. The key idea behind our good theoretical guarantees in both settings is forgetfulness, i.e., decisions are made based on a certain amount of newly obtained observations instead of all the observations obtained from the very beginning.
Abstract（参考訳）: 個人の確率的オンライン学習の2つのバリエーションを検討する。最初の変種は、差動的にプライベートな確率的バンドである。以前は sajed と sheffet (2019) が、最適な $ o \biggl (\sum\limits_{1\le j \le k: \delta_j >0} \frac{ \log t}{ \delta_j} + \frac{ k\log t}{\epsilon} \biggr)$ problem-dependent regret bound を考案した。しかし、他の除去スタイルアルゴリズムと同様に、これは時限アルゴリズムではない。これまで、UCBベースのアルゴリズムがこの最適な後悔の限界を達成できるかどうかは分かっていなかった。最適性を実現するための任意の UCB ベースのアルゴリズムを提案する。実験の結果,本アルゴリズムはdp-seと競合することがわかった。第2の変種は、プライベート確率オンライン学習の完全な情報バージョンである。具体的には、確率的報酬を伴う決定論的オンライン学習の問題に対して、$ O \left( \frac{ \log K}{ \Delta_{\min}} + \frac{\log(K) \min\{\log (\frac{1}{\Delta_{\min}}), \log(T)\}}{\epsilon} \right)$ regret bound, ここで$\Delta_{\min}$は最小平均報酬ギャップである。さらに、$\Omega \left( \max\left\{ \frac{\log K}{\Delta_{\min}}, \frac{\log K}{\epsilon} \right\} \right)$ problem-dependent lower boundを示す。両設定の正しい理論的保証の背後にある重要な考え方は、忘れがちである、すなわち、決定は、最初の段階で得られたすべての観測ではなく、ある量の新しい観測に基づいてなされる。

関連論文リスト

New Rates in Stochastic Decision-Theoretic Online Learning under Differential Privacy [17.711455925206298]
HuとMehta(2024年)は、オープンな問題を提起した:$varepsilon$-differential privacyの下で、決定論的オンライン学習($K$アクションと$T$ラウンドを含む)の最適なインスタンス依存率は何ですか? 本稿では,2つの新しい結果を得ることで,この問題に部分的に対処する。まず,$Oleft(fraclog KDelta_min + fraclog2Kvarepsilonright)$。第二に
論文参考訳（メタデータ） (2025-02-16T05:13:51Z)
Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation [11.345141417475956]
近似最小スパンニングツリー(MST)のプライベートリリース問題について検討する。既存のプライベートMSTアルゴリズムはトレードオフに直面し、計算効率または精度を犠牲にする。重みベクトルをプライベートにするのに十分でない入力のランダムな摂動により、結果は非公開となり、最先端のエラー保証が達成される。
論文参考訳（メタデータ） (2024-12-13T13:22:39Z)
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy [32.58099924135157]
ユーザ好みでアクティブな学習を行うための,最初の微分プライベートなダリング帯域幅アルゴリズムを提案する。我々のアルゴリズムは、ほぼ最適性能で計算効率が良い。我々は結果を任意の一般的な決定空間に拡張し、潜在的に無限の腕を持つ$d$次元を持つ。
論文参考訳（メタデータ） (2024-03-22T09:02:12Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
線形混合遷移カーネルを用いた最短経路(SSP)問題について検討する。エージェントは繰り返し環境と対話し、累積コストを最小化しながら特定の目標状態に到達する。既存の作業は、イテレーションコスト関数の厳密な下限や、最適ポリシーに対する期待長の上限を仮定することが多い。
論文参考訳（メタデータ） (2024-02-14T07:52:00Z)
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
本研究は,リスク許容度が$tau$のCVaR(Conditional Value at Risk)の目的に着目し,リスクに敏感な強化学習(RL)について検討する。ミニマックスCVaRの後悔率は$Omega(sqrttau-1AK)$で、$A$はアクションの数、$K$はエピソード数である。我々は,このアルゴリズムが連続性仮定の下で$widetilde O(tau-1sqrtSAK)$の最適後悔を達成し,一般に近似することを示す。
論文参考訳（メタデータ） (2023-02-07T02:22:31Z)
Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
シャッフルモデルにおけるマルチアームバンディット(MAB)問題に対して,$(varepsilon,delta)$-differentially privateアルゴリズムを提案する。我々の上限は、集中モデルにおいて最もよく知られたアルゴリズムの後悔とほぼ一致し、局所モデルにおいて最もよく知られたアルゴリズムを著しく上回っている。
論文参考訳（メタデータ） (2021-06-05T14:11:01Z)
Bandits with many optimal arms [68.17472536610859]
最適アームの割合は$p*$、最適アームとサブ最適化アームの間の最小平均ギャップは$Delta$と書きます。我々は,累積的後悔設定と最良腕識別設定の両方において最適な学習率を特徴付ける。
論文参考訳（メタデータ） (2021-03-23T11:02:31Z)
Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
対数要因まで $(varepsilon,delta)$-differently private の最適過剰人口損失は $sqrtlog(d)/n + sqrtd/varepsilon n.$ です。損失関数がさらなる滑らかさの仮定を満たすとき、余剰損失は$sqrtlog(d)/n + (log(d)/varepsilon n)2/3で上界(対数因子まで)であることが示される。
論文参考訳（メタデータ） (2021-03-02T06:53:44Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
楽観的な$Q$は$mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of state, $A$ is the number of action, $H$ is the planning horizon, $T$ is the total number of steps, $Delta_min$ is the least sub-Optitimality gap。
論文参考訳（メタデータ） (2020-06-16T13:01:33Z)
(Locally) Differentially Private Combinatorial Semi-Bandits [26.462686161971803]
我々は,従来のマルチアーマッド・バンド(MAB)の拡張であるコンビニアル・セミバンド(CSB)と,より強力なローカル・ディファレンシャル・プライバシ(LDP)設定について検討した。
論文参考訳（メタデータ） (2020-06-01T04:23:27Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。