Fugu-MT 論文翻訳(概要): Improved Regret Bounds for Online Submodular Maximization

論文の概要: Improved Regret Bounds for Online Submodular Maximization

arxiv url: http://arxiv.org/abs/2106.07836v1
Date: Tue, 15 Jun 2021 02:05:35 GMT
ステータス: 翻訳完了
システム内更新日: 2021-06-16 15:21:14.434287
Title: Improved Regret Bounds for Online Submodular Maximization
Title（参考訳）: オンラインサブモジュラー最大化のための後悔の限界の改善
Authors: Omid Sadeghi, Prasanna Raut and Maryam Fazel
Abstract要約: 我々は、各ステップ$tin[T]$において、固定凸からアクション$x_t$を選択し、コンパクトなドメインセット$mathcalK$を選択するオンライン最適化問題を考える。ユーティリティ関数 $f_t(cdot)$ が明らかになり、アルゴリズムはペイオフ $f_t(x_t)$ を受け取る。
参考スコア（独自算出の注目度）: 10.089520556398575
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we consider an online optimization problem over $T$ rounds where at each step $t\in[T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$. A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$. This problem has been previously studied under the assumption that the utilities are adversarially chosen monotone DR-submodular functions and $\mathcal{O}(\sqrt{T})$ regret bounds have been derived. We first characterize the class of strongly DR-submodular functions and then, we derive regret bounds for the following new online settings: $(1)$ $\{f_t\}_{t=1}^T$ are monotone strongly DR-submodular and chosen adversarially, $(2)$ $\{f_t\}_{t=1}^T$ are monotone submodular (while the average $\frac{1}{T}\sum_{t=1}^T f_t$ is strongly DR-submodular) and chosen by an adversary but they arrive in a uniformly random order, $(3)$ $\{f_t\}_{t=1}^T$ are drawn i.i.d. from some unknown distribution $f_t\sim \mathcal{D}$ where the expected function $f(\cdot)=\mathbb{E}_{f_t\sim\mathcal{D}}[f_t(\cdot)]$ is monotone DR-submodular. For $(1)$, we obtain the first logarithmic regret bounds. In terms of the second framework, we show that it is possible to obtain similar logarithmic bounds with high probability. Finally, for the i.i.d. model, we provide algorithms with $\tilde{\mathcal{O}}(\sqrt{T})$ stochastic regret bound, both in expectation and with high probability. Experimental results demonstrate that our algorithms outperform the previous techniques in the aforementioned three settings.
Abstract（参考訳）: 本稿では、各ステップ $t\in[t]$ において、固定凸およびコンパクトな領域セット $\mathcal{k}$ から、アルゴリズムがアクション $x_t$ を選択することで、$t$ ラウンドを超えるオンライン最適化問題を考える。ユーティリティ関数 $f_t(\cdot)$ が明らかになり、アルゴリズムはペイオフ $f_t(x_t)$ を受け取る。この問題は以前、ユーティリティが反対に選択された単調のdr-サブモジュラー関数であり、$\mathcal{o}(\sqrt{t})$ regret boundsが導かれるという仮定の下で研究されてきた。まず、強いDR-部分モジュラ函数のクラスを特徴付け、次に、次の新しいオンライン設定に対する後悔境界を導出する:$(1)$\{f_t\}_{t=1}^T$ is monotone strong DR-submodular and selected adversarially, $(2)$$$\{f_t\}_{t=1}^T$ are monotone submodular (一方平均$$\frac{1}{T}\sum_{t=1}^T f_t$ is strong DR-submodular) は、逆数によって選択されるが、$(3)$\{f_t\}_{t=1}^T$は、一様ランダム順序で現れる。未知の分布 $f_t\sim \mathcal{d}$ ここで期待される関数 $f(\cdot)=\mathbb{e}_{f_t\sim\mathcal{d}}[f_t(\cdot)]$ は単調 dr-submodular である。 $(1) の場合、最初の対数的後悔境界を得る。第2の枠組みに関して、高い確率で同様の対数境界を得ることができることを示す。最後に i. i. d. モデルでは、期待と高い確率の両方において、確率的後悔の束縛を$\tilde{\mathcal{o}}(\sqrt{t})$というアルゴリズムを提供する。実験の結果, 上記の3つの設定において, 従来の手法よりも優れたアルゴリズムが得られた。

関連論文リスト

Minimax Optimal Submodular Optimization with Bandit Feedback [13.805872311596739]
単調な部分モジュラー集合関数 $f: 2[n] rightarrow [0,1]$ をバンドイットフィードバックの下で最大化する。具体的には、$f$は学習者には知られていないが、各時点で$t=1,dots,T$は、$|S_t| leq k$でセットの$S_tサブセット[n]$を選択し、$eta_t$が平均ゼロのサブガウスノイズである場合に、$f(S_t) + eta_t$を受け取る。
論文参考訳（メタデータ） (2023-10-27T20:19:03Z)
Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits [21.54858035450694]
分割マトロイド制約を持つ部分モジュラー帯域に対するサブ線形後悔アルゴリズムを提案する。バンドイットの逐次部分モジュラーに対して、既存の研究はO(T2/3)$ regret を証明し、近似比は1/2$である。
論文参考訳（メタデータ） (2023-05-21T08:51:55Z)
On Submodular Contextual Bandits [92.45432756301231]
作用が基底集合の部分集合であり、平均報酬が未知の単調部分モジュラ函数によってモデル化されるような文脈的包帯の問題を考える。 Inverse Gap Weighting 戦略により,提案アルゴリズムは推定関数の局所的最適度を効率よくランダム化することを示す。
論文参考訳（メタデータ） (2021-12-03T21:42:33Z)
Logarithmic Regret from Sublinear Hints [76.87432703516942]
自然クエリモデルにより,アルゴリズムが$O(log T)$ regretsを$O(sqrtT)$ hintsで得ることを示す。また、$o(sqrtT)$ hintsは$Omega(sqrtT)$ regretより保証できないことも示しています。
論文参考訳（メタデータ） (2021-11-09T16:50:18Z)
Submodular + Concave [53.208470310734825]
第一次最適化法が凹関数の最大目的値に収束できることはよく確立されている。本研究では、滑らかな函数凸体(英語版)の行列式を$F(x) = G(x) +C(x)$で始める。このクラスの函数は、保証がないような凹凸函数と連続DR-部分モジュラ函数の両方の拡張である。
論文参考訳（メタデータ） (2021-06-09T01:59:55Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
楽観的な$Q$は$mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of state, $A$ is the number of action, $H$ is the planning horizon, $T$ is the total number of steps, $Delta_min$ is the least sub-Optitimality gap。
論文参考訳（メタデータ） (2020-06-16T13:01:33Z)
Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
我々は,損失予測の助けを借りて,文脈的包帯における学習を研究する。最適な後悔は$mathcalO(minsqrtT, sqrtmathcalETfrac13)$である。
論文参考訳（メタデータ） (2020-03-04T07:36:38Z)
Adaptive Online Learning with Varying Norms [45.11667443216861]
オンライン凸最適化アルゴリズムは、あるドメインで$w_t$を出力する。この結果を用いて新しい「完全行列」型後悔境界を得る。
論文参考訳（メタデータ） (2020-02-10T17:22:08Z)
Regularized Submodular Maximization at Scale [45.914693923126826]
亜モジュラリティは本質的に多様性、カバレッジ、代表性の概念に関係している。正規化部分モジュラ函数 $f = g ell$ を行列式部分モジュラ関数 $g$ とモジュラ関数 $ell$ の差分として最大化する手法を提案する。
論文参考訳（メタデータ） (2020-02-10T02:37:18Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。