Fugu-MT 論文翻訳(概要): Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints

論文の概要: Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints

arxiv url: http://arxiv.org/abs/2505.06709v1
Date: Sat, 10 May 2025 17:23:10 GMT
ステータス: 翻訳完了
システム内更新日: 2025-05-13 20:21:49.004271
Title: Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints
Title（参考訳）: 逆制約付きオンライン凸最適化のための制約違反($\tilde{O}(\sqrt{T})$
Authors: Abhishek Sinha, Rahul Vaze,
Abstract要約: 逆制約を伴うオンライン凸最適化問題を再検討する。我々は,$tildeO(sqrtdT+Tbeta)$ regretと$tildeO(dT1-beta)$ CCVを実現するオンラインポリシーを提案する。
参考スコア（独自算出の注目度）: 16.99491218081617
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We revisit the Online Convex Optimization problem with adversarial constraints (COCO) where, in each round, a learner is presented with a convex cost function and a convex constraint function, both of which may be chosen adversarially. The learner selects actions from a convex decision set in an online fashion, with the goal of minimizing both regret and the cumulative constraint violation (CCV) over a horizon of $T$ rounds. The best-known policy for this problem achieves $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. In this paper, we present a surprising improvement that achieves a significantly smaller CCV by trading it off with regret. Specifically, for any bounded convex cost and constraint functions, we propose an online policy that achieves $\tilde{O}(\sqrt{dT}+ T^\beta)$ regret and $\tilde{O}(dT^{1-\beta})$ CCV, where $d$ is the dimension of the decision set and $\beta \in [0,1]$ is a tunable parameter. We achieve this result by first considering the special case of $\textsf{Constrained Expert}$ problem where the decision set is a probability simplex and the cost and constraint functions are linear. Leveraging a new adaptive small-loss regret bound, we propose an efficient policy for the $\textsf{Constrained Expert}$ problem, that attains $O(\sqrt{T\ln N}+T^{\beta})$ regret and $\tilde{O}(T^{1-\beta} \ln N)$ CCV, where $N$ is the number of experts. The original problem is then reduced to the $\textsf{Constrained Expert}$ problem via a covering argument. Finally, with an additional smoothness assumption, we propose an efficient gradient-based policy attaining $O(T^{\max(\frac{1}{2},\beta)})$ regret and $\tilde{O}(T^{1-\beta})$ CCV.
Abstract（参考訳）: 逆制約付きオンライン凸最適化問題(COCO)を再検討し、各ラウンドで学習者に凸コスト関数と凸制約関数を提示する。学習者は、T$ラウンドの地平線上で、後悔と累積制約違反(CCV)を最小化することを目的として、オンライン形式で設定された凸決定から行動を選択する。この問題の最もよく知られているポリシーは、$O(\sqrt{T})$ regret と $\tilde{O}(\sqrt{T})$ CCV である。本報告では, 後悔と引き換えにCCVを著しく小さくする, 驚くべき改善について述べる。具体的には、任意の有界凸コストと制約関数に対して、$\tilde{O}(\sqrt{dT}+ T^\beta)$ regret と $\tilde{O}(dT^{1-\beta})$ CCV を達成するオンラインポリシーを提案し、$d$ は決定集合の次元であり、$\beta \in [0,1]$ はチューナブルパラメータである。まず、決定集合が確率的単純集合であり、コストと制約関数が線型であるような特別な場合を$\textsf{Constrained Expert}$問題として考える。新しい適応的小さめの遺言境界を活用すれば、$\textsf{Constrained Expert}$問題に対して、$O(\sqrt{T\ln N}+T^{\beta})$ regret と $\tilde{O}(T^{1-\beta} \ln N)$ CCVに達し、$N$は専門家の数である。元の問題は、カバー引数によって$\textsf{Constrained Expert}$問題に還元される。最後に、さらなる滑らかさを仮定して、$O(T^{\max(\frac{1}{2},\beta)})$ regret および $\tilde{O}(T^{1-\beta})$ CCV となるような効率的な勾配に基づくポリシーを提案する。

関連論文リスト

Optimal Bounds for Adversarial Constrained Online Convex Optimization [1.9336815376402723]
後悔とCCVの両方で最適な$O(sqrtT)$バウンドが得られることを示す。制約関数に最小限のペナルティを課す新しい代理損失関数に基づいて、Follow-the-Regularized-Leader と Online Gradient Descent の両方が最適境界を達成することを示す。
論文参考訳（メタデータ） (2025-03-17T16:51:16Z)
Constrained Online Convex Optimization with Polyak Feasibility Steps [3.928749790761187]
固定制約関数 $g : mathbbRd rightarrow mathbbR$ を用いてオンライン凸最適化について検討する。制約満足度$g(x_t) leq 0 forall in [T]$, and matching $O(sqrtT)$ regret guarantees。
論文参考訳（メタデータ） (2025-02-18T18:26:20Z)
$O(\sqrt{T})$ Static Regret and Instance Dependent Constraint Violation for Constrained Online Convex Optimization [16.99491218081617]
目的は、静的な後悔と累積的制約違反(CCV)を同時に最小化することである。アルゴリズムは、$O(sqrtT)$と$mincV, O(sqrtTlog T)$のCCVの静的後悔を保証する。
論文参考訳（メタデータ） (2025-02-07T15:47:04Z)
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints [55.2480439325792]
逆制約を伴うオンライン凸最適化(OCO)について検討する。本稿では,損失関数と制約関数の予測にアルゴリズムがアクセス可能な設定に着目する。以上の結果から,現在のO(sqrtT) $ regret と $ tildeO(sqrtT) $ cumulative constraint violation の改善が期待できることがわかった。
論文参考訳（メタデータ） (2024-12-11T03:06:42Z)
Optimal Algorithms for Online Convex Optimization with Adversarial Constraints [16.99491218081617]
COCOでは、そのラウンドのアクションを選択した後、学習者に凸コスト関数と凸制約関数を明らかにする。我々は、オンラインポリシーが、制限的な仮定なしで、同時に$O(sqrtT)$ regretと$tildeO(sqrtT)$ CCVを達成できることを示します。
論文参考訳（メタデータ） (2023-10-29T09:55:41Z)
Improved Dynamic Regret for Online Frank-Wolfe [54.690867216880356]
オンライン凸最適化のための効率的なプロジェクションフリーアルゴリズムであるFrank-Wolfe (OFW) の動的後悔について検討する。本稿では,FWの高速収束率をオフライン最適化からオンライン最適化に拡張することにより,OFWの動的後悔境界の改善を導出する。
論文参考訳（メタデータ） (2023-02-11T07:19:51Z)
Towards Painless Policy Optimization for Constrained MDPs [46.12526917024248]
我々は、無限の地平線における政策最適化、$gamma$-discounted constrained Markov decision process (CMDP)について研究する。我々の目標は、小さな制約違反で大きな期待された報酬を達成する政策を返却することである。本稿では,任意のアルゴリズムに対して,報酬の準最適性と制約違反を拘束できる汎用的原始双対フレームワークを提案する。
論文参考訳（メタデータ） (2022-04-11T15:08:09Z)
Minimal Expected Regret in Linear Quadratic Control [79.81807680370677]
オンライン学習アルゴリズムを考案し、その期待された後悔を保証します。当時のこの後悔は、$A$と$B$が未知の場合、$widetildeO((d_u+d_x)sqrtd_xT)$によって上界(i)となる。
論文参考訳（メタデータ） (2021-09-29T14:07:21Z)
Online Convex Optimization with Continuous Switching Constraint [78.25064451417082]
連続的なスイッチング制約を伴うオンライン凸最適化の問題を紹介する。強い凸関数の場合、後悔境界は$O(log T)$ for $S=Omega(log T)$、$O(minT/exp(S)+S,T)$ for $S=O(log T)$に改善できることを示す。
論文参考訳（メタデータ） (2021-03-21T11:43:35Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
我々は、C$2$UCBアルゴリズムが分割マトロイド制約に対して最適な後悔結合$tildeO(dsqrtkT + dk)$を有することを示した。一般的な制約に対して,C$2$UCBアルゴリズムで腕の報酬推定値を変更するアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-01-20T04:29:18Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。