Fugu-MT Paper Translation (Summary): Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries

Paper summary: Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries

arxiv url: http://arxiv.org/abs/2504.00461v1
Date: Tue, 01 Apr 2025 06:35:42 GMT
Status: Translation complete
Last updated in system: 2025-04-03 15:43:08.960608
Title: Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries
Authors: Arnab Maiti, Zhiyuan Fan, Kevin Jamieson, Lillian J. Ratliff, Gabriele Farina
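Abstract summary: This paper studies the online shortest path problem in directed acyclic graphs (DAGs) under bandit feedback against an adaptive adversary. We propose an algorithm that achieves a near-minimax-optimal regret of $\tilde O(\sqrt{|E|T\log |X|})$ with high probability against any adaptive adversary.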
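As a rough illustration of the quantities in this bound (not the paper's algorithm), the sketch below counts the source-to-sink paths $|X|$ of a small DAG by dynamic programming in topological order and evaluates the leading expression $\sqrt{|E| T \log |X|}$ with constants and the hidden logarithmic factors in $|E|$ dropped. The example graph, node names, $T = 10{,}000$, the natural logarithm, and the helper names `count_paths` and `leading_regret_term` are all illustrative assumptions.

```python
import math
from graphlib import TopologicalSorter


def count_paths(edges, source, sink):
    """Return |X|, the number of source->sink paths, via DP in topological order."""
    preds = {}      # node -> set of predecessors (for the topological sort)
    incoming = {}   # node -> list of tails of incoming edges (keeps parallel edges distinct)
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
        incoming.setdefault(v, []).append(u)
    ways = {}
    for node in TopologicalSorter(preds).static_order():
        # Paths reaching `node` = sum of paths reaching each predecessor edge tail.
        ways[node] = 1 if node == source else sum(ways[u] for u in incoming.get(node, []))
    return ways.get(sink, 0)


def leading_regret_term(num_edges, T, num_paths):
    """sqrt(|E| * T * log|X|): the bound's leading expression, constants and log factors dropped."""
    return math.sqrt(num_edges * T * math.log(num_paths))


# Hypothetical example DAG: two "diamonds" in series, so 2 * 2 = 4 paths from s to t.
EDGES = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("c", "e"), ("d", "t"), ("e", "t")]

n_paths = count_paths(EDGES, "s", "t")
print(n_paths)                                                     # 4
print(round(leading_regret_term(len(EDGES), 10_000, n_paths), 1))  # 333.0
```

Since $X \subseteq \{0,1\}^{|E|}$, we always have $|X| \le 2^{|E|}$ and hence $\log |X| \le |E| \log 2$. The dynamic program uses a number of additions linear in $|V| + |E|$ even when $|X|$ is exponential: a chain of $k$ diamonds has $2^k$ paths but only $4k$ edges.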
Reference score (in-house attention score): 34.38978643261337
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the online shortest path problem in directed acyclic graphs (DAGs) under bandit feedback against an adaptive adversary. Given a DAG $G = (V, E)$ with a source node $v_{\mathsf{s}}$ and a sink node $v_{\mathsf{t}}$, let $X \subseteq \{0,1\}^{|E|}$ denote the set of all paths from $v_{\mathsf{s}}$ to $v_{\mathsf{t}}$. At each round $t$, we select a path $\mathbf{x}_t \in X$ and receive bandit feedback on our loss $\langle \mathbf{x}_t, \mathbf{y}_t \rangle \in [-1,1]$, where $\mathbf{y}_t$ is an adversarially chosen loss vector. Our goal is to minimize regret with respect to the best path in hindsight over $T$ rounds. We propose the first computationally efficient algorithm to achieve a near-minimax optimal regret bound of $\tilde O(\sqrt{|E|T\log |X|})$ with high probability against any adaptive adversary, where $\tilde O(\cdot)$ hides logarithmic factors in the number of edges $|E|$. Our algorithm leverages a novel loss estimator and a centroid-based decomposition in a nontrivial manner to attain this regret bound. As an application, we show that our algorithm for DAGs provides state-of-the-art efficient algorithms for $m$-sets, extensive-form games, the Colonel Blotto game, shortest walks in directed graphs, hypercubes, and multi-task multi-armed bandits, achieving improved high-probability regret guarantees in all these settings.
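To make the protocol concrete, the following minimal sketch encodes each path as an indicator vector $\mathbf{x} \in \{0,1\}^{|E|}$, charges the loss $\langle \mathbf{x}_t, \mathbf{y}_t \rangle$, and measures regret against the best path in hindsight. On a DAG that benchmark is a shortest-path dynamic program over the cumulative edge losses (negative losses are fine because there are no cycles). This is a toy under stated assumptions: the learner picks a uniformly random path and ignores its feedback, and the adversary draws oblivious random losses instead of adapting; neither is the paper's algorithm or adversary model. The example graph and all function names are hypothetical, and edge losses are scaled by one over the longest path length so that every path loss lies in $[-1, 1]$.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical example DAG; edge index i is coordinate i of a path's indicator vector.
EDGES = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("c", "e"), ("d", "t"), ("e", "t")]
SOURCE, SINK = "s", "t"


def topo_order(edges):
    """Nodes in topological order (every predecessor before its successors)."""
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    return list(TopologicalSorter(preds).static_order())


def all_paths(edges, source, sink):
    """Enumerate source->sink paths as 0/1 vectors (exponential in general; tiny graphs only)."""
    out_edges = {}
    for i, (u, _) in enumerate(edges):
        out_edges.setdefault(u, []).append(i)
    paths = []

    def extend(node, chosen):
        if node == sink:
            paths.append([1 if i in chosen else 0 for i in range(len(edges))])
            return
        for i in out_edges.get(node, []):
            extend(edges[i][1], chosen + [i])

    extend(source, [])
    return paths


def inner(x, y):
    """<x, y>: the loss of path x under edge-loss vector y."""
    return sum(xi * yi for xi, yi in zip(x, y))


def best_path_loss(edges, source, sink, cum_loss):
    """min over paths x of <x, cum_loss>, via a shortest-path DP in topological order."""
    incoming = {}
    for i, (_, v) in enumerate(edges):
        incoming.setdefault(v, []).append(i)
    dist = {}
    for node in topo_order(edges):
        if node == source:
            dist[node] = 0.0
            continue
        cands = [dist[edges[i][0]] + cum_loss[i]
                 for i in incoming.get(node, [])
                 if dist[edges[i][0]] is not None]
        dist[node] = min(cands) if cands else None  # None = unreachable from source
    return dist[sink]


def run(T, seed=0):
    """Regret of a placeholder uniform-random learner over T rounds."""
    rng = random.Random(seed)
    paths = all_paths(EDGES, SOURCE, SINK)
    max_len = max(sum(x) for x in paths)
    cum_loss = [0.0] * len(EDGES)
    learner_loss = 0.0
    for _ in range(T):
        x_t = rng.choice(paths)  # placeholder learner, NOT the paper's algorithm
        # Oblivious random adversary (not adaptive); scaling keeps <x, y_t> in [-1, 1].
        y_t = [rng.uniform(-1.0, 1.0) / max_len for _ in EDGES]
        learner_loss += inner(x_t, y_t)  # bandit feedback: only this scalar is revealed
        cum_loss = [c + y for c, y in zip(cum_loss, y_t)]  # used only to score the benchmark
    return learner_loss - best_path_loss(EDGES, SOURCE, SINK, cum_loss)


print(f"regret after 1000 rounds: {run(1000):.2f}")
```

The hindsight benchmark costs time linear in $|V| + |E|$ per evaluation; the exhaustive enumeration in `all_paths` is used here only to sample the placeholder learner's path on this tiny graph.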


Related papers