Fugu-MT paper translation (abstract): Density-Based Algorithms for Corruption-Robust Contextual Search and Convex Optimization

Paper overview: Density-Based Algorithms for Corruption-Robust Contextual Search and Convex Optimization

arxiv url: http://arxiv.org/abs/2206.07528v2
Date: Wed, 19 Feb 2025 19:47:40 GMT
Status: Translation complete
Last updated in system: 2025-02-21 22:18:11.124479
Title: Density-Based Algorithms for Corruption-Robust Contextual Search and Convex Optimization
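Authors: Renato Paes Leme, Chara Podimata, Jon Schneider
Abstract summary: We study the problem of contextual search, a generalization of binary search in higher dimensions, in the adversarial noise model. We focus on the $\epsilon$-ball and the absolute loss.
Reference score (independently computed attention score): 21.287905447745953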
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the problem of contextual search, a generalization of binary search in higher dimensions, in the adversarial noise model. Let $d$ be the dimension of the problem, $T$ be the time horizon and $C$ be the total amount of adversarial noise in the system. We focus on the $\epsilon$-ball and the absolute loss. For the $\epsilon$-ball loss, we give a tight regret bound of $O(C + d \log(1/\epsilon))$ improving over the $O(d^3 \log(1/\epsilon) \log^2(T) + C \log(T) \log(1/\epsilon))$ bound of Krishnamurthy et al. (Operations Research '23). For the absolute loss, we give an efficient algorithm with regret $O(C+d \log T)$. To tackle the absolute loss case, we study the more general setting of Corruption-Robust Convex Optimization with Subgradient feedback, which is of independent interest. Our techniques are a significant departure from prior approaches. Specifically, we keep track of density functions over the candidate target vectors instead of a knowledge set consisting of the candidate target vectors consistent with the feedback obtained.
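The contrast the abstract draws between a density over candidates and a hard knowledge set can be illustrated with a rough one-dimensional sketch. This is not the paper's algorithm: it is a generic multiplicative-weights variant of binary search, and every name in it (`density_search`, `feedback`, `n_grid`, `rounds`, `penalty`, and the chosen corruption schedule) is a hypothetical choice made for illustration. Candidates that disagree with a response are down-weighted rather than discarded, so a few corrupted responses cannot eliminate the true target.

```python
import numpy as np

def density_search(feedback, n_grid=1024, rounds=60, penalty=0.5):
    """Locate a hidden point in [0, 1] from possibly corrupted comparisons.

    feedback(t, query) returns True if the responder claims target >= query.
    All parameters are illustrative assumptions, not values from the paper.
    """
    grid = (np.arange(n_grid) + 0.5) / n_grid      # cell centers (candidates)
    density = np.full(n_grid, 1.0 / n_grid)        # uniform prior density
    for t in range(rounds):
        # Query the median of the current density.
        cdf = np.cumsum(density)
        idx = min(int(np.searchsorted(cdf, 0.5)), n_grid - 1)
        query = grid[idx]
        says_right = feedback(t, query)
        # Down-weight (do not delete) candidates inconsistent with the response.
        inconsistent = grid < query if says_right else grid >= query
        density[inconsistent] *= penalty
        density /= density.sum()
    return grid[int(np.argmax(density))]

# Usage: a hypothetical target with five adversarially flipped responses.
theta = 0.3141
corrupted_rounds = {3, 7, 11, 15, 19}

def feedback(t, query):
    truth = theta >= query
    return (not truth) if t in corrupted_rounds else truth

estimate = density_search(feedback)
print(estimate)
```

A knowledge-set method would zero out the inconsistent side on each response, so a single flipped answer could discard the true target permanently; the soft penalty here only delays convergence by a few rounds per corruption.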

Related papers

Improved Robust Estimation for Erdős-Rényi Graphs: The Sparse Regime and Optimal Breakdown Point [3.793609515750114]
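We study the problem of robustly estimating the edge density of Erdős-Rényi random graphs $G(n, d^\circ/n)$. Our algorithm is based on the sum-of-squares hierarchy.
Paper reference translation (metadata) (2025-03-05T21:45:17Z)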
Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback [49.84060509296641]
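We study online finite-horizon Markov decision processes with adversarially changing losses and aggregate bandit feedback (also known as full-bandit feedback). Under this type of feedback, the agent observes only the total loss incurred over the entire trajectory, rather than the individual loss at each intermediate step within the trajectory. We introduce the first policy optimization algorithms for this setting.
Paper reference translation (metadata) (2025-02-06T12:03:24Z)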
Batched Stochastic Bandit for Nondegenerate Functions [8.015503209312786]
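We study the batched bandit learning problem for nondegenerate functions. We propose an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally.
Paper reference translation (metadata) (2024-05-09T12:50:16Z)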
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
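We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes.
Paper reference translation (metadata) (2023-12-24T08:27:30Z)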
Corruption-Robust Offline Reinforcement Learning with General Function Approximation [60.91257031278004]
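We investigate the problem of corruption in offline reinforcement learning (RL) with general function approximation. Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs).
Paper reference translation (metadata) (2023-10-23T04:07:26Z)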
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
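We study pure exploration with infinitely many bandit arms generated from an unknown distribution. Our goal is to efficiently select a single high-quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
Paper reference translation (metadata) (2023-06-03T04:00:47Z)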
Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves $\tilde{O}\big((d+\log(|\mathcal{S}|^2 |\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
Paper reference translation (metadata) (2023-05-15T05:37:32Z)
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
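We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov decision processes (MDPs) with a constraint on the number of batches. We design an algorithm that achieves near-optimal regret $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ in $K$ episodes, where $\tilde{O}(\cdot)$ hides logarithmic terms of $(S,A,H,K)$. Our technical contributions are two-fold: 1) a near-optimal design scheme for exploration
Paper reference translation (metadata) (2022-10-15T09:22:22Z)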
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions [98.75618795470524]
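We study the linear contextual bandit problem in the presence of adversarial corruption. We propose a new algorithm based on the principle of optimism in the face of uncertainty.
Paper reference translation (metadata) (2022-05-13T17:58:58Z)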
ARCS: Accurate Rotation and Correspondence Search [21.01267270902429]
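This paper addresses the old Wahba problem in its more general form, which we call "simultaneous rotation and correspondence search". First, using $O(m \log m)$ time and $O(m)$ space, it can solve instances with, e.g., $m, n \approx 10^6$ in about $0.1$ seconds.
Paper reference translation (metadata) (2022-03-28T04:42:11Z)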
Logarithmic Regret from Sublinear Hints [76.87432703516942]
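We show that, in a natural query model, an algorithm can obtain $O(\log T)$ regret with $O(\sqrt{T})$ hints. We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
Paper reference translation (metadata) (2021-11-09T16:50:18Z)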
Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
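We study the linear contextual bandit problem in the presence of adversarial corruption. We propose a variance-aware algorithm that adapts to the level of adversarial contamination $C$.
Paper reference translation (metadata) (2021-10-25T02:53:24Z)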
Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
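Up to logarithmic factors, the optimal excess population loss under $(\varepsilon,\delta)$-differential privacy is $\sqrt{\log(d)/n} + \sqrt{d}/\varepsilon n$. When the loss functions satisfy additional smoothness assumptions, the excess loss is shown to be upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/\varepsilon n)^{2/3}$.
Paper reference translation (metadata) (2021-03-02T06:53:44Z)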
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
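We study online learning with bandit feedback. The learner has access only to a zeroth-order oracle, where the cost/reward functions admit a "pseudo-1d" structure. We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds. We propose a new algorithm, sbcalg, that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
Paper reference translation (metadata) (2021-02-15T08:16:51Z)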
Thresholded Lasso Bandit [70.17389393497125]
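Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support. We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general.
Paper reference translation (metadata) (2020-10-22T19:14:37Z)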
Streaming Complexity of SVMs [110.63976030971106]
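We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that, for both problems, one can obtain streaming algorithms using space smaller than $\frac{1}{\lambda\epsilon}$.
Paper reference translation (metadata) (2020-07-07T17:10:00Z)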

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

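This page provides information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).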