Fugu-MT 論文翻訳(概要): Partial Structure Discovery is Sufficient for No-regret Learning in Causal Bandits

論文の概要: Partial Structure Discovery is Sufficient for No-regret Learning in Causal Bandits

arxiv url: http://arxiv.org/abs/2411.04054v1
Date: Wed, 06 Nov 2024 16:59:11 GMT
ステータス: 翻訳完了
システム内更新日: 2024-11-28 17:07:45.04361
Title: Partial Structure Discovery is Sufficient for No-regret Learning in Causal Bandits
Title（参考訳）: 部分構造発見は因果帯域における非回帰学習に十分である
Authors: Muhammad Qasim Elahi, Mahsa Ghasemi, Murat Kocaoglu,
Abstract要約: 現在の研究はしばしば因果グラフが知られていると仮定するが、これは必ずしも先入観として利用できるとは限らない。我々は、根底にある因果グラフが不明で、潜伏する共同設立者を含むシナリオにおける因果帯域の問題に焦点を当てる。われわれは、必要で十分な潜在的共同創設者の集合を公式に特徴付け、可能な限り最適な武器が正しく特定されるように検出または学習する必要がある。
参考スコア（独自算出の注目度）: 7.064432289838905
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Causal knowledge about the relationships among decision variables and a reward variable in a bandit setting can accelerate the learning of an optimal decision. Current works often assume the causal graph is known, which may not always be available a priori. Motivated by this challenge, we focus on the causal bandit problem in scenarios where the underlying causal graph is unknown and may include latent confounders. While intervention on the parents of the reward node is optimal in the absence of latent confounders, this is not necessarily the case in general. Instead, one must consider a set of possibly optimal arms/interventions, each being a special subset of the ancestors of the reward node, making causal discovery beyond the parents of the reward node essential. For regret minimization, we identify that discovering the full causal structure is unnecessary; however, no existing work provides the necessary and sufficient components of the causal graph. We formally characterize the set of necessary and sufficient latent confounders one needs to detect or learn to ensure that all possibly optimal arms are identified correctly. We also propose a randomized algorithm for learning the causal graph with a limited number of samples, providing a sample complexity guarantee for any desired confidence level. In the causal bandit setup, we propose a two-stage approach. In the first stage, we learn the induced subgraph on ancestors of the reward, along with a necessary and sufficient subset of latent confounders, to construct the set of possibly optimal arms. The regret incurred during this phase scales polynomially with respect to the number of nodes in the causal graph. The second phase involves the application of a standard bandit algorithm, such as the UCB algorithm. We also establish a regret bound for our two-phase approach, which is sublinear in the number of rounds.
Abstract（参考訳）: 帯域設定における決定変数と報酬変数の関係に関する因果知識は、最適決定の学習を加速させる。現在の研究はしばしば因果グラフが知られていると仮定するが、これは必ずしも先入観として利用できるとは限らない。この課題に動機づけられた私たちは、根底にある因果グラフが不明で、潜伏した共同設立者を含む可能性のあるシナリオにおける因果帯域問題に焦点を当てる。報酬ノードの親への介入は、潜伏した共同創設者がいない場合に最適であるが、一般的には必ずしもそうではない。代わりに、報酬ノードの祖先の特別なサブセットであり、報酬ノードの親以上の因果発見を必須とする、おそらく最適な武器/介入のセットを考える必要がある。後悔の最小化のために、完全な因果構造を見つけることは不要であると確認するが、因果グラフの必要十分成分を提供する既存の研究は存在しない。われわれは、必要で十分な潜在的共同創設者の集合を公式に特徴付け、可能な限り最適な武器が正しく特定されるように検出または学習する必要がある。また,限られたサンプル数で因果グラフを学習するためのランダム化アルゴリズムを提案する。本稿では2段階のアプローチを提案する。最初の段階では、報酬の祖先に関する誘導されたサブグラフと、潜在的共同設立者の必要かつ十分なサブセットを学び、おそらく最適な武器のセットを構築します。このフェーズで生じた後悔は、因果グラフのノード数に関して多項式的にスケールする。第2フェーズは、UTBアルゴリズムのような標準帯域幅アルゴリズムの適用を含む。また、ラウンド数のサブ線形である2相アプローチに対する後悔の束縛も確立する。

関連論文リスト

The Minimal Search Space for Conditional Causal Bandits [0.18124328823188351]
因果知識は意思決定問題を支援するのに使える。本稿では、最適条件介入を含むことが保証される最小限のノードのグラフィカルな特徴について述べる。次に、この最小のノード群を特定するために、O(|V| + |E|)$の時間複雑性を持つ効率的なアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-02-10T15:45:18Z)
Causal bandits with backdoor adjustment on unknown Gaussian DAGs [5.807183284468881]
グラフ構造が不明な場合の因果帯域問題について検討する。連続的に生成された実験データと観測データを用いて各アームのバックドア調整セットを同定する。最適介入を逐次決定するために,修正された上位信頼境界に基づく新しい帯域幅アルゴリズムを開発した。
論文参考訳（メタデータ） (2025-02-04T05:18:35Z)
Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Banditsは、安静と安静のバンディットを一般化するフレームワークである。本研究は,2種類の単調包帯に焦点をあてる: 立ち上がり, 腕の期待される報酬が増加する, 引き金の数が増える, 回転する, 反対の行動が起こる。
論文参考訳（メタデータ） (2024-09-09T18:23:07Z)
Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals [28.94461817548213]
条件付き良性環境と任意の環境下での学習性能におけるトレードオフの可能性について,上界と下界の整合性を証明した。この問題を線形バンディット設定に還元することで、最初に因果バンディットのインスタンス依存境界を求める。
論文参考訳（メタデータ） (2024-07-01T04:12:15Z)
Additive Causal Bandits with Unknown Graph [10.575089475850465]
我々は,学習者が因果グラフに関連付けられたランダムな変数の集合に介入することを選択可能な因果帯域設定における行動を選択するアルゴリズムを探索する。学習者の目標は、観測可能な変数に対するすべての介入の中で、結果変数の期待を最大化する介入を素早く見つけることである。
論文参考訳（メタデータ） (2023-06-13T15:43:04Z)
Causal Bandits without Graph Learning [28.021500949026766]
我々は,原子間干渉を用いた報酬ノードの親ノード探索のための効率的なアルゴリズムを開発した。報奨ノードが複数の親を持つ場合にアルゴリズムを拡張します。
論文参考訳（メタデータ） (2023-01-26T20:27:14Z)
Causal Bandits for Linear Structural Equation Models [58.2875460517691]
本稿では,因果図形モデルにおける最適な介入順序を設計する問題について検討する。グラフの構造は知られており、ノードは$N$である。頻繁性(UCBベース)とベイズ的設定に2つのアルゴリズムを提案する。
論文参考訳（メタデータ） (2022-08-26T16:21:31Z)
Pure Exploration of Causal Bandits [9.77519365079468]
因果バンディット問題は多腕バンディットと因果推論を統合する。オンライン学習課題:未知の因果推論分布を持つ因果グラフを与えられた場合、1つの変数に介入するか、介入しないかを選択できる。 3種類の因果モデルに対して、第一のギャップ依存完全適応純粋探索アルゴリズムを提供する。
論文参考訳（メタデータ） (2022-06-16T02:19:37Z)
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
本稿では,ギャップベースアルゴリズムと逐次除去に基づく2つのアルゴリズムを提案する。具体的には、ギャップベースのアルゴリズムでは、サンプルの複雑さは定数要素まで最適であり、連続的な除去では対数因子まで最適である。
論文参考訳（メタデータ） (2021-11-14T21:49:58Z)
Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
トンプソンサンプリングやその他のシーケンシャルな意思決定アルゴリズムは、文脈的包帯における探索と探索のトレードオフに取り組むための一般的なアプローチである。性能は不特定な事前条件で優雅に低下することを示す。
論文参考訳（メタデータ） (2021-07-03T23:17:26Z)
Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme [16.1694012177079]
我々は,多腕バンディットにおける最高の武器識別問題,潜在的に私的な報酬について検討する。ゴールは、固定された所定のレベルで、最も高い定量値を持つ腕を特定することである。このアルゴリズムは$delta$-PACであり,サンプルの複雑さを特徴付ける。
論文参考訳（メタデータ） (2020-06-11T20:23:43Z)
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
インセンティブ付き探索(Incentivized Exploring)は、武器の選択を自給自足エージェントによって制御するマルチアーム・バンディットのバージョンである。我々は、インセンティブの価格に焦点を合わせ、インセンティブの適合性のために、広く解釈された、パフォーマンスの喪失が引き起こされる。
論文参考訳（メタデータ） (2020-02-03T04:58:51Z)
Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting [81.70513857417106]
CascadeBAIを設計し、分析する。これは、$K$アイテムのベストセットを見つけるアルゴリズムである。 CascadeBAIの時間的複雑さの上限は、決定的な分析課題を克服することによって導かれる。その結果,カスケードBAIの性能は,時間的複雑性の低い境界の導出により,いくつかの実践的状況において最適であることが示唆された。
論文参考訳（メタデータ） (2020-01-23T16:47:52Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。