Fugu-MT paper translation (abstract): What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects

Paper overview: What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects

arxiv url: http://arxiv.org/abs/2603.09532v1
Date: Tue, 10 Mar 2026 11:40:42 GMT
Status: translation complete
Last updated in system: 2026-03-11 15:25:24.257857
Title: What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects
Title (reference translation): What Matters in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects
Authors: Nicolás Della Penna
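Abstract summary: We show that recommendation welfare can strictly exceed every learner-measurable treatment policy when downstream actors use private information. For finite-context square-IV problems we propose BRACE, a parameter-free phase-doubling algorithm. We complement the theory with a finite-context empirical benchmark spanning direct control, mediated present-versus-future tradeoffs, weak identification, homogeneity failure, and rectangular overidentification.
Reference score (independently computed attention score): 0.0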
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bandits with noncompliance separate the learner's recommendation from the treatment actually delivered, so the learning target itself must be chosen. A platform may care about recommendation welfare in the current mediated workflow, treatment learning for a future direct-control regime, or anytime-valid uncertainty for one of those targets. These objectives need not agree. We formalize this objective-choice problem, identify the direct-control regime in which recommendation and treatment objectives collapse, and show by example that recommendation welfare can strictly exceed every learner-measurable treatment policy when downstream actors use private information. For finite-context square-IV problems we propose BRACE, a parameter-free phase-doubling algorithm that performs IV inversion only after matrix certification and otherwise returns full-range but honest structural intervals. BRACE delivers simultaneous policy-value validity, fixed-gap identification of the operationally optimal recommendation policy, and fixed-gap identification of the structurally optimal treatment policy under contextual homogeneity and invertibility. We complement the theory with a finite-context empirical benchmark spanning direct control, mediated present-versus-future tradeoffs, weak identification, homogeneity failure, and rectangular overidentification. The experiments show that safety appears as regret on easy problems, as abstention and wide valid intervals under weak identification, as a reason to prefer recommendation welfare under homogeneity failure, and as tighter structural uncertainty when extra instruments are available. For rich contexts, we also derive an orthogonal score whose conditional bias factorizes into compliance-model and outcome-model errors, clarifying what must be stabilized for anytime-valid semiparametric IV inference.
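Abstract (reference translation): Bandits with noncompliance separate the learner's recommendation from the treatment actually delivered, so the learning target itself must be chosen. A platform may care about recommendation welfare in the current mediated workflow, treatment learning for a future direct-control regime, or anytime-valid uncertainty for one of those targets. These objectives need not agree. We formalize this objective-choice problem, identify the direct-control regime in which recommendation and treatment objectives collapse, and show by example that recommendation welfare can strictly exceed every learner-measurable treatment policy when downstream actors use private information. For finite-context square-IV problems we propose BRACE, a parameter-free phase-doubling algorithm. BRACE delivers simultaneous policy-value validity, fixed-gap identification of the operationally optimal recommendation policy, and fixed-gap identification of the structurally optimal treatment policy under contextual homogeneity and invertibility. We complement the theory with a finite-context empirical benchmark spanning direct control, mediated present-versus-future tradeoffs, weak identification, homogeneity failure, and rectangular overidentification. The experiments show that safety appears as regret on easy problems, as abstention and wide valid intervals under weak identification, as a reason to prefer recommendation welfare under homogeneity failure, and as tighter structural uncertainty when extra instruments are available. For rich contexts, we also derive an orthogonal score whose conditional bias factorizes into compliance-model and outcome-model errors, clarifying what must be stabilized for anytime-valid semiparametric IV inference.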
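The abstract's idea of inverting the IV relation only after certification, and otherwise returning a full-range but honest interval, can be illustrated with a small sketch. This is not the paper's BRACE algorithm. It assumes a single context, a binary recommendation z and binary treatment, outcomes in [0, 1], homogeneity (so that E[Y | Z=z] = theta_0 + p_z * (theta_1 - theta_0)), and a fixed-sample Hoeffding radius in place of an anytime-valid one. The function names and arguments are hypothetical.

```python
import math


def hoeffding_radius(n, delta):
    """Fixed-sample Hoeffding radius for the mean of n values in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def certified_wald_interval(p_hat, m_hat, n, delta=0.05):
    """Interval for the structural effect theta_1 - theta_0 (illustrative only).

    p_hat[z]: estimated Pr(treatment = 1 | recommendation = z)
    m_hat[z]: estimated mean outcome given recommendation z
    n[z]:     number of rounds with recommendation z
    Returns (lo, hi, certified).
    """
    # Four means are estimated, so delta is split four ways (union bound).
    r = [hoeffding_radius(n[z], delta / 4.0) for z in (0, 1)]
    slack = r[0] + r[1]

    first_stage = p_hat[1] - p_hat[0]   # determinant of the 2x2 compliance matrix
    reduced_form = m_hat[1] - m_hat[0]  # effect of the recommendation on outcomes

    d_lo, d_hi = first_stage - slack, first_stage + slack
    n_lo, n_hi = reduced_form - slack, reduced_form + slack

    # Certification step: invert only if the first-stage interval excludes zero.
    if d_lo <= 0.0 <= d_hi:
        # Abstain: return the full range, which is trivially valid.
        return (-1.0, 1.0, False)

    # With the denominator's sign fixed, the ratio is extremized at box corners.
    corners = [num / den for num in (n_lo, n_hi) for den in (d_lo, d_hi)]
    lo = max(-1.0, min(corners))
    hi = min(1.0, max(corners))
    return (lo, hi, True)


# Strong compliance: the first stage is certified and the interval is informative.
strong = certified_wald_interval([0.1, 0.9], [0.34, 0.66], [5000, 5000])
# Weak compliance: certification fails and the sketch abstains.
weak = certified_wald_interval([0.50, 0.52], [0.40, 0.41], [5000, 5000])
```

In the strong-compliance call the interval brackets the point estimate 0.32 / 0.8 = 0.4, while the weak-compliance call returns the full range [-1, 1] with `certified` set to `False`. This mirrors, in a toy setting, the abstract's description of safety appearing as abstention and wide valid intervals under weak identification.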

Related papers