Fugu-MT paper translation (abstract): Robust Tests in Online Decision-Making

Paper overview: Robust Tests in Online Decision-Making

arxiv url: http://arxiv.org/abs/2208.09819v1
Date: Sun, 21 Aug 2022 06:50:45 GMT
Status: Translation complete
Last updated in system: 2022-08-23 12:36:07.198133
Title: Robust Tests in Online Decision-Making
Authors: Gi-Soo Kim, Hyun-Joon Yang, Jane P. Kim
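Abstract summary: Bandit algorithms are widely used in sequential decision problems to maximize the cumulative reward. In mobile health, the goal is to promote the user's health through personalized interventions based on user-specific information acquired through wearable devices. This work proposes a modified actor-critic algorithm that is robust to critic misspecification and derives a novel testing procedure for the actor parameters.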
Reference score (attention measure computed by this site): 3.867363075280544
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Bandit algorithms are widely used in sequential decision problems to maximize the cumulative reward. One potential application is mobile health, where the goal is to promote the user's health through personalized interventions based on user-specific information acquired through wearable devices. Important considerations include the type of data collected and the frequency of collection (e.g., GPS or continuous monitoring), as such factors can severely impact app performance and users' adherence. To balance the need to collect useful data against the constraint of impacting app performance, one needs to be able to assess the usefulness of variables. Bandit feedback data are sequentially correlated, so traditional testing procedures developed for independent data do not apply. Recently, a statistical testing procedure was developed for the actor-critic bandit algorithm. An actor-critic algorithm maintains two separate models: one for the actor (the action selection policy) and one for the critic (the reward model). The performance of the algorithm as well as the validity of the test are guaranteed only when the critic model is correctly specified. However, misspecification is frequent in practice due to incorrect functional form or missing covariates. In this work, we propose a modified actor-critic algorithm which is robust to critic misspecification and derive a novel testing procedure for the actor parameters in this case.
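The actor-critic structure described in the abstract (an actor for the action selection policy and a critic for the reward model) can be illustrated with a minimal sketch. This is a generic two-action contextual bandit, not the paper's robust algorithm or its testing procedure. The simulated environment, the linear critic, the logistic actor, the step sizes, and the function name `run_actor_critic` are all assumptions made for illustration. The critic here is correctly specified, so the sketch does not demonstrate the misspecified case the paper addresses.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def run_actor_critic(steps: int = 5000, seed: int = 0,
                     lr_critic: float = 0.05, lr_actor: float = 0.05,
                     l2: float = 0.01):
    """Generic actor-critic bandit sketch (illustrative assumptions only)."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0, 0.0]  # critic: reward ~ w . [1, x, a, a*x]
    theta = [0.0, 0.0]        # actor: P(a=1 | x) = sigmoid(theta . [1, x])
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)            # scalar context
        p = sigmoid(theta[0] + theta[1] * x)  # probability of action 1
        a = 1 if rng.random() < p else 0

        # Hypothetical environment: action 1 helps only when x > 0.
        reward = 0.5 * x + a * x + rng.gauss(0.0, 0.1)

        # Critic step: one SGD update on the squared prediction error.
        feats = [1.0, x, float(a), a * x]
        err = reward - sum(wi * fi for wi, fi in zip(w, feats))
        w = [wi + lr_critic * err * fi for wi, fi in zip(w, feats)]

        # Actor step: gradient ascent on p * (critic-estimated advantage),
        # with an L2 penalty that keeps the policy stochastic.
        advantage = w[2] + w[3] * x
        grad_scale = p * (1.0 - p) * advantage
        theta[0] += lr_actor * (grad_scale - l2 * theta[0])
        theta[1] += lr_actor * (grad_scale * x - l2 * theta[1])
    return theta, w


if __name__ == "__main__":
    theta, w = run_actor_critic()
    print("actor parameters:", theta)
    print("critic weights:", w)
```

Because each action is drawn from a policy fitted to earlier observations, the collected data are sequentially dependent, which is the reason the abstract gives for why tests built for independent data do not apply.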

Related papers

Probably Approximately Precision and Recall Learning [62.912015491907994]
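Precision and recall are fundamental metrics in machine learning. One-sided feedback, where only positive examples are observed during training, is inherent to many practical problems. In a PAC learning framework, each hypothesis is represented by a graph whose edges indicate positive interactions.
Paper reference translation (metadata) (2024-11-20T04:21:07Z)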
Automatically Adaptive Conformal Risk Control [49.95190019041905]
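This paper proposes a method that adapts to the difficulty of test samples to achieve approximate conditional control of statistical risk. The framework goes beyond traditional conditional risk control based on user-provided conditioning events, making an algorithmic, data-driven determination of a function class suited to conditioning.
Paper reference translation (metadata) (2024-06-25T08:29:32Z)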
Outlier-Insensitive Kalman Filtering: Theory and Applications [26.889182816155838]
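This paper proposes a parameter-free algorithm that mitigates the harmful effect of outliers while requiring only a short iterative process of the standard update step of the linear Kalman filter.
Paper reference translation (metadata) (2023-09-18T06:33:28Z)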
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
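This paper proposes a data-agnostic adversarial detection framework that induces different responses to UAPs between normal and adversarial samples. Experimental results show that the method achieves competitive detection performance on various text classification tasks.
Paper reference translation (metadata) (2023-06-27T02:54:07Z)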
Sequential Kernelized Independence Testing [101.22966794822084]
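We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of the approach on both simulated and real data.
Paper reference translation (metadata) (2022-12-14T18:08:42Z)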
Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
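We use adversarial tools to optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods.
Paper reference translation (metadata) (2022-10-19T17:46:50Z)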
Least Square Calibration for Peer Review [18.063450032460047]
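We propose a flexible framework, least square calibration (LSC), for selecting top candidates from peer ratings. Under mild assumptions, the framework provably performs perfect calibration of noiseless linear scoring functions. The algorithm consistently outperforms a baseline that selects top papers based on the highest ratings.
Paper reference translation (metadata) (2021-10-25T02:40:33Z)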
Learning User Preferences in Non-Stationary Environments [42.785926822853746]
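We introduce a new model for online non-stationary recommendation systems. We show that our algorithm outperforms other static algorithms even when preferences do not change.
Paper reference translation (metadata) (2021-01-29T10:26:16Z)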
Privacy Preserving Recalibration under Domain Shift [119.21243107946555]
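This paper proposes a framework that abstracts the properties of the calibration problem under differential privacy constraints. It also designs a new recalibration algorithm, accuracy temperature scaling, which outperforms prior work on private datasets.
Paper reference translation (metadata) (2020-08-21T18:43:37Z)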

The related papers list is generated automatically from the titles and abstracts of papers on this site.

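This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any results arising from the use of this site (including all information and translations).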