Fugu-MT Paper Translation (Overview): Learning to Bid in Repeated Second-Price Auctions with Dynamic Values and Aggregated Feedback

Paper overview: Learning to Bid in Repeated Second-Price Auctions with Dynamic Values and Aggregated Feedback

arxiv url: http://arxiv.org/abs/2605.28133v1
Date: Wed, 27 May 2026 08:20:08 GMT
Status: Translation complete
Last updated (in system): 2026-05-28 17:38:55.884112
Title: Learning to Bid in Repeated Second-Price Auctions with Dynamic Values and Aggregated Feedback
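Title (reference translation): Bid learning in repeated second-price auctions with dynamic values and aggregated feedback
Authors: Benjamin Heymann, Otmane Sakhi
Abstract summary: We study the problem of learning to bid when the bidder's value is dynamic, that is, when the current value depends on past outcomes. We derive regret bounds for a class of learning methods that combine plug-in estimators with a differential-equation characterization of the optimal policy.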
Reference score (attention score computed by the site): 5.65780894346598
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the problem of learning to bid when the bidder's value is dynamic, i.e., when the current value depends on past outcomes. Specifically, we consider a bidder participating in repeated second-price auctions whose value depends on the time elapsed since their last successful bid, with auctions arriving in continuous time and only aggregated feedback revealed at the end of the horizon. Such a bidder must (1) balance the immediate benefit of winning the current auction against its impact on future values and (2) learn unknown environmental parameters. We derive regret bounds for a class of learning methods that combine plug-in estimators with a differential-equation characterization of the optimal policy, and show that a specific confidence bound algorithm learns the optimal policy with a near optimal regret of $\widetilde{O}(\log N)$ for piecewise linear primitives, and $\widetilde{O}(N^{1/3})$ for general, smooth primitives, achieving these regrets without explicit randomization. These theoretical results are supported by numerical experiments.
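To make the setting described in the abstract more concrete, here is a minimal Python sketch of the environment only; it is not the authors' learning algorithm. Every specific is an assumption for illustration: auctions arrive as a Poisson process with intensity `rate`, the highest competing bid is drawn uniformly from [0, 1), the hypothetical function `value(elapsed)` is a piecewise linear value that grows with the time since the last win and saturates at `cap`, and the hypothetical bidding rule shades the current value by a constant `shade`. Only end-of-horizon totals are returned, mimicking aggregated feedback.

```python
import random
from dataclasses import dataclass


@dataclass
class HorizonSummary:
    """End-of-horizon totals; which of them a bidder observes is an assumption."""
    wins: int
    total_payment: float
    total_value: float


def value(elapsed: float, slope: float = 1.0, cap: float = 1.0) -> float:
    """Hypothetical piecewise linear value: grows with time since the last win, then saturates."""
    return min(slope * elapsed, cap)


def simulate(horizon: float, rate: float, shade: float, seed: int = 0) -> HorizonSummary:
    """Simulate repeated second-price auctions over [0, horizon].

    Illustrative assumptions (not taken from the paper):
    - auctions arrive as a Poisson process with intensity `rate`;
    - the highest competing bid is drawn uniformly from [0, 1);
    - the bidder bids max(value(elapsed) - shade, 0), a constant shading rule.
    """
    rng = random.Random(seed)
    t = 0.0
    last_win = 0.0  # treat time 0 as a win, so the value starts at 0
    wins, payment, realized = 0, 0.0, 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gap until the next auction
        if t > horizon:
            break
        v = value(t - last_win)
        bid = max(v - shade, 0.0)
        competing = rng.random()  # highest competing bid
        if bid > competing:
            # Second-price rule: the winner pays the highest competing bid.
            wins += 1
            payment += competing
            realized += v
            last_win = t  # winning resets the elapsed time, lowering future values
    # Only aggregates are returned, mimicking feedback revealed at the end of the horizon.
    return HorizonSummary(wins, payment, realized)


if __name__ == "__main__":
    for shade in (0.0, 0.1, 0.3):
        s = simulate(horizon=1000.0, rate=5.0, shade=shade, seed=42)
        print(f"shade={shade}: wins={s.wins}, utility={s.total_value - s.total_payment:.2f}")
```

Running it prints the number of wins and the aggregate utility (total value minus total payment) for a few shading levels; no particular numbers are claimed here. The constant shading rule is only a stand-in for trade-off (1) in the abstract: because winning resets the elapsed time, bidding somewhat below the current value may pay off. In the paper, the optimal policy is instead characterized by a differential equation, and the learning methods must also estimate unknown environmental parameters from aggregated feedback of this kind.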
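Abstract (reference translation): We study the problem of learning to bid when the bidder's value is dynamic, that is, when the current value depends on past outcomes. Specifically, we consider a bidder who repeatedly participates in second-price auctions and whose value depends on the time elapsed since their last winning bid; auctions arrive in continuous time, and only aggregated feedback is revealed at the end of the horizon. Such a bidder must (1) balance the immediate benefit of winning the current auction against its impact on future values and (2) learn unknown environmental parameters. We derive regret bounds for a class of learning methods that combine plug-in estimators with a differential-equation characterization of the optimal policy, and show that a specific confidence-bound algorithm learns the optimal policy with a near-optimal regret of $\widetilde{O}(\log N)$ for piecewise linear primitives and $\widetilde{O}(N^{1/3})$ for general smooth primitives, without explicit randomization. These theoretical results are supported by numerical experiments.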


List of related papers