Fugu-MT 論文翻訳(概要): Learning to Bid in Contextual First Price Auctions

論文の概要: Learning to Bid in Contextual First Price Auctions

arxiv url: http://arxiv.org/abs/2109.03173v1
Date: Tue, 7 Sep 2021 16:11:18 GMT
ステータス: 翻訳完了
システム内更新日: 2021-09-08 14:26:59.242308
Title: Learning to Bid in Contextual First Price Auctions
Title（参考訳）: コンテクストファーストプライスオークションの入札方法を学ぶ
Authors: Ashwinkumar Badanidiyuru and Zhe Feng and Guru Guruganesh
Abstract要約: 我々は、最初の価格オークションに何度も入札する1人の入札者(受講者)について検討する。 2進フィードバックに対して,最大最大推定(MLE)法を用いて,最大$widetildeO(sqrtlog(d) T)$ regretを実現する入札アルゴリズムを提案する。また、幅広いクラスの入札ポリシーは、少なくとも$Omega(sqrtT)$を後悔しなければなりません。
参考スコア（独自算出の注目度）: 6.482311591019749
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we investigate the problem about how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in the first price auctions: at each time $t$, the learner observes a context $x_t\in \mathbb{R}^d$ and decides the bid based on historical information and $x_t$. We assume a structured linear model of the maximum bid of all the others $m_t = \alpha_0\cdot x_t + z_t$, where $\alpha_0\in \mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly sampled from a noise distribution $\mathcal{F}$ with log-concave density function $f$. We consider both \emph{binary feedback} (the learner can only observe whether she wins or not) and \emph{full information feedback} (the learner can observe $m_t$) at the end of each time $t$. For binary feedback, when the noise distribution $\mathcal{F}$ is known, we propose a bidding algorithm, by using maximum likelihood estimation (MLE) method to achieve at most $\widetilde{O}(\sqrt{\log(d) T})$ regret. Moreover, we generalize this algorithm to the setting with binary feedback and the noise distribution is unknown but belongs to a parametrized family of distributions. For the full information feedback with \emph{unknown} noise distribution, we provide an algorithm that achieves regret at most $\widetilde{O}(\sqrt{dT})$. Our approach combines an estimator for log-concave density functions and then MLE method to learn the noise distribution $\mathcal{F}$ and linear weight $\alpha_0$ simultaneously. We also provide a lower bound result such that any bidding policy in a broad class must achieve regret at least $\Omega(\sqrt{T})$, even when the learner receives the full information feedback and $\mathcal{F}$ is known.
Abstract（参考訳）: 本稿では,コンテクストプライスオークションを繰り返す際の入札問題について検討する。 1人の入札者(learner)は、最初の価格オークションで繰り返し入札を行う。 $t$ のたびに、学習者はコンテキスト $x_t\in \mathbb{r}^d$ を観察し、履歴情報と$x_t$ に基づいて入札を決定する。 m_t = \alpha_0\cdot x_t + z_t$, where $\alpha_0\in \mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly sampleed from a noise distribution $\mathcal{F}$ with log-concave density function $f$。学習者は、学習者が勝利するかどうかのみを観察できる)と、学習者(学習者)は、各時点の最後に$t$で、_emph{full information feedback}($m_t$を観測できる)の両方を考える。二元フィードバックのために、$\mathcal{f}$ のノイズ分布が知られているとき、最大確率推定 (mle) 法を用いて最大$\widetilde{o}(\sqrt{\log(d) t})$ regret を達成するための入札アルゴリズムを提案する。さらに,このアルゴリズムを二元フィードバックによる設定に一般化し,ノイズ分布は未知であるがパラメータ化された分布に属する。 emph{unknown}ノイズ分布を持つ全情報フィードバックのために、最大$\widetilde{o}(\sqrt{dt})$で後悔を実現するアルゴリズムを提供する。提案手法では, 対数凹密度関数の推定器とMLE法を組み合わせて, 雑音分布$\mathcal{F}$と線形重み$\alpha_0$を同時に学習する。また、幅広いクラスにおける入札ポリシーは、学習者が完全な情報フィードバックを受け取り、$\mathcal{f}$が知られている場合でも、少なくとも$\omega(\sqrt{t})$を後悔しなければならないという低限の結果を提供する。

関連論文リスト

Corruption-Robust Lipschitz Contextual Search [2.296475290901356]
リプシッツ関数を劣化したバイナリ信号で学習する問題について研究する。本研究は,新しいアルゴリズム手法であるエマフォノスティックチェックと,新しい解析手法を紹介する。
論文参考訳（メタデータ） (2023-07-26T02:02:19Z)
A Spectral Approach to Item Response Theory [6.5268245109828005]
本稿では,Raschモデルに対する新しい項目推定アルゴリズムを提案する。我々のアルゴリズムの中核は、アイテム-イムグラフ上で定義されたマルコフ連鎖の定常分布の計算である。合成および実生活データセットの実験により、我々のアルゴリズムは、文献でよく使われている手法とスケーラブルで正確で競合することを示した。
論文参考訳（メタデータ） (2022-10-09T18:57:08Z)
Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
マスアートノイズの存在下でのPAC学習ハーフスペースの複雑さについて検討した。我々は,最適0-1誤差が小さい場合でも,リアルタイムのMassartハーフスペース学習者が$Omega(eta)$よりも良い誤差を得られることを示す。
論文参考訳（メタデータ） (2022-07-28T17:50:53Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
論文参考訳（メタデータ） (2022-06-17T17:55:43Z)
Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
既知の滑らかな関数 $f$ の値を未知の点 $boldsymbolmu in mathbbRn$ で推定する問題について検討する。我々は、各座標の重要性に応じてサンプルを学習するインスタンス適応アルゴリズムを設計し、少なくとも1-delta$の確率で$epsilon$の正確な推定値である$f(boldsymbolmu)$を返す。
論文参考訳（メタデータ） (2022-03-18T18:50:52Z)
Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
ガウス境界の下でのマスアートノイズ付きmathbbRd$におけるPAC学習ハーフスペースの問題について検討する。この結果は,Massartモデルにおける学習ハーフスペースの複雑さを定性的に特徴づけるものである。
論文参考訳（メタデータ） (2021-08-19T16:16:48Z)
Gap-Dependent Unsupervised Exploration for Reinforcement Learning [40.990467706237396]
タスクに依存しない強化学習のための効率的なアルゴリズムを提案する。このアルゴリズムは1/epsilon cdot (H3SA / rho + H4 S2 A) の$widetildemathcalOのみを探索する。情報理論上、この境界は$rho Theta (1/(HS))$と$H>1$に対してほぼ厳密であることを示す。
論文参考訳（メタデータ） (2021-08-11T20:42:46Z)
Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
本稿では,ランダム化最小二乗値反復(RLSVI)アルゴリズムに着想を得たモデルレス強化学習アルゴリズムを提案する。提案アルゴリズムは,スカラーノイズを用いたトレーニングデータを簡易に摂動させることにより,探索を促進する。我々はこの理論を、既知の困難な探査課題にまたがる実証的な評価で補完する。
論文参考訳（メタデータ） (2021-06-15T02:23:07Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
S$状態、$A$アクション、割引係数$gamma in (0,1)$、近似しきい値$epsilon > 0$の MDP が与えられた場合、$epsilon$-Optimal Policy を学ぶためのモデルなしアルゴリズムを提供する。十分小さな$epsilon$の場合、サンプルの複雑さで改良されたアルゴリズムを示す。
論文参考訳（メタデータ） (2020-06-06T13:34:41Z)
Optimal Contextual Pricing and Extensions [32.152826900874075]
このアルゴリズムは$Omega(d log T)$ lower bound up to the $d log d$ additive factor。これらのアルゴリズムは、凸領域のスタイナー固有値を様々なスケールで有界化する新しい手法に基づいている。
論文参考訳（メタデータ） (2020-03-03T18:46:29Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。