Related papers: Learning to Price Against a Moving Target

URL: http://arxiv.org/abs/2106.04689v1
Date: Tue, 8 Jun 2021 20:57:11 GMT
Title: Learning to Price Against a Moving Target
Authors: Renato Paes Leme, Balasubramanian Sivan, Yifeng Teng, Pratik Worah
Abstract summary: We study the problem where the buyer's value is a moving target, i.e., it changes over time, either by a stochastic process or adversarially with bounded variation. In either case, we provide matching upper and lower bounds on the optimal revenue loss.
Score: 23.085429420254787
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the Learning to Price setting, a seller posts prices over time with the goal of maximizing revenue while learning the buyer's valuation. This problem is very well understood when values are stationary (fixed or iid). Here we study the problem where the buyer's value is a moving target, i.e., they change over time either by a stochastic process or adversarially with bounded variation. In either case, we provide matching upper and lower bounds on the optimal revenue loss. Since the target is moving, any information learned soon becomes out-dated, which forces the algorithms to keep switching between exploring and exploiting phases.

Related papers

A Contextual Online Learning Theory of Brokerage [8.049531918823758]
We study the role of contextual information in the online learning problem of brokerage between traders. We show that if the bounded density assumption is lifted, then the problem becomes unlearnable.
arXiv Detail & Related papers (2024-05-22T18:38:05Z)
Dynamic Pricing and Learning with Long-term Reference Effects [16.07344044662994]
We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller. We show that under this mechanism, a markdown policy is near-optimal irrespective of the parameters of the model. We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown.
arXiv Detail & Related papers (2024-02-19T21:36:54Z)
An Online Learning Theory of Brokerage [3.8059763597999012]
We investigate brokerage between traders from an online learning perspective. Unlike other bilateral trade problems already studied, we focus on the case where there are no designated buyer and seller roles. We show that the optimal rate degrades to $\sqrt{T}$ in the first case, and the problem becomes unlearnable in the second.
arXiv Detail & Related papers (2023-10-18T17:01:32Z)
Bridging the Gap Between Target Networks and Functional Regularization [61.051716530459586]
We propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned. We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.
arXiv Detail & Related papers (2022-10-21T22:27:07Z)
A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second-price auctions. From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders. In addition, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z)
Learning Equilibria in Matching Markets from Bandit Feedback [139.29934476625488]
We develop a framework and algorithms for learning stable market outcomes under uncertainty. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
arXiv Detail & Related papers (2021-08-19T17:59:28Z)
Fast Rate Learning in Stochastic First Price Bidding [0.0]
First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising. We show how to achieve significantly lower regret when the opponents' maximal bid distribution is known. Our algorithms converge much faster than alternatives proposed in the literature for various bid distributions.
arXiv Detail & Related papers (2021-07-05T07:48:52Z)
Correcting Momentum in Temporal Difference Learning [95.62766731469671]
We argue that momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale. We show that this phenomenon exists, and then propose a first-order correction term to momentum. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
arXiv Detail & Related papers (2021-06-07T20:41:15Z)
Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics. In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback. Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms [81.01917016753644]
We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory. Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic.
arXiv Detail & Related papers (2020-10-02T15:51:48Z)
Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions [13.234975857626752]
We consider the problem of robust learning of reserve prices against strategic buyers in contextual second-price auctions. We propose learning policies that are robust to such strategic behavior.
arXiv Detail & Related papers (2020-02-25T19:00:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.