Regret Minimization for Piecewise Linear Rewards: Contracts, Auctions, and Beyond
- URL: http://arxiv.org/abs/2503.01701v1
- Date: Mon, 03 Mar 2025 16:13:45 GMT
- Title: Regret Minimization for Piecewise Linear Rewards: Contracts, Auctions, and Beyond
- Authors: Francesco Bacchiocchi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
- Abstract summary: This paper introduces a general online learning framework to tackle regret minimization for piecewise linear rewards. We show that our algorithm attains a regret of $\widetilde{O}(\sqrt{nT})$, where $n$ is the number of pieces of the reward function and $T$ is the number of rounds. Our algorithm also demonstrates that, in the problem of learning to set prices in posted-price auctions, it is possible to attain suitable (and desirable) instance-independent regret bounds.
- Score: 37.92189925462977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most microeconomic models of interest involve optimizing a piecewise linear function. These include contract design in hidden-action principal-agent problems, selling an item in posted-price auctions, and bidding in first-price auctions. When the relevant model parameters are unknown and determined by some (unknown) probability distributions, the problem becomes learning how to optimize an unknown and stochastic piecewise linear reward function. Such a problem is usually framed within an online learning framework, where the decision-maker (learner) seeks to minimize the regret of not knowing an optimal decision in hindsight. This paper introduces a general online learning framework that offers a unified approach to tackle regret minimization for piecewise linear rewards, under a suitable monotonicity assumption commonly satisfied by microeconomic models. We design a learning algorithm that attains a regret of $\widetilde{O}(\sqrt{nT})$, where $n$ is the number of ``pieces'' of the reward function and $T$ is the number of rounds. This result is tight when $n$ is \emph{small} relative to $T$, specifically when $n \leq T^{1/3}$. Our algorithm solves two open problems in the literature on learning in microeconomic settings. First, it shows that the $\widetilde{O}(T^{2/3})$ regret bound obtained by Zhu et al. [Zhu+23] for learning optimal linear contracts in hidden-action principal-agent problems is not tight when the number of agent's actions is small relative to $T$. Second, our algorithm demonstrates that, in the problem of learning to set prices in posted-price auctions, it is possible to attain suitable (and desirable) instance-independent regret bounds, addressing an open problem posed by Cesa-Bianchi et al. [CBCP19].
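To make the setting concrete, below is a minimal, self-contained sketch (not the paper's algorithm) that casts posted-price auctions as a stochastic piecewise linear reward: posting price $p$ earns $p$ whenever the buyer's valuation is at least $p$, so the expected reward $p \cdot \Pr(v \geq p)$ is piecewise linear with one piece per support point. A plain UCB1 learner over the candidate prices is used purely for illustration; the valuation support, probabilities, and horizon below are made-up values.

```python
import numpy as np

# Illustrative sketch only: posted-price auctions as a stochastic piecewise
# linear reward, with a standard UCB1 learner over the n candidate prices.
# All constants below are hypothetical and chosen just for the demo.

rng = np.random.default_rng(1)
support = np.array([0.2, 0.5, 0.7, 0.9])   # hypothetical valuation support (n pieces)
probs   = np.array([0.1, 0.4, 0.3, 0.2])   # valuation distribution, unknown to the learner
n, T = len(support), 20_000

counts = np.zeros(n)
means  = np.zeros(n)
total_reward = 0.0

for t in range(1, T + 1):
    if t <= n:
        arm = t - 1                        # play each candidate price once
    else:
        ucb = means + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))          # UCB1 index
    price = support[arm]
    valuation = rng.choice(support, p=probs)
    reward = price if valuation >= price else 0.0   # posted-price revenue
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
    total_reward += reward

# expected revenue p * Pr(v >= p) for each candidate price
expected = support * np.cumsum(probs[::-1])[::-1]
print("best expected revenue per round:", expected.max().round(3))
print("learner's average revenue:      ", round(total_reward / T, 3))
```

In this toy example the optimal price coincides with one of the valuation support points, which mirrors why the number of pieces $n$, rather than the size of a price grid, governs the regret bound discussed above.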
Related papers
- Selling Joint Ads: A Regret Minimization Perspective [7.288063443108292]
Motivated by online retail, we consider the problem of selling one item (e.g., an ad slot) to two non-excludable buyers (say, a merchant and a brand).
This problem captures, for example, situations where a merchant and a brand bid cooperatively in an auction to advertise a product, and both benefit from the ad being shown.
A mechanism collects bids from the two buyers and decides whether to allocate the item and which payments the two parties should make.
arXiv Detail & Related papers (2024-09-12T07:59:10Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Contextual Bandits and Imitation Learning via Preference-Based Active Queries [17.73844193143454]
We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback.
The learner's objective is two-fold: to minimize the regret associated with the executed actions, while simultaneously minimizing the number of comparison queries made to the expert.
arXiv Detail & Related papers (2023-07-24T16:36:04Z) - Dynamic Pricing and Advertising with Demand Learning [16.54088382906195]
We consider a novel pricing and advertising framework, where a seller not only sets the product price but also designs flexible 'advertising schemes'. We impose no structural restriction on the seller's feasible advertising strategies and allow her to advertise the product by disclosing or concealing any information. Customers observe the advertising signal and infer a Bayesian belief over the products.
arXiv Detail & Related papers (2023-04-27T17:52:06Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Efficient Online-Bandit Strategies for Minimax Learning Problems [21.300877551771197]
Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or minimization with non-standard aggregated losses.
More specifically, these problems are convex-linear problems where the learning is carried out over the model parameters $w \in \mathcal{W}$ and over the empirical distribution $p \in \mathcal{K}$ of the training set.
To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm.
arXiv Detail & Related papers (2021-05-28T16:01:42Z) - Multinomial Logit Contextual Bandits: Provable Optimality and Practicality [15.533842336139063]
We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown.
We propose upper confidence bound based algorithms for this MNL contextual bandit.
We show that a simple variant of the algorithm achieves the optimal regret for a broad class of important applications.
arXiv Detail & Related papers (2021-03-25T15:42:25Z) - Minimax Regret for Stochastic Shortest Path [63.45407095296692]
We study the Stochastic Shortest Path (SSP) problem, in which an agent has to reach a goal state with minimum total expected cost.
We show that the minimax regret for this setting is $\widetilde{O}(B_\star \sqrt{|S| |A| K})$, where $B_\star$ is a bound on the expected cost of the optimal policy from any state.
Our algorithm runs in polynomial time per episode, and is based on a novel reduction to reinforcement learning in finite-horizon MDPs.
arXiv Detail & Related papers (2021-03-24T10:11:49Z) - Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state with minimum total expected cost.
We show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S| |A| K})$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)