Tight Regret Bounds for Fixed-Price Bilateral Trade
- URL: http://arxiv.org/abs/2504.04349v1
- Date: Sun, 06 Apr 2025 03:56:42 GMT
- Title: Tight Regret Bounds for Fixed-Price Bilateral Trade
- Authors: Houshuang Chen, Yaonan Jin, Pinyan Lu, Chihao Zhang,
- Abstract summary: We show a near-optimal $widetildeTheta(T2/3)$ tight bound for $textsfGlobal Budget Balance$ fixed-price mechanisms with two-bit/one-bit feedback.<n>For correlated/adversarial values, a near-optimal $Omega(T3/4)$ lower bound for $textsfGlobal Budget Balance$ fixed-price mechanisms with two-bit/one-bit feedback.
- Score: 10.237210183229555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We examine fixed-price mechanisms in bilateral trade through the lens of regret minimization. Our main results are twofold. (i) For independent values, a near-optimal $\widetilde{\Theta}(T^{2/3})$ tight bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback. (ii) For correlated/adversarial values, a near-optimal $\Omega(T^{3/4})$ lower bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback, which improves the best known $\Omega(T^{5/7})$ lower bound obtained in the work \cite{BCCF24} and, up to polylogarithmic factors, matches the $\widetilde{\mathcal{O}}(T^{3 / 4})$ upper bound obtained in the same work. Our work in combination with the previous works \cite{CCCFL24mor, CCCFL24jmlr, AFF24, BCCF24} (essentially) gives a thorough understanding of regret minimization for fixed-price bilateral trade. En route, we have developed two technical ingredients that might be of independent interest: (i) A novel algorithmic paradigm, called $\textit{{fractal elimination}}$, to address one-bit feedback and independent values. (ii) A new $\textit{lower-bound construction}$ with novel proof techniques, to address the $\textsf{Global Budget Balance}$ constraint and correlated values.
Related papers
- Contextual Bandits for Unbounded Context Distributions [12.299949253960477]
Nonparametric contextual bandit is an important model of sequential decision making problems.
We propose two nearest neighbor methods combined with UCB exploration.
Our analysis shows that this method achieves minimax optimal regret under a weak margin condition.
The second method achieves an expected regret of $tildeOleft(T1-frac(alpha+1)betaalpha+(d+2)beta+T1-betaright)$, in which $beta$ is a parameter describing the tail strength.
arXiv Detail & Related papers (2024-08-19T02:30:37Z) - Linear Contextual Bandits with Hybrid Payoff: Revisited [0.8287206589886881]
We study the Linear Contextual problem in the hybrid reward setting.
In this setting every arm's reward model contains arm specific parameters in addition to parameters shared across the reward models of all the arms.
arXiv Detail & Related papers (2024-06-14T15:41:21Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM)<n>We study a model within this domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.<n>We propose an algorithm namely robust contextual dueling bandits (RCDB), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the problem of adversarial bandit with a switching cost $lambda$ for a switch of each selected arm in each round.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z) - Federated Linear Bandits with Finite Adversarial Actions [20.1041278044797]
We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandits problem.
To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm.
We prove that FedSupLinUCB achieves a total regret of $tildeO(sqrtd T)$, where $T$ is the total number of arm pulls from all clients, and $d$ is the ambient dimension of the linear model.
arXiv Detail & Related papers (2023-11-02T03:41:58Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$.
We obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs.
arXiv Detail & Related papers (2021-03-25T18:52:17Z) - Logarithmic Regret in Feature-based Dynamic Pricing [0.0]
Feature-based dynamic pricing is an increasingly popular model of setting prices for highly differentiated products.
We provide two algorithms for infracigen and adversarial feature settings, and prove the optimal $O(dlogT)$ regret bounds for both.
We also prove an $(sqrtT)$ information-theoretic lower bound for a slightly more general setting, which demonstrates that "knowing-the-demand-curve" leads to an exponential improvement in feature-based dynamic pricing.
arXiv Detail & Related papers (2021-02-20T00:45:33Z) - Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive
Multi-Step Bootstrap [84.66885506098724]
This paper presents a new model-free algorithm for episodic finite-horizon Markov Decision Processes (MDP), Adaptive Multi-step Bootstrap (AMB)
We show AMB achieves a gap-dependent regret bound that only scales with the sum of the inverse of the sub-optimality gaps.
We also show AMB suffers an additional $frac|Z_mul|Delta_min$ regret, where $Z_mul$ is the set of state-action pairs $(s,a)$'s satisfying $a$ is a non-unique optimal action for
arXiv Detail & Related papers (2021-02-09T07:46:34Z) - Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only a relative binary win-loss' feedback for this pair.
Our main result is an algorithm whose $T$-round regret compared to the emphBorda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z) - Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed
Payoffs [35.988644745703645]
We analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+epsilon$ moments.
We propose two novel algorithms which enjoy a sublinear regret bound of $widetildeO(dfrac12Tfrac11+epsilon)$.
arXiv Detail & Related papers (2020-04-28T13:01:38Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $mathcalO(minsqrtT, sqrtmathcalETfrac13)$ when $mathcalE$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.