Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets
- URL: http://arxiv.org/abs/2510.14097v1
- Date: Wed, 15 Oct 2025 21:09:41 GMT
- Title: Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets
- Authors: Zixian Yang, Sushil Mahavir Varma, Lei Ying
- Abstract summary: We study a two-sided market, wherein, price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer-server pair can then be matched by the platform, at which point, they leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit, while maintaining reasonable queue lengths.
- Score: 11.51661878156199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a two-sided market, wherein, price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer-server pair can then be matched by the platform, at which point, they leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit, while maintaining reasonable queue lengths. As the demand and supply curves governing the price-dependent arrival rates may not be known in practice, we design a novel online-learning-based pricing policy and establish its near-optimality. In particular, we prove a tradeoff among three performance metrics: $\tilde{O}(T^{1-\gamma})$ regret, $\tilde{O}(T^{\gamma/2})$ average queue length, and $\tilde{O}(T^{\gamma})$ maximum queue length for $\gamma \in (0, 1/6]$, significantly improving over existing results [1]. Moreover, barring the permissible range of $\gamma$, we show that this trade-off between regret and average queue length is optimal up to logarithmic factors under a class of policies, matching the optimal one as in [2] which assumes the demand and supply curves to be known. Our proposed policy has two noteworthy features: a dynamic component that optimizes the tradeoff between low regret and small queue lengths; and a probabilistic component that resolves the tension between obtaining useful samples for fast learning and maintaining small queue lengths.
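The tradeoff stated in the abstract can be restated as a small sketch. This is illustrative only: the function name and dictionary keys are not from the paper, and the sketch tracks only the growth exponents in $T$, ignoring constants and polylogarithmic factors.

```python
# Sketch of the regret / queue-length tradeoff from the abstract,
# up to polylog factors. For gamma in (0, 1/6], the paper reports:
#   regret             ~ T^(1 - gamma)
#   average queue len  ~ T^(gamma / 2)
#   maximum queue len  ~ T^gamma
# Larger gamma means lower regret growth but longer queues.

def tradeoff_exponents(gamma: float) -> dict:
    """Return the growth exponent (power of T) of each metric."""
    if not 0.0 < gamma <= 1.0 / 6.0:
        raise ValueError("gamma must lie in (0, 1/6]")
    return {
        "regret": 1.0 - gamma,
        "avg_queue_length": gamma / 2.0,
        "max_queue_length": gamma,
    }

# At the boundary gamma = 1/6, regret grows like T^(5/6) while the
# average queue length grows only like T^(1/12).
print(tradeoff_exponents(1.0 / 6.0))
```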
Related papers
- Queue Length Regret Bounds for Contextual Queueing Bandits [0.8984888893275712]
We introduce contextual queueing bandits, a new context-aware framework for scheduling while simultaneously learning unknown service rates. We show that our algorithm, CQB-$\varepsilon$, achieves a regret upper bound of $\widetilde{\mathcal{O}}(T^{-1/4})$. We also consider the setting of adversarially chosen contexts, for which our second algorithm, CQB-Opt, achieves a regret upper bound of $\mathcal{O}(\log^2 T)$.
arXiv Detail & Related papers (2026-01-27T07:40:23Z) - Analyzing homogenous and heterogeneous multi-server queues via neural networks [0.0]
We use a machine learning approach to predict the stationary distributions of the number of customers in a single-station multi-server system. To our knowledge, we are the first to predict the stationary distribution of the number of customers in the system in a $GI/GI_i/2$ queue.
arXiv Detail & Related papers (2025-04-01T12:30:18Z) - Learning payoffs while routing in skill-based queues [0.4077787659104315]
We construct a machine learning algorithm that adaptively learns the payoff parameters while maximizing the total payoff. We show that the algorithm is optimal up to logarithmic terms by deriving a regret lower bound.
arXiv Detail & Related papers (2024-12-13T14:33:50Z) - Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We define an online learning and optimization problem with discrete and irreversible decisions contributing toward a coverage target. In each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a classification model to guide future decisions. The goal is to minimize facility openings under a chance constraint reflecting the coverage target, in a horizon regime characterized by a large target number of facilities.
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - Learning-Based Pricing and Matching for Two-Sided Queues [14.854050478428727]
We consider a dynamic system with multiple types of customers and servers.
The arrival rate of each queue is determined by the price according to unknown demand or supply functions.
This system can be used to model two-sided markets such as ride-sharing markets with passengers and drivers.
arXiv Detail & Related papers (2024-03-17T05:07:04Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}\left(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d\right)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$.
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - MNL-Bandits under Inventory and Limited Switches Constraints [38.960764902819434]
We develop an efficient UCB-like algorithm to optimize the assortments while learning customers' choices from data.
We prove that our algorithm can achieve a sub-linear regret bound $\tilde{O}\left(T^{1-\alpha/2}\right)$ if $O(T^{\alpha})$ switches are allowed.
arXiv Detail & Related papers (2022-04-22T16:02:27Z) - Linear Speedup in Personalized Collaborative Learning [69.45124829480106]
Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias.
We formalize the personalized collaborative learning problem as optimization of a user's objective.
We explore conditions under which we can optimally trade off their bias for a reduction in variance.
arXiv Detail & Related papers (2021-11-10T22:12:52Z) - Policy Optimization Using Semiparametric Models for Dynamic Pricing [1.3428344011390776]
We study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise.
We propose a dynamic statistical learning and decision-making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision-making.
arXiv Detail & Related papers (2021-09-13T23:50:01Z) - Provably Efficient Generative Adversarial Imitation Learning for Online
and Offline Setting with Linear Function Approximation [81.0955457177017]
In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain reward set.
We study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps.
arXiv Detail & Related papers (2021-08-19T16:16:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.