Learning-Based Pricing and Matching for Two-Sided Queues
- URL: http://arxiv.org/abs/2403.11093v2
- Date: Mon, 18 Nov 2024 17:58:35 GMT
- Title: Learning-Based Pricing and Matching for Two-Sided Queues
- Authors: Zixian Yang, Lei Ying,
- Abstract summary: We consider a dynamic system with multiple types of customers and servers.
The arrival rate of each queue is determined by the price according to unknown demand or supply functions.
This system can be used to model two-sided markets such as ride-sharing markets with passengers and drivers.
- Score: 14.854050478428727
- License:
- Abstract: We consider a dynamic system with multiple types of customers and servers. Each type of waiting customer or server joins a separate queue, forming a bipartite graph with customer-side queues and server-side queues. The platform can match the servers and customers if their types are compatible. The matched pairs then leave the system. The platform will charge a customer a price according to their type when they arrive and will pay a server a price according to their type. The arrival rate of each queue is determined by the price according to some unknown demand or supply functions. Our goal is to design pricing and matching algorithms to maximize the profit of the platform with unknown demand and supply functions, while keeping queue lengths of both customers and servers below a predetermined threshold. This system can be used to model two-sided markets such as ride-sharing markets with passengers and drivers. The difficulties of the problem include simultaneous learning and decision making, and the tradeoff between maximizing profit and minimizing queue length. We use a longest-queue-first matching algorithm and propose a learning-based pricing algorithm, which combines gradient-free stochastic projected gradient ascent with bisection search. We prove that our proposed algorithm yields a sublinear regret $\tilde{O}(T^{5/6})$ and anytime queue-length bound $\tilde{O}(T^{1/6})$, where $T$ is the time horizon. We further establish a tradeoff between the regret bound and the queue-length bound: $\tilde{O}(T^{1-\gamma})$ versus $\tilde{O}(T^{\gamma})$ for $\gamma \in (0, 1/6].$
Related papers
- Queueing Matching Bandits with Preference Feedback [10.988222071035198]
We consider multi-class asymmetric queueing systems consisting of $N$ queues on one side and $K$ servers on the other side.
The service rate of each job-server assignment is unknown and modeled by a feature-based Multi-nomial Logit (MNL) function.
We propose algorithms based on UCB and Thompson Sampling, which achieve system stability with an average queue length bound to $O(minN,K/epsilon)$ for a large time horizon.
arXiv Detail & Related papers (2024-10-14T02:29:06Z) - Online Linear Programming with Batching [18.989352151219336]
We study Online Linear Programming (OLP) withOmega.
We show that the ability to delay decisions brings better operational performance, as measured by regret.
All the proposed algorithms delay decisions on customers arriving in only the first and the last batch.
arXiv Detail & Related papers (2024-08-01T06:13:24Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit.
In this setting, agents subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Timely Asynchronous Hierarchical Federated Learning: Age of Convergence [59.96266198512243]
We consider an asynchronous hierarchical federated learning setting with a client-edge-cloud framework.
The clients exchange the trained parameters with their corresponding edge servers, which update the locally aggregated model.
The goal of each client is to converge to the global model, while maintaining timeliness of the clients.
arXiv Detail & Related papers (2023-06-21T17:39:16Z) - Queue Scheduling with Adversarial Bandit Learning [20.606599586119835]
We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates.
Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization.
We present two novel algorithms capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies.
arXiv Detail & Related papers (2023-03-03T07:17:09Z) - Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches.
We design a computational efficient algorithm to achieve near-optimal regret of $tildeO(sqrtSAH3Kln (1/delta))$tildeO(cdot) hides logarithmic terms of $(S,A,H,K)$ in $K$ episodes.
Our technical contribution are two-fold: 1) a near-optimal design scheme to explore
arXiv Detail & Related papers (2022-10-15T09:22:22Z) - Learning While Scheduling in Multi-Server Systems with Unknown
Statistics: MaxWeight with Discounted UCB [18.898514227870926]
This paper considers a multi-server system with multiple servers and multiple types of jobs, where different job types require different amounts of processing time at different servers.
The goal is to schedule jobs on servers without knowing the statistics of the processing times.
We propose a new algorithm, which combines the MaxWeight scheduling policy with discounted upper confidence bound (UCB) to simultaneously learn statistics and schedule jobs to servers.
arXiv Detail & Related papers (2022-09-02T15:37:02Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We show a training algorithm for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate of $mathcalO!!!(sigma2-2_avg!! .
We also show that the heterogeneity term in rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider optimization with delayed gradients where, at each time stept$, the algorithm makes an update using a stale computation - d_t$ for arbitrary delay $d_t gradient.
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
arXiv Detail & Related papers (2021-06-22T15:50:45Z) - Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud
System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems.
First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch.
Second, for diverse system scales and structures, we use graph neural networks to embed system state information.
Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
arXiv Detail & Related papers (2021-01-17T03:45:25Z) - Matched Queues with Matching Batch Pair (m, n) [8.02548118934175]
We show that this matched queue can be expressed as a novel bidirectional level-dependent quasi-birth-and-death process.
We provide a detailed analysis for this matched queue, including the system stability, the average stationary queue lengthes, and the average sojourn time of any A- customer or B- customer.
arXiv Detail & Related papers (2020-09-06T14:14:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.