Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees
- URL: http://arxiv.org/abs/2306.00172v1
- Date: Wed, 31 May 2023 20:41:42 GMT
- Title: Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees
- Authors: Pengfei Li, Jianyi Yang, Shaolei Ren
- Abstract summary: We propose a novel RL-based approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR).
LOMAR achieves both good average-case and worst-case performance.
- Score: 28.737193318136725
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many problems, such as online ad display, can be formulated as online
bipartite matching. The crucial challenge lies in the nature of
sequentially-revealed online item information, based on which we make
irreversible matching decisions at each step. While numerous expert online
algorithms have been proposed with bounded worst-case competitive ratios, they
may not offer satisfactory performance in average cases. On the other hand,
reinforcement learning (RL) has been applied to improve the average
performance, but it lacks robustness and can perform arbitrarily poorly. In
this paper, we propose a novel RL-based approach to edge-weighted online
bipartite matching with robustness guarantees (LOMAR), achieving both good
average-case and worst-case performance. The key novelty of LOMAR is a new
online switching operation which, based on a judicious condition to hedge
against future uncertainties, decides whether to follow the expert's decision
or the RL decision for each online item. We prove that for any $\rho\in[0,1]$,
LOMAR is $\rho$-competitive against any given expert online algorithm. To
improve the average performance, we train the RL policy by explicitly
considering the online switching operation. Finally, we run empirical
experiments to demonstrate the advantages of LOMAR compared to existing
baselines. Our code is available at: https://github.com/Ren-Research/LOMAR
Related papers
- Unsupervised-to-Online Reinforcement Learning [59.910638327123394]
Unsupervised-to-online RL (U2O RL) replaces domain-specific supervised offline RL with unsupervised offline RL.
U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks, but also learns better representations.
We empirically demonstrate that U2O RL achieves strong performance that matches or even outperforms previous offline-to-online RL approaches.
arXiv Detail & Related papers (2024-08-27T05:23:45Z)
- Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z)
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train policy to become good at pairwise classification, while online algorithms are good at generations.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z)
- Learning-augmented Online Algorithm for Two-level Ski-rental Problem [8.381344684943212]
We study the two-level ski-rental problem, where a user needs to fulfill a sequence of demands for multiple items by choosing one of three payment options.
We develop a learning-augmented algorithm (LADTSR) by integrating Machine Learning predictions into the robust online algorithm.
arXiv Detail & Related papers (2024-02-09T16:10:54Z)
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient [42.47810044648846]
We consider a hybrid reinforcement learning setting (Hybrid RL) in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
We adapt the classical Q learning/iteration algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q.
We show that Hy-Q with neural network function approximation outperforms state-of-the-art online, offline, and hybrid RL baselines on challenging benchmarks.
arXiv Detail & Related papers (2022-10-13T04:19:05Z)
- Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach [5.683591363967851]
We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data.
We show that most of the learning approaches perform significantly better than classical greedy algorithms on four synthetic and real-world datasets.
arXiv Detail & Related papers (2021-09-21T18:04:19Z)
- Fairness Maximization among Offline Agents in Online-Matching Markets [18.689388188794023]
Matching markets involve heterogeneous agents (typically from two parties) who are paired for mutual benefit.
There are two features distinguishing online-matching markets (OMMs) from traditional matching markets.
We propose online matching algorithms which optimize for either individual or group-level fairness.
arXiv Detail & Related papers (2021-09-18T13:41:42Z)
- Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.