Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees
- URL: http://arxiv.org/abs/2306.00172v1
- Date: Wed, 31 May 2023 20:41:42 GMT
- Title: Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees
- Authors: Pengfei Li, Jianyi Yang, Shaolei Ren
- Abstract summary: We propose a novel RL-based approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR).
LOMAR achieves both good average-case and worst-case performance.
- Score: 28.737193318136725
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many problems, such as online ad display, can be formulated as online
bipartite matching. The crucial challenge lies in the nature of
sequentially-revealed online item information, based on which we make
irreversible matching decisions at each step. While numerous expert online
algorithms have been proposed with bounded worst-case competitive ratios, they
may not offer satisfactory performance in average cases. On the other hand,
reinforcement learning (RL) has been applied to improve the average
performance, but it lacks robustness and can perform arbitrarily poorly. In
this paper, we propose a novel RL-based approach to edge-weighted online
bipartite matching with robustness guarantees (LOMAR), achieving both good
average-case and worst-case performance. The key novelty of LOMAR is a new
online switching operation which, based on a judicious condition to hedge
against future uncertainties, decides whether to follow the expert's decision
or the RL decision for each online item. We prove that for any $\rho\in[0,1]$,
LOMAR is $\rho$-competitive against any given expert online algorithm. To
improve the average performance, we train the RL policy by explicitly
considering the online switching operation. Finally, we run empirical
experiments to demonstrate the advantages of LOMAR compared to existing
baselines. Our code is available at: https://github.com/Ren-Research/LOMAR
Related papers
- Unsupervised-to-Online Reinforcement Learning [59.910638327123394]
Unsupervised-to-online RL (U2O RL) replaces domain-specific supervised offline RL with unsupervised offline RL.
U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks, but also learns better representations.
We empirically demonstrate that U2O RL achieves strong performance that matches or even outperforms previous offline-to-online RL approaches.
arXiv Detail & Related papers (2024-08-27T05:23:45Z)
- Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z)
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train policy to become good at pairwise classification, while online algorithms are good at generations.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z)
- Learning-augmented Online Algorithm for Two-level Ski-rental Problem [8.381344684943212]
We study the two-level ski-rental problem, where a user needs to fulfill a sequence of demands for multiple items by choosing one of three payment options.
We develop a learning-augmented algorithm (LADTSR) by integrating Machine Learning predictions into the robust online algorithm.
arXiv Detail & Related papers (2024-02-09T16:10:54Z)
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient [42.47810044648846]
We consider a hybrid reinforcement learning setting (Hybrid RL) in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
We adapt the classical Q learning/iteration algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q.
We show that Hy-Q with neural network function approximation outperforms state-of-the-art online, offline, and hybrid RL baselines on challenging benchmarks.
arXiv Detail & Related papers (2022-10-13T04:19:05Z)
- Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach [5.683591363967851]
We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data.
We show that most of the learning approaches perform significantly better than classical greedy algorithms on four synthetic and real-world datasets.
arXiv Detail & Related papers (2021-09-21T18:04:19Z)
- Fairness Maximization among Offline Agents in Online-Matching Markets [18.689388188794023]
Matching markets involve heterogeneous agents (typically from two parties) who are paired for mutual benefit.
There are two features distinguishing online-matching markets (OMMs) from traditional matching markets.
We propose online matching algorithms which optimize for either individual or group-level fairness.
arXiv Detail & Related papers (2021-09-18T13:41:42Z)
- Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.