Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach
- URL: http://arxiv.org/abs/2109.10380v1
- Date: Tue, 21 Sep 2021 18:04:19 GMT
- Title: Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach
- Authors: Mohammad Ali Alomrani, Reza Moravej, Elias B. Khalil
- Abstract summary: We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data.
We show that most of the learning approaches perform significantly better than classical greedy algorithms on four synthetic and real-world datasets.
- Score: 5.683591363967851
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From assigning computing tasks to servers and advertisements to users,
sequential online matching problems arise in a wide variety of domains. The
challenge in online matching lies in making irrevocable assignments while there
is uncertainty about future inputs. In the theoretical computer science
literature, most policies are myopic or greedy in nature. In real-world
applications where the matching process is repeated on a regular basis, the
underlying data distribution can be leveraged for better decision-making. We
present an end-to-end Reinforcement Learning framework for deriving better
matching policies based on trial-and-error on historical data. We devise a set
of neural network architectures, design feature representations, and
empirically evaluate them across two online matching problems: Edge-Weighted
Online Bipartite Matching and Online Submodular Bipartite Matching. We show
that most of the learning approaches perform significantly better than
classical greedy algorithms on four synthetic and real-world datasets. Our code
is publicly available at https://github.com/lyeskhalil/CORL.git.
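To make the setting concrete, the sketch below implements the edge-weighted online matching loop with the classical greedy policy that the abstract uses as a baseline. The weight matrix, function names, and the `policy` hook are illustrative assumptions for this summary, not code from the linked repository.

```python
import numpy as np

def online_match(weights, policy):
    """Match arriving online nodes (rows) to fixed offline nodes (columns).

    weights[t, j] is the reward for matching arrival t to offline node j.
    Assignments are irrevocable and each offline node is used at most once.
    """
    available = set(range(weights.shape[1]))
    total, matching = 0.0, []
    for t, w in enumerate(weights):           # nodes arrive one at a time
        j = policy(w, available)              # policy may return None (skip)
        if j is not None and j in available:
            available.remove(j)
            total += w[j]
            matching.append((t, j))
    return total, matching

def greedy(w, available):
    """Myopic baseline: take the highest-weight available edge right now."""
    j = max(available, key=lambda k: w[k], default=None)
    return j if j is not None and w[j] > 0 else None

rng = np.random.default_rng(0)
print(online_match(rng.random((10, 5)), greedy))  # 10 arrivals, 5 offline nodes
```

A learned policy plugs into the same `policy` slot: instead of taking the locally best edge, a neural network scores the available edges from features of the current arrival and the matching so far, which is how the framework can exploit the historical data distribution rather than act myopically.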
Related papers
- PageRank Bandits for Link Prediction [72.61386754332776]
Link prediction is a critical problem in graph learning with broad applications such as recommender systems and knowledge graph completion.
This paper reformulates link prediction as a sequential decision-making process in which link prediction interactions occur one at a time.
We propose a novel fusion algorithm, PRB (PageRank Bandits), which is the first to combine contextual bandits with PageRank for collaborative exploitation and exploration.
arXiv Detail & Related papers (2024-11-03T02:39:28Z)
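The summary does not say how the two signals are fused; one plausible reading, sketched below with hypothetical names, is to add a PageRank prior on each candidate link to a standard UCB-style exploration bonus.

```python
import numpy as np

def select_link(candidates, mean_reward, pull_count, pagerank, t, beta=0.5):
    """Fused score = exploitation + UCB exploration bonus + PageRank prior.

    A hypothetical reading of 'combining contextual bandits with PageRank';
    the actual PRB algorithm may fuse these signals differently.
    """
    def score(i):
        bonus = np.sqrt(2.0 * np.log(max(t, 2)) / max(pull_count[i], 1))
        return mean_reward[i] + bonus + beta * pagerank[i]
    return max(candidates, key=score)

# Toy state over 3 candidate links: link 1 is promising but rarely tried.
print(select_link([0, 1, 2],
                  mean_reward=[0.4, 0.6, 0.1],
                  pull_count=[10, 2, 5],
                  pagerank=[0.5, 0.2, 0.3],
                  t=17))
```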
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train policies to become good at pairwise classification, while online algorithms are good at generation.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees [28.737193318136725]
We propose a novel approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR).
LOMAR achieves both good average-case and worst-case performance.
arXiv Detail & Related papers (2023-05-31T20:41:42Z)
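The summary leaves the mechanism implicit. A common way to pair a learned matcher with a worst-case guarantee is an online switching rule that follows the model only while it stays competitive with a trusted expert algorithm; the sketch below is that generic idea, with illustrative parameters rather than LOMAR's actual ones.

```python
def robust_action(ml_action, expert_action, ml_cum_reward, expert_cum_reward,
                  rho=0.8, slack=0.0):
    """Follow the learned model only while it stays rho-competitive with a
    trusted expert's cumulative reward; otherwise fall back to the expert.

    rho and slack are illustrative knobs, not LOMAR's actual parameters.
    """
    if ml_cum_reward >= rho * expert_cum_reward - slack:
        return ml_action     # exploit the learned policy (average case)
    return expert_action     # safeguard the worst case

print(robust_action("match_v3", "match_v1",
                    ml_cum_reward=7.5, expert_cum_reward=10.0))
# 7.5 < 0.8 * 10.0, so the rule falls back to the expert's choice.
```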
- Adversarial Deep Learning for Online Resource Allocation [12.118811903399951]
We use deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch.
Our work is the first to use deep neural networks to design an online algorithm from the perspective of worst-case performance guarantees.
arXiv Detail & Related papers (2021-11-19T15:48:43Z)
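Training "from the perspective of worst-case performance" suggests a min-max game between the algorithm network and an adversary that generates hard inputs. The toy below shows only the alternating-update skeleton on a quadratic saddle point; the paper's networks and resource-allocation objective are of course richer.

```python
import numpy as np

# Minimal sketch of the adversarial (min-max) training idea: "algorithm"
# parameters are trained against an "adversary" that searches for the
# worst-case input. The quadratic payoff is a toy stand-in, not the
# paper's objective.
rng = np.random.default_rng(1)
theta = rng.normal(size=3)   # algorithm parameters (maximize payoff)
phi = rng.normal(size=3)     # adversary parameters (minimize payoff)
lr = 0.1

def payoff(theta, phi):
    # Concave in theta, convex in phi: a well-posed saddle at (0, 0).
    return -0.5 * theta @ theta + theta @ phi + 0.5 * phi @ phi

for step in range(200):
    grad_theta = -theta + phi          # ascent direction for the algorithm
    grad_phi = theta + phi             # descent direction for the adversary
    theta = theta + lr * grad_theta
    phi = phi - lr * grad_phi

print(round(payoff(theta, phi), 6))    # approaches the saddle value 0
```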
- Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support [53.11601029040302]
Current offline policy learning algorithms are mostly based on inverse propensity score (IPS) weighting.
We propose a novel approach that uses a hybrid of offline learning with online exploration.
Our approach determines an optimal policy with theoretical guarantees using the minimal number of online explorations.
arXiv Detail & Related papers (2021-07-24T05:07:43Z)
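For reference, IPS weighting reweights each logged reward by how much more (or less) likely the target policy is to take the logged action than the logging policy was; a minimal self-contained version, with variable names that are mine:

```python
import numpy as np

def ips_value(logged_rewards, logging_probs, target_probs):
    """Off-policy value estimate of a target policy from logged bandit data.

    Each logged reward is reweighted by target_prob / logging_prob, which is
    unbiased when the logging policy covers every action the target policy
    takes; "deficient support" is precisely the setting where it does not.
    """
    w = np.asarray(target_probs) / np.asarray(logging_probs)
    return float(np.mean(w * np.asarray(logged_rewards)))

# Toy log of 4 rounds: reward, and each policy's probability of the action.
print(ips_value(logged_rewards=[1.0, 0.0, 1.0, 1.0],
                logging_probs=[0.5, 0.25, 0.5, 0.8],
                target_probs=[0.9, 0.1, 0.9, 0.8]))
```

When the logging policy gives zero or near-zero probability to actions the target policy needs, these weights are undefined or explode; that deficient-support failure mode is what the paper's online-exploration component is meant to patch.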
- Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose algoname, a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z)
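The "single threshold algorithm" the summary benchmarks against is, in the classical secretary setting, a rule of the following shape: observe a prefix of arrivals, then irrevocably accept the first value that beats it. A generic k = 1 sketch (not the paper's k = 2 algorithm):

```python
import math
import random

def single_threshold_pick(values):
    """Classic secretary rule (k = 1): watch the first n/e arrivals, then
    irrevocably accept the first value that beats all of them. This is the
    style of baseline the paper improves on for k = 2."""
    n = len(values)
    cutoff = max(1, int(n / math.e))
    threshold = max(values[:cutoff])
    for v in values[cutoff:]:
        if v > threshold:
            return v
    return values[-1]   # forced to accept the final arrival

random.seed(0)
print(single_threshold_pick([random.random() for _ in range(100)]))
```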
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which degrades the performance of job-resume matching algorithms.
We propose a novel multi-view co-teaching network that learns from sparse interaction data for job-resume matching.
Our model outperforms state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
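Co-teaching, the general technique the title names, trains two models that each keep the samples the other finds most trustworthy, so noisy interactions get filtered out. A schematic selection step with hypothetical inputs, omitting the paper's multiple text/graph views:

```python
import numpy as np

def coteach_step(losses_a, losses_b, keep_ratio=0.7):
    """One co-teaching selection step: each model keeps the indices that the
    *other* model finds low-loss (likely clean), filtering noisy pairs.
    Schematic only; the actual model combines several data views."""
    k = int(keep_ratio * len(losses_a))
    clean_for_b = np.argsort(losses_a)[:k]   # A's small-loss picks train B
    clean_for_a = np.argsort(losses_b)[:k]   # B's small-loss picks train A
    return clean_for_a, clean_for_b

losses_a = np.array([0.10, 2.30, 0.20, 1.90, 0.15])
losses_b = np.array([0.12, 2.10, 0.25, 0.30, 1.70])
print(coteach_step(losses_a, losses_b, keep_ratio=0.6))
```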
- Active Online Learning with Hidden Shifting Domains [64.75186088512034]
We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries.
Our algorithm can adaptively deal with interleaving spans of inputs from different domains.
arXiv Detail & Related papers (2020-06-25T15:23:59Z)
- Unbiased Learning to Rank: Online or Offline? [28.431648823968278]
How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for information retrieval (IR).
Existing work on unbiased learning to rank can be broadly categorized into two groups -- the studies on unbiased learning algorithms with logged data, and the studies on unbiased parameters estimation with real-time user interactions.
This paper formalizes the task of unbiased learning to rank and shows that existing algorithms for offline unbiased learning and online learning to rank are two sides of the same coin.
arXiv Detail & Related papers (2020-04-28T15:01:33Z)
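Both families of methods ultimately rest on the same correction: divide each click's contribution by the probability that the user examined the position where the document was shown. A minimal sketch, assuming the examination propensities are already known (estimating them is the hard part both lines of work address):

```python
import numpy as np

def debiased_relevance(clicks, exam_propensity):
    """Inverse-propensity estimate of per-document relevance from clicks.

    clicks[i, d] = 1 if document d was clicked in session i;
    exam_propensity[i, d] = probability the user examined the position where
    d was shown. Dividing by it removes position bias in expectation. This
    is the shared core of the offline and online families the paper unifies.
    """
    clicks = np.asarray(clicks, dtype=float)
    p = np.asarray(exam_propensity, dtype=float)
    return (clicks / p).mean(axis=0)

# 3 sessions, 3 documents; lower-ranked positions are examined less often.
clicks = [[1, 0, 0], [1, 1, 0], [0, 0, 0]]
props = [[0.9, 0.5, 0.2]] * 3
print(debiased_relevance(clicks, props))
```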