Fast Slate Policy Optimization: Going Beyond Plackett-Luce
- URL: http://arxiv.org/abs/2308.01566v2
- Date: Fri, 29 Dec 2023 11:26:51 GMT
- Title: Fast Slate Policy Optimization: Going Beyond Plackett-Luce
- Authors: Otmane Sakhi, David Rohde, Nicolas Chopin
- Abstract summary: This paper addresses the optimization of large scale decision systems given an arbitrary reward function.
We propose a new class of policies, born from a novel relaxation of decision functions.
This results in a simple, yet efficient learning algorithm that scales to massive action spaces.
- Score: 7.366405857677226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An increasingly important building block of large scale machine learning
systems is based on returning slates; an ordered lists of items given a query.
Applications of this technology include: search, information retrieval and
recommender systems. When the action space is large, decision systems are
restricted to a particular structure to complete online queries quickly. This
paper addresses the optimization of these large scale decision systems given an
arbitrary reward function. We cast this learning problem in a policy
optimization framework and propose a new class of policies, born from a novel
relaxation of decision functions. This results in a simple, yet efficient
learning algorithm that scales to massive action spaces. We compare our method
to the commonly adopted Plackett-Luce policy class and demonstrate the
effectiveness of our approach on problems with action space sizes in the order
of millions.
Related papers
- Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks.
This approach brings the additional computational burden of model inference and human effort to guide and control the behavior of LLMs.
We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z) - An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling
Problems Based on Constraint Programming [5.070542698701157]
This paper proposes a novel end-to-end approach to solving scheduling problems by means of CP and Reinforcement Learning (RL)
Our approach leverages existing CP solvers to train an agent learning a Priority Dispatching Rule (PDR) that generalizes well to large instances, even from separate datasets.
arXiv Detail & Related papers (2023-06-09T08:24:56Z) - Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making [73.48977854003697]
This work introduces a new notion of complexity, the generalized bracketing numbers, which marries constraints on the adversary to the size of the space.
We then instantiate our bounds in several problems of interest, including online prediction and planning of piecewise continuous functions.
arXiv Detail & Related papers (2023-02-10T18:45:52Z) - Fast Offline Policy Optimization for Large Scale Recommendation [74.78213147859236]
We derive an approximation of these policy learning algorithms that scale logarithmically with the catalogue size.
Our contribution is based upon combining three novel ideas.
Our estimator is an order of magnitude faster than naive approaches yet produces equally good policies.
arXiv Detail & Related papers (2022-08-08T11:54:11Z) - Bayesian Non-stationary Linear Bandits for Large-Scale Recommender
Systems [6.009759445555003]
We build upon the linear contextual multi-armed bandit framework to address this problem.
We develop a decision-making policy for a linear bandit problem with high-dimensional feature vectors.
Our proposed recommender system employs this policy to learn the users' item preferences online while minimizing runtime.
arXiv Detail & Related papers (2022-02-07T13:51:19Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Adaptive Discretization in Online Reinforcement Learning [9.560980936110234]
Two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it.
We provide a unified theoretical analysis of tree-based hierarchical partitioning methods for online reinforcement learning.
Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets.
arXiv Detail & Related papers (2021-10-29T15:06:15Z) - Ranking Cost: Building An Efficient and Scalable Circuit Routing Planner
with Evolution-Based Optimization [49.207538634692916]
We propose a new algorithm for circuit routing, named Ranking Cost, to form an efficient and trainable router.
In our method, we introduce a new set of variables called cost maps, which can help the A* router to find out proper paths.
Our algorithm is trained in an end-to-end manner and does not use any artificial data or human demonstration.
arXiv Detail & Related papers (2021-10-08T07:22:45Z) - SOLO: Search Online, Learn Offline for Combinatorial Optimization
Problems [4.777801093677586]
We study problems with real world applications such as machine scheduling, routing, and assignment.
We propose a method that combines Reinforcement Learning (RL) and planning.
This method can equally be applied to both the offline, as well as online, variants of the problem, in which the problem components are not known in advance, but rather arrive during the decision-making process.
arXiv Detail & Related papers (2021-04-04T17:12:24Z) - A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.