Massively Scalable Inverse Reinforcement Learning in Google Maps
- URL: http://arxiv.org/abs/2305.11290v4
- Date: Tue, 5 Mar 2024 22:07:29 GMT
- Title: Massively Scalable Inverse Reinforcement Learning in Google Maps
- Authors: Matt Barnes, Matthew Abueg, Oliver F. Lange, Matt Deeds, Jason Trader,
Denali Molitor, Markus Wulfmeier, Shawn O'Banion
- Abstract summary: Inverse reinforcement learning offers a powerful and general framework for learning humans' latent preferences in route recommendation.
No approach has successfully addressed planetary-scale problems with hundreds of millions of states and demonstration trajectories.
We revisit classic IRL methods in the routing context, and make the key observation that there exists a trade-off between the use of cheap, deterministic planners and expensive yet robust stochastic policies.
This insight is leveraged in Receding Horizon Inverse Planning (RHIP), a new generalization of classic IRL algorithms that provides fine-grained control over performance trade-offs via its planning horizon.
- Score: 3.1244966374281544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse reinforcement learning (IRL) offers a powerful and general framework
for learning humans' latent preferences in route recommendation, yet no
approach has successfully addressed planetary-scale problems with hundreds of
millions of states and demonstration trajectories. In this paper, we introduce
scaling techniques based on graph compression, spatial parallelization, and
improved initialization conditions inspired by a connection to eigenvector
algorithms. We revisit classic IRL methods in the routing context, and make the
key observation that there exists a trade-off between the use of cheap,
deterministic planners and expensive yet robust stochastic policies. This
insight is leveraged in Receding Horizon Inverse Planning (RHIP), a new
generalization of classic IRL algorithms that provides fine-grained control
over performance trade-offs via its planning horizon. Our contributions
culminate in a policy that achieves a 16-24% improvement in route quality at a
global scale, and to the best of our knowledge, represents the largest
published study of IRL algorithms in a real-world setting to date. We conclude
by conducting an ablation study of key components, presenting negative results
from alternative eigenvalue solvers, and identifying opportunities to further
improve scalability via IRL-specific batching strategies.
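To make the planner/policy trade-off concrete, the following is a minimal Python sketch of a receding-horizon policy in the spirit of RHIP: soft (stochastic) Bellman backups within the horizon, backed by a cheap deterministic Dijkstra planner beyond it. The graph encoding, cost parameterization, and update schedule here are illustrative assumptions, not the paper's production implementation.
```python
# A hedged sketch of the RHIP-style horizon trade-off, assuming a small
# road graph with non-negative learned edge costs. Illustrative only.
import heapq
import numpy as np

def deterministic_cost_to_go(graph, edge_cost, goal):
    """Cheap deterministic planner (Dijkstra): exact cost-to-go to `goal`.
    graph[u] -> list of (v, e) pairs; edge_cost[e] >= 0."""
    rev = {}
    for u, edges in graph.items():
        for v, e in edges:
            rev.setdefault(v, []).append((u, e))
    dist, pq = {goal: 0.0}, [(0.0, goal)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, np.inf):
            continue
        for u, e in rev.get(v, []):
            nd = d + edge_cost[e]
            if nd < dist.get(u, np.inf):
                dist[u] = nd
                heapq.heappush(pq, (nd, u))
    return dist

def rhip_policy(graph, edge_cost, goal, horizon):
    """Expensive but robust stochastic policy within `horizon` steps,
    backed by the deterministic planner beyond it. horizon=0 recovers a
    pure planner; a large horizon recovers a fully stochastic
    MaxEnt-style policy."""
    tail = deterministic_cost_to_go(graph, edge_cost, goal)
    values = {u: -c for u, c in tail.items()}  # deterministic tail values
    for _ in range(horizon):                   # soft Bellman backups
        new_values = {
            u: np.logaddexp.reduce(
                [-edge_cost[e] + values.get(w, -np.inf) for w, e in edges]
            )
            for u, edges in graph.items() if edges
        }
        new_values[goal] = 0.0                 # absorbing goal state
        values = new_values

    def policy(u):                             # softmax over successor values
        succ = graph[u]
        q = np.array([-edge_cost[e] + values.get(w, -np.inf) for w, e in succ])
        p = np.exp(q - np.logaddexp.reduce(q))
        return [w for w, _ in succ], p

    return policy
```
Sweeping `horizon` is then the single knob that interpolates between the cheap deterministic extreme and the robust stochastic extreme described in the abstract.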
Related papers
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to jointly train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
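As a rough illustration of a single-stage objective that integrates both feedback sources, here is a hedged sketch; the Bradley-Terry preference term and the equal-weighting scheme are assumptions, not AIHF's published formulation.
```python
# A toy single-stage loss mixing demonstrations and preferences.
# Illustrative assumptions throughout; not the AIHF objective itself.
import numpy as np

def joint_alignment_loss(logp_demo, r_chosen, r_rejected, beta=1.0):
    """logp_demo: policy log-likelihoods of expert demonstration actions.
    r_chosen / r_rejected: reward-model scores of the preferred and
    dispreferred responses for the same prompts. Returns a scalar loss."""
    # Demonstration term: maximize likelihood of expert actions (BC / MLE).
    demo_loss = -np.mean(logp_demo)
    # Preference term: Bradley-Terry negative log-likelihood that the
    # chosen response beats the rejected one (numerically stable form).
    margin = r_chosen - r_rejected
    pref_loss = np.mean(np.logaddexp(0.0, -margin))
    return demo_loss + beta * pref_loss
```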
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for dealing with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
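A minimal sketch of this learn-stochastic, deploy-deterministic pattern, where the exploration level sigma is the tuning knob; the one-dimensional quadratic objective is an illustrative assumption.
```python
# Train a Gaussian policy with REINFORCE, deploy its mean. Toy setup.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, lr = 0.0, 0.5, 0.05   # mean, exploration level, step size

def reward(a):                       # toy objective: optimum at a = 2
    return -(a - 2.0) ** 2

for _ in range(2000):                # REINFORCE on the stochastic policy
    a = theta + sigma * rng.standard_normal()
    grad_logp = (a - theta) / sigma**2   # d/dtheta log N(a; theta, sigma^2)
    theta += lr * reward(a) * grad_logp

print(f"deployed deterministic action: {theta:.2f}")  # approaches 2.0
```
Larger sigma explores more aggressively (fewer samples to find the optimum region, noisier gradients); smaller sigma tracks the deployed deterministic policy more closely, which is exactly the trade-off the summary describes.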
- High-Precision Geosteering via Reinforcement Learning and Particle Filters [0.0]
Geosteering is a key component of drilling operations and traditionally involves manual interpretation of various data sources such as well-log data.
Academic attempts to solve geosteering decision optimization with greedy optimization and Approximate Dynamic Programming (ADP) showed promise but lacked adaptivity to diverse, realistic scenarios.
We propose reinforcement learning (RL) to facilitate optimal decision-making through reward-based iterative learning.
arXiv Detail & Related papers (2024-02-09T12:54:34Z)
- Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design [54.39859618450935]
We show that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.
Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a gap when these algorithms are applied to unseen environments.
In this work, we examine how characteristics of the meta-supervised-training distribution impact the performance of these algorithms.
arXiv Detail & Related papers (2023-10-04T12:52:56Z)
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
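The following hedged sketch conveys the flavor of UCB-style source selection with spectral graph features; the Laplacian-eigenvector features and the LinUCB update are stand-in assumptions, not the paper's exact algorithm.
```python
# UCB over graph nodes using spectral features. Illustrative only.
import numpy as np

def linucb_on_graph(adjacency, pull, rounds=200, alpha=1.0, k=5):
    """adjacency: (n, n) symmetric matrix; pull(node) -> noisy reward of
    placing the source at that node. Each candidate node is represented by
    its first k Laplacian-eigenvector coordinates; LinUCB runs on top."""
    deg = np.diag(adjacency.sum(axis=1))
    laplacian = deg - adjacency
    _, vecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
    X = vecs[:, :k]                       # smooth spectral node features
    A, b = np.eye(k), np.zeros(k)         # regularized design matrix, targets
    for _ in range(rounds):
        theta = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
        arm = int(np.argmax(X @ theta + alpha * bonus))  # optimistic choice
        r = pull(arm)
        A += np.outer(X[arm], X[arm])
        b += r * X[arm]
    return int(np.argmax(X @ np.linalg.solve(A, b)))     # best estimated source
```
Here `pull` would be a simulator or measurement of the network process at the chosen source node; the sparsity of the spectral representation is what keeps k small relative to the network size.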
- Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control [75.28441662678394]
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages.
We propose several improvements on top of these approaches to learn global control policies more quickly.
arXiv Detail & Related papers (2022-09-19T13:32:09Z)
- Compositional Reinforcement Learning from Logical Specifications [21.193231846438895]
Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy.
We develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.
Our approach then uses reinforcement learning to learn a neural network policy for each edge (sub-task), and a Dijkstra-style planning algorithm to compute a high-level plan in the graph (a sketch of this pattern follows below).
arXiv Detail & Related papers (2021-06-25T22:54:28Z)
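A rough sketch of the compositional pattern, under the assumption that learned sub-task policies supply edge costs (e.g. negative log success rates) to a Dijkstra-style high-level planner; names and the cost conversion are illustrative, not DiRL's.
```python
# High-level planning over an abstract task graph with learned edge costs.
import heapq

def high_level_plan(abstract_graph, edge_cost, start, goal):
    """abstract_graph[u] -> list of successor subgoals v; edge_cost[(u, v)]
    could be -log(success rate) of the RL policy trained for sub-task u->v."""
    dist, parent = {start: 0.0}, {start: None}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist[u]:
            continue
        for v in abstract_graph.get(u, []):
            nd = d + edge_cost[(u, v)]
            if nd < dist.get(v, float('inf')):
                dist[v], parent[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if goal not in parent:
        return []                       # no feasible subgoal sequence
    plan, node = [], goal
    while node is not None:             # recover the subgoal sequence
        plan.append(node)
        node = parent[node]
    return plan[::-1]                   # each edge is executed by its sub-task policy
```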
- Unsupervised Resource Allocation with Graph Neural Networks [0.0]
We present an approach for maximizing a global utility function by learning how to allocate resources in an unsupervised way.
We propose to learn the reward structure for near-optimal allocation policies with a GNN.
arXiv Detail & Related papers (2021-06-17T18:44:04Z)
- Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
arXiv Detail & Related papers (2021-02-22T14:28:03Z)
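A hedged sketch of the core Frank-Wolfe idea, keeping actions feasible by construction rather than projecting; the l2-ball constraint set and the toy Q-function are illustrative assumptions, not the paper's exact method.
```python
# Frank-Wolfe over a feasible action set: every iterate is a convex
# combination of feasible points, so no projection step is needed.
import numpy as np

def frank_wolfe_action(grad_q, a0, radius=1.0, steps=20):
    """Maximize a Q-function over {a : ||a|| <= radius}. Each iteration
    solves a linear subproblem over the constraint set (closed form for
    an l2 ball) and blends it in with the standard FW step size."""
    a = a0.copy()
    for t in range(steps):
        g = grad_q(a)                                  # ascent direction
        s = radius * g / (np.linalg.norm(g) + 1e-12)   # linear maximizer on the ball
        gamma = 2.0 / (t + 2.0)                        # standard FW schedule
        a = (1 - gamma) * a + gamma * s                # stays feasible
    return a

# Toy usage: Q(a) = -||a - target||^2 with the target outside the ball.
target = np.array([2.0, 0.0])
a_star = frank_wolfe_action(lambda a: -2 * (a - target), np.zeros(2))
print(a_star)  # approaches the boundary point nearest the target, ~[1, 0]
```
Because the action update never leaves the constraint set, the policy gradient itself needs no constraint handling, which is the decoupling the summary refers to.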
- Inverse Reinforcement Learning from a Gradient-based Learner [41.8663538249537]
Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations.
In this paper, we propose a new algorithm for this setting, in which the goal is to recover the reward function being optimized by an agent.
arXiv Detail & Related papers (2020-07-15T16:41:00Z)
- Localized active learning of Gaussian process state space models [63.97366815968177]
A globally accurate model is not required to achieve good performance in many common control applications.
We propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space.
By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy.
arXiv Detail & Related papers (2020-05-04T05:35:02Z)