Imitating Cost-Constrained Behaviors in Reinforcement Learning
- URL: http://arxiv.org/abs/2403.17456v3
- Date: Thu, 23 May 2024 08:57:17 GMT
- Title: Imitating Cost-Constrained Behaviors in Reinforcement Learning
- Authors: Qian Shao, Pradeep Varakantham, Shih-Fen Cheng
- Abstract summary: In this paper, we provide methods that match expert distributions in the presence of trajectory cost constraints.
We empirically show that leading imitation learning approaches imitate cost-constrained behaviors poorly.
- Score: 8.143750358586072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex planning and scheduling problems have long been solved using various optimization or heuristic approaches. In recent years, imitation learning, which aims to learn from expert demonstrations, has been proposed as a viable alternative for solving these problems. Generally speaking, imitation learning is designed to learn either the reward (or preference) model or the behavioral policy directly by observing the behavior of an expert. Existing work in imitation learning and inverse reinforcement learning has focused primarily on imitation in unconstrained settings (e.g., no limit on the fuel consumed by a vehicle). However, in many real-world domains, the behavior of an expert is governed not only by reward (or preference) but also by constraints. For instance, decisions on self-driving delivery vehicles depend not only on route preferences/rewards (derived from past demand data) but also on the fuel in the vehicle and the time available. In such problems, imitation learning is challenging, as decisions are dictated not only by the reward model but also by a cost-constrained model. In this paper, we provide multiple methods that match expert distributions in the presence of trajectory cost constraints: (a) a Lagrangian-based method; (b) meta-gradients that find a good trade-off between expected return and minimizing constraint violation; and (c) a cost-violation-based alternating gradient method. We empirically show that leading imitation learning approaches imitate cost-constrained behaviors poorly and that our meta-gradient-based approach achieves the best performance.
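As a concrete illustration of the Lagrangian-based method (a), the sketch below alternates gradient descent on the policy with dual ascent on a multiplier λ, matching an expert action distribution subject to an expected-cost budget. Everything here (the single-state setting, the cost vector, the learning rates) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

n_actions = 4
expert_pi = np.array([0.1, 0.6, 0.2, 0.1])  # hypothetical expert action distribution
cost = np.array([0.0, 1.0, 0.2, 0.1])       # hypothetical per-action cost
budget = 0.4                                 # trajectory cost budget d

theta = np.zeros(n_actions)                  # policy logits
lam = 0.0                                    # Lagrange multiplier
lr_theta, lr_lam = 0.5, 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(2000):
    pi = softmax(theta)
    exp_cost = pi @ cost
    # Lagrangian: H(expert, pi) + lam * (E_pi[cost] - budget)
    # grad of cross-entropy w.r.t. logits: pi - expert_pi
    # grad of E_pi[cost] w.r.t. logits:    pi * (cost - exp_cost)
    grad = (pi - expert_pi) + lam * pi * (cost - exp_cost)
    theta -= lr_theta * grad                            # primal descent
    lam = max(0.0, lam + lr_lam * (exp_cost - budget))  # dual ascent

pi = softmax(theta)
print("learned policy:", np.round(pi, 3))
print("expected cost: %.3f (budget %.1f)" % (pi @ cost, budget))
```

Since the expert's own expected cost (0.65) exceeds the budget (0.4) in this toy setup, the multiplier grows until probability mass shifts away from the costly action, which is exactly the tension between imitation and constraint satisfaction the paper studies.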
Related papers
- Learning Soft Driving Constraints from Vectorized Scene Embeddings while Imitating Expert Trajectories [16.666811573117613]
The primary goal of motion planning is to generate safe and efficient trajectories for vehicles.
Traditionally, motion planning models are trained using imitation learning to mimic the behavior of human experts.
We propose a method that integrates constraint learning into imitation learning by extracting driving constraints from expert trajectories.
arXiv Detail & Related papers (2024-12-07T18:29:28Z) - Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-World.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z) - Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties [8.86470998648085]
A key challenge is handling the expected cost accumulated while executing the policy.
Existing methods have developed innovative ways of converting this constraint on the entire policy into constraints on local decisions.
We provide an equivalent unconstrained formulation of constrained RL that has an augmented state space and reward penalties.
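A minimal sketch of that reduction, assuming a gym-style environment that reports a per-step cost in info["cost"] (this interface and the penalty weight are placeholder assumptions, not the paper's construction):

```python
import numpy as np

class CostBudgetWrapper:
    """Illustrative sketch: append the remaining cost budget to the
    observation and convert constraint violation into a reward penalty.
    Assumes env.step() returns (obs, reward, done, info) with a per-step
    cost in info["cost"]; the interface is an assumption, not the paper's.
    """
    def __init__(self, env, budget, penalty=10.0):
        self.env, self.budget, self.penalty = env, budget, penalty
        self.remaining = budget

    def reset(self):
        self.remaining = self.budget
        return np.append(self.env.reset(), self.remaining)  # augmented state

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.remaining -= info.get("cost", 0.0)
        if self.remaining < 0.0:        # budget exhausted:
            reward -= self.penalty      # penalize the violation
            self.remaining = 0.0        # clamp the augmented state
        return np.append(obs, self.remaining), reward, done, info
```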
arXiv Detail & Related papers (2023-01-27T08:33:08Z) - CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning [9.432068833600884]
Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment.
Its two broad approaches, model-based and model-free reinforcement learning, have shown concrete results in several disciplines.
This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process.
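The abstract gives no architectural details, so the following is only a generic sketch of such a state-pair distance regressor in PyTorch; the network shape and training targets are assumptions, not CostNet's design:

```python
import torch
import torch.nn as nn

class DistanceNet(nn.Module):
    """Generic sketch: regress the number of steps between two MDP states."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, g):
        # Concatenate the two states and predict a scalar distance.
        return self.net(torch.cat([s, g], dim=-1)).squeeze(-1)

# One training step on (state, goal, observed step-count) triples:
model = DistanceNet(state_dim=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
s, g = torch.randn(32, 4), torch.randn(32, 4)
steps = torch.randint(1, 50, (32,)).float()   # placeholder targets
loss = nn.functional.mse_loss(model(s, g), steps)
opt.zero_grad(); loss.backward(); opt.step()
```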
arXiv Detail & Related papers (2022-10-03T21:16:14Z) - Learning Soft Constraints From Constrained Expert Demonstrations [16.442694252601452]
Inverse reinforcement learning (IRL) methods assume that the expert data is generated by an agent optimizing some reward function.
We consider the setting where the reward function is given but the constraints are unknown, and we propose a method that can recover these constraints from the expert data.
We demonstrate our approach on synthetic environments, robotics environments and real world highway driving scenarios.
arXiv Detail & Related papers (2022-06-02T21:45:31Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - How To Not Drive: Learning Driving Constraints from Demonstration [0.0]
We propose a new scheme to learn motion planning constraints from human driving trajectories.
The behavioral planner is responsible for the high-level decision making required to follow traffic rules.
The motion planner's role is to generate feasible, safe trajectories for a self-driving vehicle to follow.
arXiv Detail & Related papers (2021-10-01T20:47:04Z) - Learning-based Preference Prediction for Constrained Multi-Criteria Path-Planning [12.457788665461312]
Constrained path-planning for Autonomous Ground Vehicles (AGVs) is a representative application.
We leverage knowledge acquired through offline simulations by training a neural network model to predict the uncertain criterion.
We integrate this model into a path-planner that can solve problems online.
arXiv Detail & Related papers (2021-08-02T17:13:45Z) - Model-Augmented Q-learning [112.86795579978802]
We propose a model-free RL (MFRL) framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution that is identical to the solution obtained by learning with the true reward.
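A minimal sketch of the shared-network idea, with one torso and separate heads for Q-values, next-state, and reward (sizes and layout are illustrative assumptions, not the MQL authors' architecture):

```python
import torch
import torch.nn as nn

class SharedQModel(nn.Module):
    """Sketch: one shared torso, three heads predicting Q-values,
    next state, and reward. Illustrative, not the paper's design."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, n_actions)                 # Q(s, .)
        self.next_head = nn.Linear(hidden + n_actions, state_dim)  # s' from (s, a)
        self.rew_head = nn.Linear(hidden + n_actions, 1)           # r(s, a)

    def forward(self, s, a_onehot):
        h = self.torso(s)
        ha = torch.cat([h, a_onehot], dim=-1)
        return self.q_head(h), self.next_head(ha), self.rew_head(ha).squeeze(-1)

# Usage: batch of 8 states, random one-hot actions.
model = SharedQModel(state_dim=4, n_actions=3)
q, s_next, r = model(torch.randn(8, 4), torch.eye(3)[torch.randint(0, 3, (8,))])
```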
arXiv Detail & Related papers (2021-02-07T17:56:50Z) - Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z) - Regularized Inverse Reinforcement Learning [49.78352058771138]
Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior.
Regularized IRL applies strongly convex regularizers to the learner's policy.
We propose tractable solutions, and practical methods to obtain them, for regularized IRL.
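The abstract does not state which regularizers the paper treats; as one standard example from this literature, choosing the strongly convex negative Shannon entropy for the regularizer Ω yields the familiar entropy-regularized objective:

```latex
% One common strongly convex policy regularizer: negative Shannon entropy.
% J_Omega is the regularized objective the learner maximizes.
J_\Omega(\pi) = \mathbb{E}_\pi\!\Big[\sum_{t=0}^{\infty}\gamma^t\,
  \big( r(s_t, a_t) - \Omega\big(\pi(\cdot \mid s_t)\big) \big)\Big],
\qquad
\Omega\big(\pi(\cdot \mid s)\big) = \sum_{a}\pi(a \mid s)\,\log \pi(a \mid s).
```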
arXiv Detail & Related papers (2020-10-07T23:38:47Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.