From STL Rulebooks to Rewards
- URL: http://arxiv.org/abs/2110.02792v1
- Date: Wed, 6 Oct 2021 14:16:59 GMT
- Title: From STL Rulebooks to Rewards
- Authors: Edgar A. Aguilar, Luigi Berducci, Axel Brunnbauer, Radu Grosu, Dejan Ničković
- Abstract summary: We propose a principled approach to shaping rewards for reinforcement learning from multiple objectives.
We first equip STL with a novel quantitative semantics that allows individual requirements to be evaluated automatically.
We then develop a method for systematically combining evaluations of multiple requirements into a single reward.
- Score: 4.859570041295978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The automatic synthesis of neural-network controllers for autonomous agents
through reinforcement learning has to simultaneously optimize many, possibly
conflicting, objectives of various importance. This multi-objective
optimization task is reflected in the shape of the reward function, which is
most often the result of an ad-hoc, handcrafted process.
In this paper we propose a principled approach to shaping rewards for
reinforcement learning from multiple objectives that are given as a
partially-ordered set of signal-temporal-logic (STL) rules. To this end, we
first equip STL with a novel quantitative semantics that allows individual
requirements to be evaluated automatically. We then develop a method for systematically
combining evaluations of multiple requirements into a single reward that takes
into account the priorities defined by the partial order. We finally evaluate
our approach on several case studies, demonstrating its practical
applicability.
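To make the two steps in the abstract concrete, here is a minimal sketch. It is not the paper's method. The abstract does not spell out its novel semantics or its combination scheme, so the sketch substitutes the classical min/max robustness semantics of STL and a hypothetical mixed-radix weighting over priority levels. The rule names, thresholds, traces, and the functions `always`, `eventually`, and `priority_reward` are all illustrative assumptions.

```python
from typing import Dict, List, Sequence


def always(margins: Sequence[float]) -> float:
    """Classical robustness of 'always (margin > 0)': the worst margin on the trace."""
    return min(margins)


def eventually(margins: Sequence[float]) -> float:
    """Classical robustness of 'eventually (margin > 0)': the best margin on the trace."""
    return max(margins)


def priority_reward(robustness: Dict[str, float], levels: List[List[str]]) -> float:
    """Hypothetical combination: count satisfied rules per priority level and
    weight each level so that it outweighs all lower levels put together.

    `levels` lists rule names from highest to lowest priority (an assumed
    ranking derived from the partial order; incomparable rules share a level).
    """
    reward = 0.0
    weight = 1.0
    # Walk from lowest to highest priority so each weight exceeds the
    # maximum total contribution of every level below it.
    for level in reversed(levels):
        satisfied = sum(1 for rule in level if robustness[rule] > 0.0)
        reward += weight * satisfied
        weight *= len(level) + 1
    return reward


# Illustrative traces (made-up numbers): distance to the nearest obstacle,
# distance to the goal, and speed at each time step.
obstacle_dist = [1.5, 0.8, 0.4, 0.9]
goal_dist = [3.0, 2.0, 1.0, 0.6]
speed = [1.0, 1.2, 1.1, 0.9]

robustness = {
    "no_collision": always([d - 0.2 for d in obstacle_dist]),  # stay 0.2 away
    "reach_goal": eventually([0.5 - d for d in goal_dist]),    # get within 0.5
    "comfort": always([1.5 - v for v in speed]),               # stay below 1.5
}
levels = [["no_collision"], ["reach_goal"], ["comfort"]]
reward = priority_reward(robustness, levels)
```

With three single-rule levels the weights are 4, 2, and 1, so satisfying only the safety rule scores higher than satisfying both lower-priority rules. This design discards the magnitude of the robustness values and keeps only their sign, which is a simplification chosen for brevity.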
Related papers
- C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014]
Constrained MORL is a seamless bridge between constrained policy optimization and MORL.
Our algorithm achieves more consistent and superior performances in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
arXiv Detail & Related papers (2024-10-03T06:13:56Z)
- Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems [3.2826250607043796]
Multi-Objective Reinforcement Learning (MORL) techniques exist, but they have mostly been applied to RL benchmarks rather than to real-world autonomous systems (AS).
In this work, we use a MORL technique called Deep W-Learning (DWN) to find the optimal configuration for runtime performance optimization.
We compare DWN to two single-objective optimization implementations: epsilon-greedy algorithm and Deep Q-Networks.
arXiv Detail & Related papers (2024-08-02T11:16:09Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We prove a lower bound on the expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelop value iteration (EVI), which generalizes the enveloped multi-objective Q-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- A Distributional View on Multi-Objective Policy Optimization [24.690800846837273]
We propose an algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way.
We show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
arXiv Detail & Related papers (2020-05-15T13:02:17Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.