Dynamic value alignment through preference aggregation of multiple
objectives
- URL: http://arxiv.org/abs/2310.05871v1
- Date: Mon, 9 Oct 2023 17:07:26 GMT
- Title: Dynamic value alignment through preference aggregation of multiple
objectives
- Authors: Marcin Korecki, Damian Dailisan, Cesare Carissimo
- Abstract summary: We present a methodology for dynamic value alignment, in which the values to be aligned with change over time.
We apply this approach to extend Deep $Q$-Learning to accommodate multiple objectives and evaluate this method on a simplified two-leg intersection.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of ethical AI systems is currently geared toward setting
objective functions that align with human objectives. However, finding such
functions remains a research challenge, while in RL, setting rewards by hand is
a fairly standard approach. We present a methodology for dynamic value
alignment, in which the values to be aligned with change over time, using a
multiple-objective approach. We apply this approach to extend Deep
$Q$-Learning to accommodate multiple objectives and evaluate the method on a
simplified two-leg intersection controlled by a switching agent. Our approach
dynamically accommodates the preferences of the drivers in the system and
achieves better overall performance across three metrics (speeds, stops, and
waits) while integrating objectives whose preferred actions compete or conflict.
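The abstract does not spell out how the per-objective values are aggregated, so the snippet below is only a minimal sketch of the general idea: one $Q$-table per objective, combined at action-selection time with dynamically supplied preference weights via a simple weighted sum. The objective names mirror the metrics mentioned above; the tabular setting, the linear aggregation, and all parameter values are illustrative assumptions rather than the authors' exact method.

```python
import numpy as np

# Hypothetical tabular setup: one Q-table per objective for a small switching agent.
# The objectives mirror the metrics named in the abstract (speeds, stops, waits);
# the weighted-sum aggregation below is an assumption, not the paper's exact rule.
N_STATES, N_ACTIONS = 8, 2            # e.g. queue-length bins x {keep phase, switch phase}
OBJECTIVES = ["speed", "stops", "waits"]
q_tables = {obj: np.zeros((N_STATES, N_ACTIONS)) for obj in OBJECTIVES}

def act(state, preferences):
    """Pick the action maximizing the preference-weighted sum of per-objective Q-values."""
    w = np.array([preferences[obj] for obj in OBJECTIVES])
    w = w / w.sum()                    # normalize the dynamically supplied preferences
    combined = sum(w[i] * q_tables[obj][state] for i, obj in enumerate(OBJECTIVES))
    return int(np.argmax(combined))

def update(state, action, rewards, next_state, preferences, alpha=0.1, gamma=0.95):
    """One Q-learning backup per objective, bootstrapping on a jointly chosen next action."""
    next_action = act(next_state, preferences)
    for obj in OBJECTIVES:
        target = rewards[obj] + gamma * q_tables[obj][next_state, next_action]
        q_tables[obj][state, action] += alpha * (target - q_tables[obj][state, action])

# Example step: drivers' current preferences weight waiting time most heavily.
prefs = {"speed": 0.2, "stops": 0.2, "waits": 0.6}
a = act(state=3, preferences=prefs)
update(state=3, action=a, rewards={"speed": 0.1, "stops": -1.0, "waits": -0.5},
       next_state=4, preferences=prefs)
```

Because the preference weights enter only at action selection and backup time, the same per-objective value estimates can be reused as the drivers' preferences shift.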
Related papers
- Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems [60.91599969408029]
Optimizing multiple objectives simultaneously is an important task for recommendation platforms.
Existing multi-objective recommender systems do not systematically consider such dynamic relationships.
arXiv Detail & Related papers (2024-07-04T02:19:49Z)
- Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z)
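As a generic illustration of explicitly specifying preference scores over objectives (the core idea of the CPO entry above), the sketch below scores candidate responses with two hypothetical per-objective scorers and returns the candidate closest to a requested preference profile. This is inference-time selection under stated assumptions, not CPO's training objective.

```python
import numpy as np

# Hypothetical per-objective scorers, stand-ins for learned reward models.
def score_helpfulness(response: str) -> float:
    return min(len(response) / 100.0, 1.0)          # toy proxy, for illustration only

def score_harmlessness(response: str) -> float:
    return 0.0 if "unsafe" in response else 1.0     # toy proxy, for illustration only

SCORERS = {"helpfulness": score_helpfulness, "harmlessness": score_harmlessness}

def select_response(candidates, preference_scores):
    """Return the candidate whose per-objective scores best match the explicit targets."""
    target = np.array([preference_scores[name] for name in SCORERS])
    best, best_dist = None, float("inf")
    for cand in candidates:
        scores = np.array([scorer(cand) for scorer in SCORERS.values()])
        dist = float(np.linalg.norm(scores - target))   # distance to the requested profile
        if dist < best_dist:
            best, best_dist = cand, dist
    return best

# Explicit preference profile: prioritize harmlessness over verbose helpfulness.
print(select_response(["a short safe answer", "a very long unsafe answer " * 10],
                      {"helpfulness": 0.3, "harmlessness": 1.0}))
```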
- Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
arXiv Detail & Related papers (2023-12-13T08:00:26Z)
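A minimal sketch of the vector-reward idea from the entry above, assuming three hypothetical navigation objectives (goal progress, collision avoidance, energy); the paper's actual reward components and weights may differ.

```python
import numpy as np

# Hypothetical navigation objectives; the paper's actual reward components may differ.
def vector_reward(dist_to_goal, prev_dist_to_goal, collided, energy_used):
    """Return one reward component per objective instead of a single scalar."""
    return np.array([
        prev_dist_to_goal - dist_to_goal,    # progress toward the goal
        -10.0 if collided else 0.0,          # collision avoidance
        -0.01 * energy_used,                 # energy efficiency
    ])

r_vec = vector_reward(dist_to_goal=4.0, prev_dist_to_goal=4.5, collided=False, energy_used=2.0)

# A single-objective baseline can be recovered by scalarizing with fixed weights.
weights = np.array([1.0, 1.0, 0.5])
r_scalar = float(weights @ r_vec)
print(r_vec, r_scalar)
```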
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
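The entry above concerns multimodal policy parameterization; as a generic stand-in for the paper's latent generative model of trajectories, the sketch below samples from a small Gaussian-mixture policy head, one standard way to represent a multimodal continuous-action policy. The component count, means, and the two-mode interpretation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture_policy(mix_logits, means, log_stds):
    """Sample an action from a Gaussian mixture: pick a component, then sample within it."""
    probs = np.exp(mix_logits - mix_logits.max())
    probs = probs / probs.sum()                   # softmax over mixture components
    k = rng.choice(len(probs), p=probs)           # which behavior mode to act from
    return means[k] + np.exp(log_stds[k]) * rng.standard_normal(means.shape[1])

# Two hypothetical modes, e.g. steering left vs. right around an obstacle.
mix_logits = np.array([0.2, -0.1])
means = np.array([[-1.0, 0.5],
                  [ 1.0, 0.5]])
log_stds = np.full((2, 2), -1.0)
print(sample_mixture_policy(mix_logits, means, log_stds))
```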
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
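As a rough sketch of a discretizing bottleneck over goal representations: each factor of a goal embedding is snapped to its nearest entry in a codebook, yielding discrete codes plus a quantized embedding. The factor count, codebook size, and random codebook below are assumptions; the paper's learned factorial representation is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebooks: K discrete codes of dimension D for each of several factors.
K, D, N_FACTORS = 16, 4, 3
codebooks = rng.standard_normal((N_FACTORS, K, D))

def discretize_goal(goal_embedding):
    """Snap each factor of a goal embedding to its nearest codebook entry."""
    factors = goal_embedding.reshape(N_FACTORS, D)
    codes, quantized = [], []
    for f in range(N_FACTORS):
        dists = np.linalg.norm(codebooks[f] - factors[f], axis=1)
        k = int(np.argmin(dists))              # discrete code for this factor
        codes.append(k)
        quantized.append(codebooks[f, k])      # the bottlenecked (quantized) representation
    return codes, np.concatenate(quantized)

goal = rng.standard_normal(N_FACTORS * D)      # e.g. the output of a goal encoder
codes, quantized_goal = discretize_goal(goal)
print(codes, quantized_goal.shape)
```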
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Physical Reasoning Using Dynamics-Aware Models [32.402950370430496]
This study aims to address the limitation by augmenting the reward value with additional supervisory signals about object dynamics.
Specifically, we define a distance measure between the trajectories of two target objects, and use this distance measure to characterize the similarity of two environment rollouts.
We train the model to correctly rank rollouts according to this measure in addition to predicting the correct reward.
arXiv Detail & Related papers (2021-02-20T12:56:16Z)
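A small sketch of the two supervisory signals described above, assuming a simple mean pointwise trajectory distance and a pairwise hinge ranking loss; both choices are illustrative, not necessarily the paper's.

```python
import numpy as np

def trajectory_distance(traj_a, traj_b):
    """Mean pointwise distance between two equal-length object trajectories (assumed metric)."""
    return float(np.mean(np.linalg.norm(np.asarray(traj_a) - np.asarray(traj_b), axis=1)))

def pairwise_ranking_loss(score_i, score_j, dist_i, dist_j, margin=0.1):
    """Hinge loss asking the model's score to rank rollout i above j when i is closer to the target."""
    if dist_i < dist_j:
        return max(0.0, margin - (score_i - score_j))
    return max(0.0, margin - (score_j - score_i))

reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]     # target object's trajectory
rollout_1 = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]     # very similar rollout
rollout_2 = [(0.0, 1.0), (1.0, 1.5), (2.0, 2.0)]     # dissimilar rollout
d1 = trajectory_distance(reference, rollout_1)
d2 = trajectory_distance(reference, rollout_2)
print(d1, d2, pairwise_ranking_loss(score_i=0.8, score_j=0.3, dist_i=d1, dist_j=d2))
```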
- Momentum-based Gradient Methods in Multi-Objective Recommendation [30.894950420437926]
We create a multi-objective, model-agnostic Adamize method that adapts the Adam optimizer, originally designed for single-objective problems, to multi-objective settings.
We evaluate the benefits of Multi-objective Adamize on two multi-objective recommender systems and for three different objective combinations, both correlated and conflicting.
arXiv Detail & Related papers (2020-09-10T07:12:21Z)
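One plausible reading of the "Adamize" idea above is to apply Adam-style moment normalization to each objective's gradient separately and then sum the adapted gradients; the sketch below implements that reading and is not necessarily the authors' exact update rule.

```python
import numpy as np

def adamize_step(params, per_objective_grads, state, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply Adam-style moment normalization per objective, then sum the adapted gradients."""
    state["t"] += 1
    t = state["t"]
    adapted = np.zeros_like(params)
    for i, g in enumerate(per_objective_grads):
        m, v = state["m"][i], state["v"][i]
        m[:] = beta1 * m + (1 - beta1) * g           # first moment for objective i
        v[:] = beta2 * v + (1 - beta2) * g * g       # second moment for objective i
        m_hat = m / (1 - beta1 ** t)                 # bias correction
        v_hat = v / (1 - beta2 ** t)
        adapted += m_hat / (np.sqrt(v_hat) + eps)    # adapted gradient for objective i
    return params - lr * adapted

n_params, n_objectives = 5, 2
state = {"t": 0,
         "m": [np.zeros(n_params) for _ in range(n_objectives)],
         "v": [np.zeros(n_params) for _ in range(n_objectives)]}
params = np.zeros(n_params)
grads = [np.ones(n_params), -0.5 * np.ones(n_params)]   # toy gradients for two objectives
params = adamize_step(params, grads, state)
print(params)
```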
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
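A bare-bones version of a value-disagreement curriculum as described above: score candidate goals by the spread of an ensemble of value estimates and sample training goals in proportion to that spread. The random tabular ensemble stands in for learned value networks, and the proportional sampling rule is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an ensemble of learned value functions: one value per goal from each member.
N_GOALS, N_MEMBERS = 10, 4
ensemble_values = rng.uniform(0.0, 1.0, size=(N_MEMBERS, N_GOALS))

def sample_training_goals(ensemble_values, n_samples=5, temperature=1.0):
    """Sample goals with probability proportional to value disagreement across the ensemble."""
    disagreement = ensemble_values.std(axis=0)       # high spread = the ensemble is unsure
    probs = disagreement ** (1.0 / temperature)
    probs = probs / probs.sum()
    return rng.choice(len(probs), size=n_samples, p=probs, replace=True)

print(sample_training_goals(ensemble_values))
```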
- A Distributional View on Multi-Objective Policy Optimization [24.690800846837273]
We propose an algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way.
We show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
arXiv Detail & Related papers (2020-05-15T13:02:17Z)
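To illustrate what tracing out nondominated solutions by varying preferences can look like, the sketch below sweeps weight vectors over the per-objective returns of a few hypothetical policies and compares the policies found with the Pareto-optimal set. Note that it uses plain linear scalarization for simplicity, whereas the paper sets preferences in a scale-invariant way.

```python
import numpy as np

# Hypothetical per-objective returns of five candidate policies (one row per policy).
returns = np.array([[1.0, 0.2],
                    [0.8, 0.8],
                    [0.2, 1.0],
                    [0.5, 0.5],
                    [0.9, 0.1]])

def nondominated(points):
    """Indices of points not dominated by any other point (>= on all objectives, > on one)."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(np.all(q >= p) and np.any(q > p)
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            keep.append(i)
    return keep

# Sweep preference weights; each weight vector selects one policy via linear scalarization.
selected = set()
for w0 in np.linspace(0.0, 1.0, 11):
    w = np.array([w0, 1.0 - w0])
    selected.add(int(np.argmax(returns @ w)))

print("Pareto-optimal policies:", nondominated(returns))
print("Policies found by the preference sweep:", sorted(selected))
```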
This list is automatically generated from the titles and abstracts of the papers on this site.