Using soft maximin for risk averse multi-objective decision-making
- URL: http://arxiv.org/abs/2208.04273v1
- Date: Mon, 8 Aug 2022 17:09:11 GMT
- Title: Using soft maximin for risk averse multi-objective decision-making
- Authors: Benjamin J Smith and Robert Klassert and Roland Pihlakas
- Abstract summary: Split-function exp-log loss aversion (SFELLA) learns faster than the state-of-the-art thresholded alignment objective method.
SFELLA shows relative robustness improvements against changes in objective scale.
It is useful for avoiding problems that sometimes occur with a thresholded approach.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Balancing multiple competing and conflicting objectives is an essential task
for any artificial intelligence tasked with satisfying human values or
preferences. Conflict arises both from misalignment between individuals with
competing values and from conflicting value systems held by a single
human. Starting with the principle of loss aversion, we designed a set of soft
maximin function approaches to multi-objective decision-making. Benchmarking
these functions in a set of previously-developed environments, we found that
one new approach in particular, 'split-function exp-log loss aversion'
(SFELLA), learns faster than the state-of-the-art thresholded alignment
objective method (Vamplew et al., 2021) on three of four tasks it
was tested on, and achieved the same optimal performance after learning. SFELLA
also showed relative robustness improvements against changes in objective
scale, which may highlight an advantage in dealing with distribution shifts in the
environment dynamics. Further work had to be omitted from this preprint; in the
final published version we will also compare SFELLA to the multi-objective
reward exponentials (MORE) approach. SFELLA performs similarly to MORE in a
simple, previously described foraging task, but in a modified foraging
environment with a new resource that was not depleted as the agent worked,
SFELLA collected more of the new resource at very little
cost in terms of the old resource. Overall, we found SFELLA useful for
avoiding problems that sometimes occur with a thresholded approach, and more
reward-responsive than MORE while retaining its conservative, loss-averse
incentive structure.
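The abstract names the transform but does not give its functional form. As a rough illustration only, here is a minimal Python sketch of what a split exp-log, loss-averse aggregation over per-objective values could look like; the exact function used in the paper may differ, and the function and variable names below (split_exp_log, soft_maximin_utility) are our own.

```python
import numpy as np

def split_exp_log(x):
    """Loss-averse transform assumed from the name 'split-function exp-log':
    gains are compressed logarithmically, losses are amplified exponentially,
    so large negatives dominate the aggregate (a soft form of maximin)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    gains = x >= 0.0
    out[gains] = np.log1p(x[gains])
    out[~gains] = 1.0 - np.exp(-x[~gains])
    return out

def soft_maximin_utility(objective_values):
    """Aggregate per-objective values (e.g. per-objective returns or Q-values)
    into a single scalar by summing the transformed values."""
    return float(np.sum(split_exp_log(objective_values)))

# Example: both vectors have the same plain sum (10), but the one with a
# moderate loss on one objective scores much lower under the transform.
print(soft_maximin_utility([5.0, 5.0]))    # ~3.58, balanced gains
print(soft_maximin_utility([12.0, -2.0]))  # ~-3.82, large gain but a loss
```

With this shape, an action that trades a loss on one objective for a large gain on another is penalised more heavily than a plain sum would suggest, which is the loss-averse, maximin-like behaviour the abstract describes.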
Related papers
- DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints [68.82294911302579]
We introduce DiveR-CT, which relaxes conventional constraints on the objective and semantic reward, granting greater freedom for the policy to enhance diversity.
Our experiments demonstrate DiveR-CT's marked superiority over baselines by 1) generating data that perform better on various diversity metrics across different attack success rate levels, 2) better enhancing the resiliency of blue-team models through safety tuning based on collected data, 3) allowing dynamic control of objective weights for reliable and controllable attack success rates, and 4) reducing susceptibility to reward overoptimization.
arXiv Detail & Related papers (2024-05-29T12:12:09Z)
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
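The summary only names the ingredients (linear utilities over weight vectors w and an Upper Confidence Bound criterion), so the following is an illustrative sketch rather than the paper's algorithm: a plain UCB1 bandit over a finite set of candidate weight vectors, where the hypothetical `evaluate` callback stands in for training or evaluating an agent under a given scalarisation.

```python
import math
import random

def ucb_select_weight(candidates, evaluate, iterations=200, c=2.0):
    """UCB1 over a finite set of candidate weight vectors.
    evaluate(w) is assumed to return a noisy scalar score for an agent
    trained or evaluated under the linear scalarisation w . r."""
    counts = [0] * len(candidates)
    means = [0.0] * len(candidates)
    for t in range(1, iterations + 1):
        if t <= len(candidates):
            i = t - 1  # play every arm once before using the confidence bound
        else:
            i = max(range(len(candidates)),
                    key=lambda k: means[k] + c * math.sqrt(math.log(t) / counts[k]))
        score = evaluate(candidates[i])
        counts[i] += 1
        means[i] += (score - means[i]) / counts[i]
    return candidates[max(range(len(candidates)), key=lambda k: means[k])]

# Toy usage: the (unknown) best trade-off weights objective 0 at about 0.7.
candidates = [(w / 10, 1 - w / 10) for w in range(11)]
best = ucb_select_weight(
    candidates, evaluate=lambda w: -abs(w[0] - 0.7) + random.gauss(0, 0.05))
print(best)
```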
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- Multi-Objective Reinforcement Learning-based Approach for Pressurized Water Reactor Optimization [0.0]
PEARL distinguishes itself from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy.
Several versions inspired by deep learning and evolutionary techniques have been crafted, catering to both unconstrained and constrained problem domains.
It is tested on two practical PWR core Loading Pattern optimization problems to showcase its real-world applicability.
arXiv Detail & Related papers (2023-12-15T20:41:09Z)
- TMoE-P: Towards the Pareto Optimum for Multivariate Soft Sensors [7.236362889442992]
We reformulate the multivariate soft sensor as a multi-objective problem to address both issues and advance state-of-the-art performance.
To handle the negative transfer issue, we first propose an Objective-aware Mixture-of-Experts (OMoE) module, utilizing objective-specific and objective-shared experts for parameter sharing.
To address the seesaw phenomenon, we then propose a Task-aware Mixture-of-Experts framework for achieving the optimum routing.
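To make the parameter-sharing idea concrete, here is a small, hypothetical sketch (not the paper's OMoE module; the function name and toy linear experts are our own) of an objective-aware mixture of experts: each objective mixes a pool of shared experts through its own gate and adds an objective-specific expert.

```python
import numpy as np

def omoe_forward(x, shared_experts, specific_experts, gates):
    """Hypothetical objective-aware mixture-of-experts forward pass: objective k
    combines the shared experts with its own gate weights and adds an
    objective-specific expert, so parameters are shared across objectives
    without forcing a single representation."""
    outputs = []
    for k, specific in enumerate(specific_experts):
        mixed = sum(g * expert(x) for g, expert in zip(gates[k], shared_experts))
        outputs.append(mixed + specific(x))
    return outputs

# Toy usage with random linear "experts": 3 shared experts, 2 objectives.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
shared = [lambda v, W=rng.normal(size=(4, 8)): W @ v for _ in range(3)]
specific = [lambda v, W=rng.normal(size=(4, 8)): W @ v for _ in range(2)]
gates = np.array([[0.5, 0.3, 0.2],   # gate for objective 0 over the shared experts
                  [0.1, 0.1, 0.8]])  # gate for objective 1
print([out.shape for out in omoe_forward(x, shared, specific, gates)])
```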
arXiv Detail & Related papers (2023-02-21T06:49:09Z)
- Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional Reinforcement Learning (RL).
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z)
- Modularity benefits reinforcement learning agents with competing homeostatic drives [5.044282303487273]
We investigate a biologically relevant multi-objective problem, the continual homeostasis of a set of variables, and compare a monolithic deep Q-network to a modular network with a dedicated Q-learner for each variable.
We find that the modular agent: a) requires minimally determined exploration; b) has improved sample efficiency; and c) is more robust to out-of-domain perturbation.
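As a concrete but hypothetical illustration of the modular setup described above, the sketch below gives each homeostatic variable its own tabular Q-learner and arbitrates actions by summing the modules' Q-values; the paper's actual networks and arbitration rule may differ, and the class and method names are our own.

```python
import numpy as np

class ModularQAgent:
    """One tabular Q-learner per homeostatic variable; action selection sums
    the modules' Q-values ('greatest-mass' arbitration is an assumption)."""

    def __init__(self, n_states, n_actions, n_objectives, alpha=0.1, gamma=0.95):
        self.q = np.zeros((n_objectives, n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def act(self, state, epsilon=0.1):
        if np.random.rand() < epsilon:
            return np.random.randint(self.q.shape[2])
        return int(np.argmax(self.q[:, state, :].sum(axis=0)))

    def update(self, state, action, rewards, next_state):
        # Each module receives its own reward component and updates independently.
        for k, r in enumerate(rewards):
            target = r + self.gamma * self.q[k, next_state].max()
            self.q[k, state, action] += self.alpha * (target - self.q[k, state, action])
```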
arXiv Detail & Related papers (2022-04-13T18:57:55Z)
- Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z)
- ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning.
We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
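The two ingredients mentioned in the summary can be sketched as follows; this is an assumption-laden illustration with our own function names, not APS's actual reward: a particle-based (k-nearest-neighbour) entropy bonus for exploration plus a successor-feature-style inner product with a task vector for exploitation.

```python
import numpy as np

def knn_entropy_bonus(visited_features, query, k=5):
    """Nonparametric (particle-based) entropy estimate: the reward is higher
    when the query's features are far from their k-th nearest neighbour
    among previously visited states."""
    dists = np.linalg.norm(visited_features - query, axis=1)
    return float(np.log(1.0 + np.sort(dists)[k]))

def aps_style_intrinsic_reward(visited_features, query, task_vector):
    """Hypothetical combination of an exploration term (nonparametric entropy)
    and an exploitation term (successor-feature-style inner product)."""
    return knn_entropy_bonus(visited_features, query) + float(query @ task_vector)

# Toy usage in a 16-dimensional feature space.
rng = np.random.default_rng(1)
visited = rng.normal(size=(500, 16))
phi = rng.normal(size=16)     # features of the current state
w = rng.normal(size=16)       # task vector
print(aps_style_intrinsic_reward(visited, phi, w))
```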
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
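Read literally, the summary describes two pieces: a softer alternative to the max in the value target and a penalty on joint action-values that drift from a baseline. The sketch below renders those two pieces under our own assumptions (function names and the exact loss form are ours), not the paper's objective.

```python
import numpy as np

def softmax_value(q_next, beta=5.0):
    """Boltzmann (softmax) operator over next-state action values: a smooth
    stand-in for the max that is less prone to overestimation."""
    q_next = np.asarray(q_next, dtype=float)
    w = np.exp(beta * (q_next - q_next.max()))
    w /= w.sum()
    return float(w @ q_next)

def regularized_td_loss(q_sa, reward, q_next, baseline, gamma=0.99, lam=0.1):
    """Illustrative loss: squared TD error toward a softmax-valued target plus
    an L2 penalty that discourages the action-value estimate from deviating
    far from a baseline value."""
    target = reward + gamma * softmax_value(q_next)
    return (q_sa - target) ** 2 + lam * (q_sa - baseline) ** 2

# Toy usage.
print(regularized_td_loss(q_sa=1.2, reward=0.5, q_next=[0.9, 1.4, 1.1], baseline=1.0))
```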
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.