Modularity benefits reinforcement learning agents with competing
homeostatic drives
- URL: http://arxiv.org/abs/2204.06608v1
- Date: Wed, 13 Apr 2022 18:57:55 GMT
- Title: Modularity benefits reinforcement learning agents with competing
homeostatic drives
- Authors: Zack Dulberg, Rachit Dubey, Isabel M. Berwian, Jonathan D. Cohen
- Abstract summary: We investigate a biologically relevant multi-objective problem, the continual homeostasis of a set of variables, and compare a monolithic deep Q-network to a modular network with a dedicated Q-learner for each variable.
We find that the modular agent: a) requires minimal exogenously determined exploration; b) has improved sample efficiency; and c) is more robust to out-of-domain perturbation.
- Score: 5.044282303487273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of balancing conflicting needs is fundamental to intelligence.
Standard reinforcement learning algorithms maximize a scalar reward, which
requires combining different objective-specific rewards into a single number.
Alternatively, different objectives could also be combined at the level of
action value, such that specialist modules responsible for different objectives
submit different action suggestions to a decision process, each based on
rewards that are independent of one another. In this work, we explore the
potential benefits of this alternative strategy. We investigate a biologically
relevant multi-objective problem, the continual homeostasis of a set of
variables, and compare a monolithic deep Q-network to a modular network with a
dedicated Q-learner for each variable. We find that the modular agent: a)
requires minimal exogenously determined exploration; b) has improved sample
efficiency; and c) is more robust to out-of-domain perturbation.
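A minimal sketch of the action-value-combination idea described in the abstract, assuming a simple tabular setting: each module learns from its own objective-specific reward, and the decision process sums the modules' action values. The tabular learners, the additive combination rule, and the epsilon-greedy exploration term are illustrative assumptions; the paper's agents are deep Q-networks.

```python
import numpy as np

class ModuleQ:
    """Tabular Q-learner responsible for a single homeostatic variable."""
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9):
        self.q = np.zeros((n_states, n_actions))
        self.lr, self.gamma = lr, gamma

    def update(self, s, a, r, s_next):
        # One-step Q-learning update driven by this module's own reward only.
        target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.lr * (target - self.q[s, a])

class ModularAgent:
    """One Q-learner per objective; action values are combined additively."""
    def __init__(self, n_modules, n_states, n_actions, eps=0.05):
        self.modules = [ModuleQ(n_states, n_actions) for _ in range(n_modules)]
        self.n_actions, self.eps = n_actions, eps

    def act(self, s, rng):
        if rng.random() < self.eps:                      # small exploration term
            return int(rng.integers(self.n_actions))
        combined = sum(m.q[s] for m in self.modules)     # combine at the action-value level
        return int(np.argmax(combined))

    def update(self, s, a, rewards, s_next):
        # rewards[i] is the objective-specific reward for module i, e.g. the
        # reduction in deviation of homeostatic variable i from its set point.
        for m, r in zip(self.modules, rewards):
            m.update(s, a, r, s_next)

# Illustrative usage with hypothetical sizes:
rng = np.random.default_rng(0)
agent = ModularAgent(n_modules=4, n_states=100, n_actions=5)
a = agent.act(s=0, rng=rng)
agent.update(s=0, a=a, rewards=[0.0, -1.0, 0.5, 0.0], s_next=1)
```

Combining at the level of action values keeps the modules' rewards independent of one another while still letting a strongly dissatisfied module dominate the choice, which is the property the abstract contrasts with combining objective-specific rewards into a single scalar.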
Related papers
- Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning [10.848218400641466]
Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives.
We propose an approach for clustering the solution set generated by MORL.
arXiv Detail & Related papers (2024-11-07T15:26:38Z)
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that uses those inferences to plan the agent's response.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours over multiple objectives.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function.
Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied.
Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
arXiv Detail & Related papers (2023-06-01T04:50:06Z)
- A Unifying Perspective on Multi-Calibration: Game Dynamics for Multi-Objective Learning [63.20009081099896]
We provide a unifying framework for the design and analysis of multicalibrated predictors.
We exploit connections to game dynamics to achieve state-of-the-art guarantees for a diverse set of multicalibration learning problems.
arXiv Detail & Related papers (2023-02-21T18:24:17Z)
- A Scale-Independent Multi-Objective Reinforcement Learning with Convergence Analysis [0.6091702876917281]
Many sequential decision-making problems require the optimization of multiple objectives that may conflict with one another.
We develop a single-agent, scale-independent multi-objective reinforcement learning algorithm based on Advantage Actor-Critic (A2C).
A convergence analysis is then carried out for the devised multi-objective algorithm, providing a convergence-in-mean guarantee.
arXiv Detail & Related papers (2023-02-08T16:38:55Z)
- Using soft maximin for risk averse multi-objective decision-making [0.0]
Split-function exp-log loss aversion (SFELLA) learns faster than the state-of-the-art thresholded alignment objective method.
SFELLA shows relative robustness improvements against changes in objective scale.
It is useful for avoiding problems that sometimes occur with a thresholded approach.
arXiv Detail & Related papers (2022-08-08T17:09:11Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
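For the "Automatic Curriculum Learning through Value Disagreement" entry directly above, a minimal sketch of the general goal-selection idea: sample training goals with probability proportional to how much an ensemble of value estimates disagrees about them. The ensemble, the use of the standard deviation as the disagreement measure, and the softmax temperature are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def sample_goal(candidate_goals, value_ensemble, state, rng, temperature=1.0):
    """Sample a training goal, favouring goals where value estimates disagree.

    candidate_goals: sequence of goals (e.g. an array of shape (G, goal_dim)).
    value_ensemble: list of callables, each mapping (state, goal) -> scalar value.
    """
    values = np.array([[v(state, g) for v in value_ensemble]
                       for g in candidate_goals])
    disagreement = values.std(axis=1)        # per-goal spread across the ensemble
    logits = disagreement / temperature
    probs = np.exp(logits - logits.max())    # numerically stable softmax
    probs /= probs.sum()
    idx = rng.choice(len(candidate_goals), p=probs)
    return candidate_goals[idx]
```

The intuition is that disagreement tends to be highest for goals of intermediate difficulty, so the curriculum concentrates training where additional experience is likely to be most useful.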
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.