Addressing the issue of stochastic environments and local
decision-making in multi-objective reinforcement learning
- URL: http://arxiv.org/abs/2211.08669v1
- Date: Wed, 16 Nov 2022 04:56:42 GMT
- Title: Addressing the issue of stochastic environments and local
decision-making in multi-objective reinforcement learning
- Authors: Kewen Ding
- Abstract summary: Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional Reinforcement Learning (RL).
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-objective reinforcement learning (MORL) is a relatively new field which
builds on conventional Reinforcement Learning (RL) to solve multi-objective
problems. A common approach is to extend scalar Q-learning to use vector-valued
Q-values in combination with a utility function, which captures the user's
preferences and guides action selection. This study follows on from prior work and
focuses on what factors influence the frequency with which value-based MORL
Q-learning algorithms learn the optimal policy for an environment with
stochastic state transitions in scenarios where the goal is to maximise the
Scalarised Expected Return (SER) - that is, to maximise the average outcome
over multiple runs rather than the outcome within each individual episode. The
interaction between stochastic environments and MORL Q-learning algorithms is
analysed on a simple multi-objective Markov decision process (MOMDP), the
Space Traders problem, together with several variants of it. The empirical
evaluations show that a well-designed reward signal can improve the performance
of the original baseline algorithm; however, it is still not enough to address
more general environments. A variant of MORL Q-learning incorporating global
statistics is shown to outperform the baseline method on the original Space
Traders problem, but it remains below 100 percent effectiveness in finding the
desired SER-optimal policy at the end of training. On the other hand, Option
learning is guaranteed to converge to the desired SER-optimal policy, but it
cannot scale up to more complex real-life problems. The main
contribution of this thesis is to identify the extent to which the issue of
noisy Q-value estimates impacts on the ability to learn optimal policies under
the combination of stochastic environments, non-linear utility and a constant
learning rate.
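To make the vector Q-learning idea above concrete, here is a minimal, illustrative sketch of a tabular value-based MORL agent that stores vector Q-values and applies a non-linear utility function only when selecting actions. The class and function names, the two-objective utility and the hyperparameters are assumptions for illustration; this is not the exact algorithm or the Space Traders settings evaluated in the thesis.

```python
import numpy as np

def utility(returns):
    # Hypothetical non-linear utility over a two-objective return vector
    # (illustrative only; not the utility function used in the thesis).
    time_cost, mission_success = returns
    return float(mission_success) * 20.0 + float(time_cost)

class MOQLearner:
    """Tabular multi-objective Q-learning with vector Q-values.

    Q[s, a] holds an estimate of the expected *vector* return; the scalar
    utility is applied only to choose actions (and the bootstrap action).
    """

    def __init__(self, n_states, n_actions, n_objectives,
                 alpha=0.1, gamma=1.0, epsilon=0.1):
        self.Q = np.zeros((n_states, n_actions, n_objectives))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def greedy_action(self, state):
        # Greedy with respect to the utility of the vector Q estimates.
        utilities = [utility(self.Q[state, a]) for a in range(self.Q.shape[1])]
        return int(np.argmax(utilities))

    def select_action(self, state):
        # Epsilon-greedy exploration around the utility-greedy choice.
        if np.random.rand() < self.epsilon:
            return int(np.random.randint(self.Q.shape[1]))
        return self.greedy_action(state)

    def update(self, state, action, reward_vec, next_state, done):
        # Standard Q-learning backup applied componentwise to the vector.
        reward_vec = np.asarray(reward_vec, dtype=float)
        if done:
            target = reward_vec
        else:
            target = reward_vec + self.gamma * self.Q[next_state, self.greedy_action(next_state)]
        self.Q[state, action] += self.alpha * (target - self.Q[state, action])
```

Under SER the goal is to maximise the utility of the expected vector return, $u(\mathbb{E}[\mathbf{R}])$, rather than the expected utility within each episode; because the greedy step above applies a non-linear utility to noisy per-state Q-value estimates, stochastic transitions combined with a constant learning rate can steer such an agent away from the SER-optimal policy, which is the interaction the thesis analyses.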
Related papers
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based update through continuous relaxation, combined with Quasi-Quantum Annealing (QQA)
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z) - Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Stochastic Q-learning, StochDQN and StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - An Empirical Investigation of Value-Based Multi-objective Reinforcement
- An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments [1.26404863283601]
This paper examines the factors influencing the frequency with which value-based MORL Q-learning algorithms learn the SER-optimal policy.
We highlight the critical impact of the noisy Q-value estimates issue on the stability and convergence of these algorithms.
arXiv Detail & Related papers (2024-01-06T08:43:08Z) - Multi-Task Learning on Networks [0.0]
Multi-objective optimization problems arising in the multi-task learning context have specific features and require ad hoc methods.
In this thesis the solutions in the Input Space are represented as probability distributions encapsulating the knowledge contained in the function evaluations.
In this space of probability distributions, endowed with the metric given by the Wasserstein distance, a new algorithm, MOEA/WST, can be designed in which the model is not built directly on the objective function.
arXiv Detail & Related papers (2021-12-07T09:13:10Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Policy Information Capacity: Information-Theoretic Measure for Task
Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)