Hyperparameter Optimization for Multi-Objective Reinforcement Learning
- URL: http://arxiv.org/abs/2310.16487v1
- Date: Wed, 25 Oct 2023 09:17:25 GMT
- Title: Hyperparameter Optimization for Multi-Objective Reinforcement Learning
- Authors: Florian Felten, Daniel Gareev, El-Ghazali Talbi, Grégoire Danoy
- Abstract summary: Reinforcement learning (RL) has emerged as a powerful approach for tackling complex problems.
The recent introduction of multi-objective reinforcement learning (MORL) has further expanded the scope of RL.
In practice, this task often proves to be challenging, leading to unsuccessful deployments of these techniques.
- Score: 0.27309692684728615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) has emerged as a powerful approach for tackling
complex problems. The recent introduction of multi-objective reinforcement
learning (MORL) has further expanded the scope of RL by enabling agents to make
trade-offs among multiple objectives. This advancement has not only broadened
the range of problems that can be tackled but also created numerous
opportunities for exploration and advancement. Yet, the effectiveness of RL
agents heavily relies on appropriately setting their hyperparameters. In
practice, this task often proves challenging, leading to unsuccessful
deployments of these techniques in many instances. Hence, prior research has
explored hyperparameter optimization in RL to address this concern.
This paper presents an initial investigation into the challenge of
hyperparameter optimization specifically for MORL. We formalize the problem,
highlight its distinctive challenges, and propose a systematic methodology to
address it. The proposed methodology is applied to a well-known environment
using a state-of-the-art MORL algorithm, and preliminary results are reported.
Our findings indicate that the proposed methodology can effectively provide
hyperparameter configurations that significantly enhance the performance of
MORL agents. Furthermore, this study identifies various future research
opportunities to further advance the field of hyperparameter optimization for
MORL.
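As a rough illustration of the task the abstract describes (searching hyperparameter configurations and scoring each one by the quality of the Pareto front the resulting MORL agent attains), the following minimal Python sketch combines random search with a 2D hypervolume indicator. The search space, the `train_and_evaluate_morl` stub, and the reference point are illustrative assumptions, not the paper's actual methodology or benchmark.

```python
# Minimal sketch: random-search HPO for a MORL agent, scored by hypervolume.
# `train_and_evaluate_morl` is a stand-in; a real run would train an agent
# with a MORL algorithm and return the Pareto front it discovers.
import random

def hypervolume_2d(front, ref):
    """Hypervolume of a 2D Pareto front (maximization) w.r.t. reference point `ref`."""
    pts = sorted(front, key=lambda p: p[0], reverse=True)  # descending on objective 1
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (x - ref[0]) * (y - prev_y)  # slice dominated area column by column
        prev_y = y
    return hv

def train_and_evaluate_morl(config):
    """Placeholder: pretend to train with `config` and return a 2-objective front."""
    q = config["gamma"] / (1.0 + 1e3 * abs(config["lr"] - 3e-4))  # fabricated quality
    return [(q * (1 - t), q * t) for t in (0.1, 0.5, 0.9)]        # toy Pareto front

SEARCH_SPACE = {"lr": [1e-4, 3e-4, 1e-3], "gamma": [0.95, 0.99]}

def random_search(n_trials=20, ref=(0.0, 0.0), seed=0):
    rng = random.Random(seed)
    best_cfg, best_hv = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        hv = hypervolume_2d(train_and_evaluate_morl(cfg), ref)
        if hv > best_hv:
            best_cfg, best_hv = cfg, hv
    return best_cfg, best_hv

print(random_search())  # highest-hypervolume configuration found
```

In practice the random sampler would be replaced by a dedicated HPO library, but the config-evaluate-compare loop over a multi-objective quality indicator stays the same.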
Related papers
- HyperQ-Opt: Q-learning for Hyperparameter Optimization [0.0]
This paper presents a novel perspective on HPO by formulating it as a sequential decision-making problem and leveraging Q-learning, a reinforcement learning technique.
The approaches are evaluated for their ability to find optimal or near-optimal configurations within a limited number of trials.
By shifting the paradigm toward policy-based optimization, this work contributes to advancing HPO methods for scalable and efficient machine learning applications (a toy sketch of this Q-learning formulation appears after this list).
arXiv Detail & Related papers (2024-12-23T18:22:34Z)
- Hyper: Hyperparameter Robust Efficient Exploration in Reinforcement Learning [48.81121647322492]
Hyper is provably efficient under the function approximation setting, and we empirically demonstrate its appealing performance and robustness in various environments.
Hyper extensively mitigates the problem by effectively regularizing the visitation of the exploration and decoupling the exploitation to ensure stable training.
arXiv Detail & Related papers (2024-12-04T23:12:41Z)
- See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
Parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads.
We take the first step to unify all approaches by dissecting them from a decomposition perspective.
We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
arXiv Detail & Related papers (2024-07-07T15:44:42Z)
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- Evolving Populations of Diverse RL Agents with MAP-Elites [1.5575376673936223]
We introduce a flexible framework that allows the use of any Reinforcement Learning (RL) algorithm instead of just policies.
We demonstrate the benefits brought about by our framework through extensive numerical experiments on a number of robotics control problems.
arXiv Detail & Related papers (2023-03-09T19:05:45Z)
- Reinforcement Learning-Empowered Mobile Edge Computing for 6G Edge Intelligence [76.96698721128406]
Mobile edge computing (MEC) is considered a novel paradigm for computation- and delay-sensitive tasks in fifth generation (5G) networks and beyond.
This paper provides a comprehensive research review of RL-enabled MEC and offers insight for development.
arXiv Detail & Related papers (2022-01-27T10:02:54Z)
- An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization [48.5614138038673]
We propose an efficient and robust bandit-based algorithm called Sub-Sampling (SS) in the scenario of hyperparameter search evaluation.
We also develop a novel hyperparameter optimization algorithm called BOSS.
Empirical studies validate our theoretical arguments for SS and demonstrate the superior performance of BOSS on a number of applications (a bandit-style sketch in this spirit appears after this list).
arXiv Detail & Related papers (2020-07-11T03:15:21Z)
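As a toy illustration of the HyperQ-Opt idea above (casting hyperparameter optimization as a sequential decision-making problem solved with Q-learning), here is a minimal tabular sketch. The one-dimensional learning-rate grid, the move-left/stay/move-right actions, and the `blackbox_score` stand-in are all assumptions for illustration, not that paper's actual formulation.

```python
# Toy sketch: tabular Q-learning over a discretized hyperparameter grid.
import random

LEARNING_RATES = [1e-4, 3e-4, 1e-3, 3e-3]  # states: indices into this grid
ACTIONS = [-1, 0, 1]                       # move left / stay / move right

def blackbox_score(lr):
    """Stand-in for a full training run; quality peaks near lr = 3e-4."""
    return 1.0 / (1.0 + 1e3 * abs(lr - 3e-4))

def q_learning_hpo(episodes=200, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in LEARNING_RATES]
    state = rng.randrange(len(LEARNING_RATES))
    best_lr, best_score = None, float("-inf")
    for _ in range(episodes):
        # epsilon-greedy action selection over the Q-table
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        nxt = min(max(state + ACTIONS[a], 0), len(LEARNING_RATES) - 1)
        r = blackbox_score(LEARNING_RATES[nxt])   # reward = quality of new config
        q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
        state = nxt
        if r > best_score:
            best_lr, best_score = LEARNING_RATES[nxt], r
    return best_lr, best_score

print(q_learning_hpo())  # expected to settle near lr = 3e-4
```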
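The SS/BOSS entry above motivates a second sketch: bandit-style hyperparameter evaluation under a limited budget. This is plain successive halving, a related bandit-style method, not the paper's Sub-Sampling (SS) algorithm; the budget schedule and the noisy scoring stub are assumptions.

```python
# Successive halving: evaluate many configs cheaply, keep the best half,
# and re-evaluate survivors with a doubled budget each round.
import random

def noisy_score(config, budget, rng):
    """Stand-in for partially training with `config` for `budget` steps."""
    true_quality = 1.0 / (1.0 + 1e3 * abs(config["lr"] - 3e-4))
    noise = rng.gauss(0.0, 1.0 / budget ** 0.5)   # more budget, less noise
    return true_quality + noise

def successive_halving(configs, rounds=3, base_budget=4, seed=0):
    rng = random.Random(seed)
    survivors, budget = list(configs), base_budget
    for _ in range(rounds):
        scored = [(noisy_score(c, budget, rng), c) for c in survivors]
        scored.sort(key=lambda sc: sc[0], reverse=True)
        survivors = [c for _, c in scored[: max(1, len(scored) // 2)]]
        budget *= 2                               # promote survivors, double budget
    return survivors[0]

configs = [{"lr": lr} for lr in (1e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2)]
print(successive_halving(configs))                # best surviving configuration
```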
This list is automatically generated from the titles and abstracts of the papers on this site.