Simultaneously Evolving Deep Reinforcement Learning Models using
Multifactorial Optimization
- URL: http://arxiv.org/abs/2002.12133v2
- Date: Mon, 23 Mar 2020 10:47:41 GMT
- Title: Simultaneously Evolving Deep Reinforcement Learning Models using
Multifactorial Optimization
- Authors: Aritz D. Martinez, Eneko Osaba, Javier Del Ser and Francisco Herrera
- Abstract summary: This work proposes a framework capable of simultaneously evolving several DQL models towards solving interrelated Reinforcement Learning tasks.
Thorough experimentation is presented and discussed to assess the performance of the framework.
- Score: 18.703421169342796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Multifactorial Optimization (MFO) has gained notable
momentum in the research community. MFO is known for its inherent capability to
efficiently address multiple optimization tasks at the same time, while
transferring information among such tasks to improve their convergence speed.
On the other hand, the quantum leap made by Deep Q Learning (DQL) in the
Machine Learning field has made it possible to tackle Reinforcement Learning (RL)
problems of unprecedented complexity. Unfortunately, complex DQL models usually find it
difficult to converge to optimal policies due to the lack of exploration or
sparse rewards. In order to overcome these drawbacks, pre-trained models are
widely harnessed via Transfer Learning, extrapolating knowledge acquired in a
source task to the target task. Besides, meta-heuristic optimization has been
shown to reduce the lack of exploration of DQL models. This work proposes an MFO
framework capable of simultaneously evolving several DQL models towards solving
interrelated RL tasks. Specifically, our proposed framework blends together the
benefits of meta-heuristic optimization, Transfer Learning and DQL to automate
the process of knowledge transfer and policy learning of distributed RL agents.
Thorough experimentation is presented and discussed so as to assess the
performance of the framework, its comparison to the traditional methodology for
Transfer Learning in terms of convergence, speed and policy quality, and the
intertask relationships found and exploited over the search process.
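The abstract does not spell out how a single population can serve several tasks at once. As a rough illustration only, the sketch below uses the bookkeeping commonly associated with multifactorial evolutionary algorithms (factorial rank, skill factor, scalar fitness). These definitions are an assumption here, not something taken from the abstract, and the cost matrix is made-up toy data rather than output from any DQL model.

```python
from typing import List


def factorial_ranks(costs: List[List[float]]) -> List[List[int]]:
    """costs[i][k] is the cost of individual i on task k (lower is better).

    Returns ranks[i][k], the 1-based position of individual i when the
    population is sorted by cost on task k.
    """
    n, n_tasks = len(costs), len(costs[0])
    ranks = [[0] * n_tasks for _ in range(n)]
    for k in range(n_tasks):
        order = sorted(range(n), key=lambda i: costs[i][k])
        for position, i in enumerate(order, start=1):
            ranks[i][k] = position
    return ranks


def skill_factors(ranks: List[List[int]]) -> List[int]:
    """Index of the task on which each individual ranks best (ties: first task)."""
    return [min(range(len(r)), key=lambda k: r[k]) for r in ranks]


def scalar_fitness(ranks: List[List[int]]) -> List[float]:
    """Single comparable fitness per individual: 1 / best rank over all tasks."""
    return [1.0 / min(r) for r in ranks]


# Hypothetical toy population: 3 individuals evaluated on 2 tasks.
costs = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.5],
]
ranks = factorial_ranks(costs)
print(ranks)                  # per-task ranks of every individual
print(skill_factors(ranks))   # task each individual is assigned to
print(scalar_fitness(ranks))  # fitness used to compare across tasks
```

Under these assumed definitions, individuals specialised in different tasks can be compared through one scalar value, which is what lets a shared population exchange genetic material across tasks.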
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
- EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z)
- Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented Generation [3.948068081583197]
This paper proposes a methodology that handles the out-of-domain scenario in Textbook Question Answering (TQA).
Through supervised fine-tuning of the LLM model Llama-2 and the incorporation of RAG, our architecture outperforms the baseline, achieving a 4.12% accuracy improvement on the validation set and 9.84% on the test set for non-diagram multiple-choice questions.
arXiv Detail & Related papers (2024-02-05T11:58:56Z)
- H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer [4.328706834250445]
We propose a framework named H-ensemble, which learns the optimal linear combination of source models for the target task.
Compared to previous works, H-ensemble is characterized by: 1) its adaptability to a novel MSF setting for few-shot target tasks, 2) theoretical reliability, 3) a lightweight structure easy to interpret and adapt.
We show that H-ensemble can successfully learn the optimal task ensemble, as well as outperform prior art.
arXiv Detail & Related papers (2023-12-19T17:39:34Z)
- Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation [17.165083095799712]
We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
arXiv Detail & Related papers (2022-11-20T03:55:09Z)
- Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
arXiv Detail & Related papers (2022-10-06T11:06:39Z)
- Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching, a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z)
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool in dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for Mean-Field Control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelop value iteration (EVI), which generalizes the enveloped multi-objective Q-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)