Related papers: Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies

URL: http://arxiv.org/abs/2405.04322v1
Date: Tue, 7 May 2024 13:48:59 GMT
Title: Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies
Authors: Paul Templier, Emmanuel Rachelson, Antoine Cully, Dennis G. Wilson
Abstract summary: Genetic Drift Regularization (GDR) is a simple regularization method in the actor training loss that prevents the actor genome from drifting away from the ES. We show that GDR can improve ES convergence on problems where RL learns well, but also helps RL training on other tasks.
Score: 9.813386592472535
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Evolutionary Algorithms (EA) have been successfully used for the optimization of neural networks for policy search, but they remain sample inefficient and underperform in some cases compared to gradient-based reinforcement learning (RL). Various methods combine the two approaches, many of them training an RL algorithm on data from EA evaluations and injecting the RL actor into the EA population. However, when using Evolution Strategies (ES) as the EA, the RL actor can drift genetically far from the ES distribution and injection can cause a collapse of the ES performance. Here, we highlight the phenomenon of genetic drift where the actor genome and the ES population distribution progressively drift apart, leading to injection having a negative impact on the ES. We introduce Genetic Drift Regularization (GDR), a simple regularization method in the actor training loss that prevents the actor genome from drifting away from the ES. We show that GDR can improve ES convergence on problems where RL learns well, but also helps RL training on other tasks, and fixes the injection issues better than previous controlled injection methods.
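
The abstract describes GDR only as a regularization term in the actor training loss and does not give its exact form. As a rough illustration, the sketch below assumes the term is a squared L2 distance between the actor genome (its flattened parameters) and the center of the ES distribution, weighted by a coefficient. The function names, the `beta` coefficient, and the choice of distance are all hypothetical and may differ from the paper's formulation.

```python
# Hypothetical sketch: GDR modeled as an L2 penalty pulling the actor genome
# toward the center of the ES distribution. Names and the form of the
# penalty are assumptions, not taken from the paper.

def gdr_penalty(actor_genome, es_center):
    """Squared Euclidean distance between the actor genome and the ES center."""
    return sum((a - c) ** 2 for a, c in zip(actor_genome, es_center))


def regularized_actor_loss(rl_loss, actor_genome, es_center, beta=0.1):
    """RL actor loss plus the drift penalty, weighted by an assumed coefficient beta."""
    return rl_loss + beta * gdr_penalty(actor_genome, es_center)


def gdr_gradient(actor_genome, es_center, beta=0.1):
    """Gradient of the weighted penalty term with respect to the actor genome."""
    return [2.0 * beta * (a - c) for a, c in zip(actor_genome, es_center)]


def gdr_step(actor_genome, rl_grad, es_center, beta=0.1, lr=0.01):
    """One gradient-descent step on the regularized loss."""
    reg_grad = gdr_gradient(actor_genome, es_center, beta)
    return [a - lr * (g + r) for a, g, r in zip(actor_genome, rl_grad, reg_grad)]
```

Under these assumptions, a larger `beta` keeps the actor closer to the ES population at the cost of constraining RL updates, while `beta = 0` recovers the unregularized actor loss.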

Related papers

Data-regularized Reinforcement Learning for Diffusion Models at Scale [99.01056178660538]
We introduce Data-regularized Diffusion Reinforcement Learning (DDRL), a novel framework that uses the forward KL divergence to anchor the policy to an off-policy data distribution. With over a million GPU hours of experiments and ten thousand double-blind evaluations, we demonstrate that DDRL significantly improves rewards while alleviating the reward hacking seen in RLs.
arXiv Detail & Related papers (2025-12-03T23:45:07Z)
TOPSIS-like metaheuristic for LABS problem [70.49434432747293]
We introduce socio-cognitive mutation mechanisms that integrate strategies of following the best solutions and avoiding the worst. By guiding search agents to imitate high-performing solutions and avoid poor ones, these operators enhance both solution diversity and convergence efficiency.
arXiv Detail & Related papers (2025-11-08T00:47:37Z)
Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization [25.633698252033756]
We propose the Evolutionary Augmentation Mechanism (EAM) to synergize the learning efficiency of DRL with the global search power of GAs. EAM operates by generating solutions from a learned policy and refining them through domain-specific genetic operations such as crossover and mutation. EAM can be seamlessly integrated with state-of-the-art DRL solvers such as the Attention Model, POMO, and SymNCO.
arXiv Detail & Related papers (2025-06-11T05:17:30Z)
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization [52.76330545825083]
Reinforcement learning (RL) has become popular in enhancing the reasoning capabilities of large language models (LLMs). We identify a previously unrecognized phenomenon we term Lazy Likelihood Displacement (LLD), wherein the likelihood of correct responses marginally increases or even decreases during training. We develop a method called NTHR, which downweights penalties on tokens contributing to the LLD. Unlike prior DPO-based approaches, NTHR takes advantage of GRPO's group-based structure, using correct responses as anchors to identify influential tokens.
arXiv Detail & Related papers (2025-05-24T18:58:51Z)
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning [34.25769740497309]
GenPO is a generative policy optimization framework that leverages exact diffusion inversion to construct invertible action mappings. GenPO is the first method to successfully integrate diffusion policies into on-policy RL, unlocking their potential for large-scale parallelized training and real-world robotic deployment.
arXiv Detail & Related papers (2025-05-24T15:57:07Z)
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models [52.8949080772873]
We propose an evolution-based region adversarial prompt tuning method called ER-APT. In each training iteration, we first generate AEs using traditional gradient-based methods. Subsequently, a genetic evolution mechanism incorporating selection, mutation, and crossover is applied to optimize the AEs. The final evolved AEs are used for prompt tuning, achieving region-based adversarial optimization instead of conventional single-point adversarial prompt tuning.
arXiv Detail & Related papers (2025-03-17T07:08:47Z)
MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to the training of large models. We propose a framework that reconciles preconditioned gradient optimization methods with variance reduction via a scaled momentum technique.
arXiv Detail & Related papers (2024-11-15T18:57:39Z)
A Coefficient Makes SVRG Effective [51.36251650664215]
Stochastic Variance Reduced Gradient (SVRG) is a theoretically compelling optimization method. In this work, we demonstrate the potential of SVRG in optimizing real-world neural networks.
arXiv Detail & Related papers (2023-11-09T18:47:44Z)
Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component in the evolutionary algorithm has demonstrated superior performance in recent years. We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature. In the applications of RL-EA section, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL. This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies [50.10277748405355]
Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies methods. We show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of steps across a variety of applications.
arXiv Detail & Related papers (2023-04-21T17:53:05Z)
Enabling surrogate-assisted evolutionary reinforcement learning via policy embedding [28.272572839321104]
This paper proposes a PE-SAERL Framework to enable surrogate-assisted evolutionary reinforcement learning via policy embedding. Empirical results on 5 Atari games show that the proposed method can perform more efficiently than the four state-of-the-art algorithms.
arXiv Detail & Related papers (2023-01-31T02:36:06Z)
Direct Mutation and Crossover in Genetic Algorithms Applied to Reinforcement Learning Tasks [0.9137554315375919]
This paper will focus on applying neuroevolution using a simple genetic algorithm (GA) to find the weights of a neural network that produce optimally behaving agents. We present two novel modifications that improve the data efficiency and speed of convergence when compared to the initial implementation.
arXiv Detail & Related papers (2022-01-13T07:19:28Z)
Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z)
IE-GAN: An Improved Evolutionary Generative Adversarial Network Using a New Fitness Function and a Generic Crossover Operator [20.100388977505002]
We propose an improved E-GAN framework called IE-GAN, which introduces a new fitness function and a generic crossover operator. In particular, the proposed fitness function can model the evolutionary process of individuals more accurately. The crossover operator, which has been commonly adopted in evolutionary algorithms, can enable offspring to imitate the superior gene expression of their parents.
arXiv Detail & Related papers (2021-07-25T13:55:07Z)
Adam revisited: a weighted past gradients perspective [57.54752290924522]
We propose a novel adaptive method, the weighted adaptive algorithm (WADA), to tackle the non-convergence issues. We prove that WADA can achieve a weighted data-dependent regret bound, which could be better than the original regret bound of ADAGRAD.
arXiv Detail & Related papers (2021-01-01T14:01:52Z)
Accelerating Reinforcement Learning with a Directional-Gaussian-Smoothing Evolution Strategy [3.404507240556492]
Evolution strategy (ES) has shown great promise in many challenging reinforcement learning (RL) tasks. Two limitations in current ES practice may hinder its further capabilities. In this work, we employ a Directional Gaussian Smoothing Evolutionary Strategy (DGS-ES) to accelerate RL training. We show that DGS-ES is highly scalable, possesses superior wall-clock time, and achieves competitive reward scores to other popular policy gradient and ES approaches.
arXiv Detail & Related papers (2020-02-21T01:05:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.