Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning
- URL: http://arxiv.org/abs/2501.13883v1
- Date: Thu, 23 Jan 2025 17:56:40 GMT
- Title: Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning
- Authors: Matyáš Lorenc
- Abstract summary: We explore the capability of evolution strategies to train an agent whose policy is based on a transformer architecture in a reinforcement learning setting. We performed experiments using OpenAI's highly parallelizable evolution strategy to train a Decision Transformer in the Humanoid environment and in Atari games.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the capability of evolution strategies to train an agent whose policy is based on a transformer architecture in a reinforcement learning setting. We performed experiments using OpenAI's highly parallelizable evolution strategy to train a Decision Transformer in the Humanoid locomotion environment and in Atari game environments, testing the ability of this black-box optimization technique to train even relatively large and complicated models (compared to those previously tested in the literature). We also proposed a method to aid the training by first pretraining the model before using OpenAI-ES to train it further, and tested its effectiveness. The examined evolution strategy proved, in general, capable of achieving strong results and obtained high-performing agents. The pretraining was therefore shown to be unnecessary; still, it helped us observe and formulate several further insights.
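The core optimizer the abstract refers to is OpenAI-ES (Salimans et al., 2017): antithetic Gaussian perturbations of the policy parameters, centered-rank fitness shaping, and a gradient-ascent step on the shaped returns. A minimal sketch of one update follows; the function names, hyperparameter values, and fitness interface are illustrative, not the paper's actual implementation:

```python
import numpy as np

def openai_es_step(theta, fitness_fn, pop_size=64, sigma=0.1, lr=0.02, rng=None):
    """One update of OpenAI-ES with antithetic (mirrored) sampling.

    theta      -- flat parameter vector of the policy (e.g. a Decision Transformer)
    fitness_fn -- maps a parameter vector to an episodic return (higher is better)
    """
    rng = rng or np.random.default_rng(0)
    half = pop_size // 2
    eps = rng.standard_normal((half, theta.size))   # shared noise, mirrored below
    rewards = np.empty(pop_size)
    for i in range(half):                           # evaluate +/- perturbations
        rewards[2 * i] = fitness_fn(theta + sigma * eps[i])
        rewards[2 * i + 1] = fitness_fn(theta - sigma * eps[i])
    # centered-rank fitness shaping, as in Salimans et al. (2017)
    ranks = np.empty(pop_size)
    ranks[rewards.argsort()] = np.arange(pop_size)
    shaped = ranks / (pop_size - 1) - 0.5
    # gradient estimate: shaped (f(+) - f(-)) differences weighted by their noise
    grad = (shaped[0::2] - shaped[1::2]) @ eps / (pop_size * sigma)
    return theta + lr * grad
```

Because each perturbation is evaluated independently, the inner loop parallelizes across workers that only need to exchange random seeds and scalar returns, which is what makes the method viable for models as large as a Decision Transformer.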
Related papers
- Evolutionary Optimization of Deep Learning Agents for Sparrow Mahjong [0.0]
We present Evo-Sparrow, a deep learning-based agent for AI decision-making in Sparrow Mahjong. Our model evaluates board states and optimizes decision policies in a non-deterministic, partially observable game environment.
arXiv Detail & Related papers (2025-08-11T00:53:52Z) - Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers [51.992454203752686]
Transformer models learn in two distinct modes: in-weights learning (IWL) and in-context learning (ICL). We draw inspiration from evolutionary biology's analogous adaptive strategies: genetic encoding and phenotypic plasticity. We experimentally operationalize these dimensions of predictability and investigate their influence on the ICL/IWL balance in Transformers.
arXiv Detail & Related papers (2025-05-14T23:31:17Z) - Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models [52.8949080772873]
We propose an evolution-based region adversarial prompt tuning method called ER-APT.
In each training iteration, we first generate AEs using traditional gradient-based methods.
Subsequently, a genetic evolution mechanism incorporating selection, mutation, and crossover is applied to optimize the AEs.
The final evolved AEs are used for prompt tuning, achieving region-based adversarial optimization instead of conventional single-point adversarial prompt tuning.
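The selection, mutation, and crossover loop described above is a standard genetic algorithm. A generic sketch is given below; the fitness interface, elitist selection scheme, and mutation parameters are illustrative assumptions, not the ER-APT paper's implementation:

```python
import random

def evolve(population, fitness, n_gens=10, mut_rate=0.1, rng=None):
    """Generic genetic loop: elitist selection, one-point crossover, mutation.

    population -- list of candidate vectors (lists of floats)
    fitness    -- scores a candidate; higher is better
    """
    rng = rng or random.Random(0)
    for _ in range(n_gens):
        scored = sorted(population, key=fitness, reverse=True)
        elite = scored[: len(population) // 2]      # selection: keep top half
        children = []
        while len(elite) + len(children) < len(population):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(a))          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0, 1) if rng.random() < mut_rate else g
                     for g in child]                # Gaussian mutation
            children.append(child)
        population = elite + children
    return max(population, key=fitness)
```

In the paper's setting the candidates would be adversarial perturbations and the fitness a region-based adversarial loss; here both are left abstract.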
arXiv Detail & Related papers (2025-03-17T07:08:47Z) - Utilizing Novelty-based Evolution Strategies to Train Transformers in Reinforcement Learning [0.0]
We evaluate novelty-based variants of OpenAI-ES, the NS-ES and NSR-ES algorithms.
We also test whether we can accelerate the novelty-based training of larger models by seeding the training with a pretrained model.
arXiv Detail & Related papers (2025-02-10T09:44:10Z) - DODT: Enhanced Online Decision Transformer Learning through Dreamer's Actor-Critic Trajectory Forecasting [37.334947053450996]
We introduce a novel approach that combines the Dreamer algorithm's ability to generate anticipatory trajectories with the adaptive strengths of the Online Decision Transformer.
Our methodology enables parallel training where Dreamer-produced trajectories enhance the contextual decision-making of the transformer.
arXiv Detail & Related papers (2024-10-15T07:27:56Z) - Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization [88.5582111768376]
We study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model.
Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model.
arXiv Detail & Related papers (2024-09-28T13:24:11Z) - Automatic Instruction Evolving for Large Language Models [93.52437926313621]
Auto Evol-Instruct is an end-to-end framework that evolves instruction datasets using large language models without any human effort.
Our experiments demonstrate that the best method optimized by Auto Evol-Instruct outperforms human-designed methods on various benchmarks.
arXiv Detail & Related papers (2024-06-02T15:09:00Z) - Evolution Transformer: In-Context Evolutionary Optimization [6.873777465945062]
We introduce Evolution Transformer, a causal Transformer architecture, which can flexibly characterize a family of Evolution Strategies.
We train the model weights using Evolutionary Algorithm Distillation, a technique for supervised optimization of sequence models.
We analyze the resulting properties of the Evolution Transformer and propose a technique to fully self-referentially train the Evolution Transformer.
arXiv Detail & Related papers (2024-03-05T14:04:13Z) - DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence [77.78795329701367]
We present DARLEI, a framework that combines evolutionary algorithms with parallelized reinforcement learning.
We characterize DARLEI's performance under various conditions, revealing factors impacting diversity of evolved morphologies.
We hope to extend DARLEI in future work to include interactions between diverse morphologies in richer environments.
arXiv Detail & Related papers (2023-12-08T16:51:10Z) - Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, the first time such a model has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
Transformer, originally devised for natural language processing, has also achieved significant success in computer vision.
We group existing developments in two categories: architecture enhancement and trajectory optimization.
We examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving.
arXiv Detail & Related papers (2022-12-29T03:15:59Z) - Discovering Evolution Strategies via Meta-Black-Box Optimization [23.956974467496345]
We propose to discover effective update rules for evolution strategies via meta-learning.
Our approach employs a search strategy parametrized by a self-attention-based architecture.
We show that it is possible to self-referentially train an evolution strategy from scratch, with the learned update rule used to drive the outer meta-learning loop.
arXiv Detail & Related papers (2022-11-21T08:48:46Z) - Shaped Policy Search for Evolutionary Strategies using Waypoints [17.8055398673228]
We aim to improve exploration in black-box methods, particularly evolution strategies (ES).
We use the state-action pairs from the trajectories obtained during rollouts/evaluations to learn the dynamics of the agent.
The learnt dynamics are then used in the optimization procedure to speed-up training.
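Learning agent dynamics from rollout transitions, as described above, amounts to fitting a model that maps (state, action) pairs to next states. A toy least-squares stand-in is sketched below; the linear model form, function names, and shapes are assumptions for illustration, since the paper's actual model is not specified here:

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of s' ~ A s + B a from rollout transitions.

    states, actions, next_states -- arrays of shape (N, ds), (N, da), (N, ds)
    """
    X = np.hstack([states, actions])                 # (N, ds + da)
    # solve X W ~ next_states in the least-squares sense
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    ds = states.shape[1]
    A, B = W[:ds].T, W[ds:].T
    return A, B

def predict(A, B, s, a):
    """One-step prediction of the next state."""
    return A @ s + B @ a
```

A model like this can then stand in for costly environment rollouts during the ES optimization procedure, which is the speed-up the summary refers to.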
arXiv Detail & Related papers (2021-05-30T22:15:06Z) - Empirical Evaluation of Supervision Signals for Style Transfer Models [44.39622949370144]
In this work we empirically compare the dominant optimization paradigms which provide supervision signals during training.
We find that backtranslation has model-specific limitations, which inhibit the training of style transfer models.
We also experiment with Minimum Risk Training, a popular technique in the machine translation community, which, to our knowledge, has not been empirically evaluated in the task of style transfer.
arXiv Detail & Related papers (2021-01-15T15:33:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.