Evolution Transformer: In-Context Evolutionary Optimization
- URL: http://arxiv.org/abs/2403.02985v1
- Date: Tue, 5 Mar 2024 14:04:13 GMT
- Title: Evolution Transformer: In-Context Evolutionary Optimization
- Authors: Robert Tjarko Lange, Yingtao Tian, Yujin Tang
- Abstract summary: We introduce Evolution Transformer, a causal Transformer architecture, which can flexibly characterize a family of Evolution Strategies.
We train the model weights using Evolutionary Algorithm Distillation, a technique for supervised optimization of sequence models.
We analyze the resulting properties of the Evolution Transformer and propose a technique to fully self-referentially train the Evolution Transformer.
- Score: 6.873777465945062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evolutionary optimization algorithms are often derived from loose biological
analogies and struggle to leverage information obtained during the sequential
course of optimization. An alternative promising approach is to leverage data
and directly discover powerful optimization principles via meta-optimization.
In this work, we follow such a paradigm and introduce Evolution Transformer, a
causal Transformer architecture, which can flexibly characterize a family of
Evolution Strategies. Given a trajectory of evaluations and search distribution
statistics, Evolution Transformer outputs a performance-improving update to the
search distribution. The architecture imposes a set of suitable inductive
biases, i.e. the invariance of the distribution update to the order of
population members within a generation and equivariance to the order of the
search dimensions. We train the model weights using Evolutionary Algorithm
Distillation, a technique for supervised optimization of sequence models using
teacher algorithm trajectories. The resulting model exhibits strong in-context
optimization performance and shows strong generalization capabilities to
otherwise challenging neuroevolution tasks. We analyze the resulting properties
of the Evolution Transformer and propose a technique to fully
self-referentially train the Evolution Transformer, starting from a random
initialization and bootstrapping its own learning progress. We provide an open
source implementation under https://github.com/RobertTLange/evosax.
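To make the update rule described in the abstract concrete, the following is a minimal, hypothetical JAX sketch (not the paper's code; `apply_fn`, `summarize_generation`, and all hyperparameters are illustrative assumptions) of a learned sequence model acting as the update step of a diagonal-Gaussian Evolution Strategy: per-generation summaries are pooled over population members (order invariance) and computed per search dimension with shared operations (order equivariance), while a causal model over the generation axis plays the role of the Evolution Transformer.

```python
# Hypothetical sketch, not the paper's implementation: a learned sequence model
# acting as the update rule of a diagonal-Gaussian Evolution Strategy.
import jax
import jax.numpy as jnp


def ask(rng, mean, std, popsize):
    """Sample a population from the current search distribution."""
    eps = jax.random.normal(rng, (popsize, mean.shape[0]))
    return mean + std * eps


def summarize_generation(x, fitness, mean, std):
    """Permutation-invariant, dimension-equivariant summary of one generation.

    Pooling over the population axis removes any dependence on the order of
    population members; every feature is computed per search dimension with
    shared operations, so permuting dimensions permutes the output the same way.
    """
    z = (x - mean) / std                                   # (popsize, dims)
    f = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    w = jax.nn.softmax(-f)                                 # lower fitness = better
    return jnp.stack(
        [
            (w[:, None] * z).sum(axis=0),   # fitness-weighted recombination direction
            z.mean(axis=0),                 # first moment of the samples
            (z ** 2).mean(axis=0),          # second moment of the samples
        ],
        axis=-1,
    )                                                      # (dims, 3)


def tell(apply_fn, params, trajectory, mean, std):
    """Distribution update proposed by the sequence model.

    `apply_fn` stands in for a causal Transformer that attends over the
    trajectory of generation summaries and returns per-dimension updates.
    """
    delta_mean, delta_log_std = apply_fn(params, trajectory)
    return mean + std * delta_mean, std * jnp.exp(delta_log_std)


def run(rng, apply_fn, params, objective, dims=8, popsize=16, generations=32):
    mean, std = jnp.zeros(dims), jnp.ones(dims)
    summaries = []
    for _ in range(generations):
        rng, key = jax.random.split(rng)
        x = ask(key, mean, std, popsize)
        fitness = jax.vmap(objective)(x)                   # minimize objective
        summaries.append(summarize_generation(x, fitness, mean, std))
        mean, std = tell(apply_fn, params, jnp.stack(summaries), mean, std)
    return mean, std
```

Under this sketch, Evolutionary Algorithm Distillation would correspond to supervised training of `apply_fn` on trajectories recorded from a teacher algorithm, and the self-referential variant described in the abstract would bootstrap from the model's own optimization progress instead of an external teacher.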
Related papers
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms such as low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
- Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability [0.0]
We study gradient descent (GD)-based sparse training and evolution strategies (ES).
We find that ES explore diverse and flat local optima and do not preserve linear mode connectivity across sparsity levels and independent runs.
arXiv Detail & Related papers (2023-05-31T15:58:54Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches; to the authors' knowledge, this is the first time a simple transformer-based model has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- evosax: JAX-based Evolution Strategies [0.0]
We release evosax: a JAX-based library of evolutionary optimization algorithms.
evosax implements 30 evolutionary optimization algorithms, including finite-difference-based and estimation-of-distribution evolution strategies as well as various genetic algorithms.
It is designed in a modular fashion and allows for flexible usage via a simple ask-evaluate-tell API.
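As a usage illustration of the ask-evaluate-tell pattern mentioned above, here is a short sketch in the style of the evosax README on a toy sphere objective; the class and method names (`CMA_ES`, `default_params`, `initialize`, `ask`, `tell`) follow older 0.x releases of the library and may differ in newer versions.

```python
# Sketch of evosax's ask-evaluate-tell loop on a toy sphere objective.
# Method names follow the 0.x API and may have changed in newer releases.
import jax
import jax.numpy as jnp
from evosax import CMA_ES

rng = jax.random.PRNGKey(0)
strategy = CMA_ES(popsize=32, num_dims=10)
es_params = strategy.default_params
state = strategy.initialize(rng, es_params)

for _ in range(100):
    rng, rng_ask = jax.random.split(rng)
    x, state = strategy.ask(rng_ask, state, es_params)   # propose candidates
    fitness = jnp.sum(x ** 2, axis=-1)                    # evaluate (sphere function)
    state = strategy.tell(x, fitness, state, es_params)   # update search distribution

print(state.best_member, state.best_fitness)
```

Because the strategy state is passed around as an explicit functional argument, a loop like this composes with JAX transformations such as jit and vmap, which is the point of the modular design mentioned above.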
arXiv Detail & Related papers (2022-12-08T10:34:42Z)
- Accelerating the Evolutionary Algorithms by Gaussian Process Regression with $\epsilon$-greedy acquisition function [2.7716102039510564]
We propose a novel method that estimates the elite individual in order to accelerate the convergence of optimization.
The proposed approach shows broad promise for estimating the elite individual and speeding up convergence.
arXiv Detail & Related papers (2022-10-13T07:56:47Z)
- Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
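For intuition on the linear-complexity recurrent variant mentioned above, here is a minimal sketch of the standard linear-attention recurrence, in which two running sums serve as the RNN state during autoregressive decoding; the elu(x)+1 feature map and the function names are illustrative assumptions rather than the paper's specific learned feature map.

```python
# Minimal sketch of linear attention as an RNN (not the paper's exact method):
# a positive feature map phi replaces softmax, so decoding only needs the
# running sums S = sum_i phi(k_i) v_i^T and z = sum_i phi(k_i).
import jax.numpy as jnp


def phi(x):
    # elu(x) + 1: one common positive feature map for linear attention
    return jnp.where(x > 0, x + 1.0, jnp.exp(x))


def recurrent_attention_step(state, q, k, v):
    """One autoregressive decoding step with O(1) cost per token.

    state = (S, z) with S of shape (d_k, d_v) and z of shape (d_k,).
    """
    S, z = state
    S = S + jnp.outer(phi(k), v)                  # accumulate phi(k) v^T
    z = z + phi(k)                                # accumulate phi(k)
    out = (phi(q) @ S) / (phi(q) @ z + 1e-6)      # attention output for this step
    return (S, z), out
```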
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
- Evolutionary Variational Optimization of Generative Models [0.0]
We combine two popular optimization approaches to derive learning algorithms for generative models: variational optimization and evolutionary algorithms.
We show that evolutionary algorithms can effectively and efficiently optimize the variational bound.
In the category of "zero-shot" learning, we observed the evolutionary variational algorithm to significantly improve the state-of-the-art in many benchmark settings.
arXiv Detail & Related papers (2020-12-22T19:06:33Z)