Distributed Evolution Strategies Using TPUs for Meta-Learning
- URL: http://arxiv.org/abs/2201.00093v1
- Date: Sat, 1 Jan 2022 02:14:02 GMT
- Title: Distributed Evolution Strategies Using TPUs for Meta-Learning
- Authors: Alex Sheng, Derek He
- Abstract summary: We propose a distributed evolutionary meta-learning strategy using Tensor Processing Units (TPUs) that is highly parallel and scalable to arbitrarily long tasks with no increase in memory cost.
Using a Prototypical Network trained with evolution strategies on the Omniglot dataset, we achieved an accuracy of 98.4% on a 5-shot classification problem.
Our algorithm used as much as 40 times less memory than automatic differentiation to compute the gradient, with the resulting model achieving accuracy within 1.3% of a backpropagation-trained equivalent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-learning traditionally relies on backpropagation through entire tasks to
iteratively improve a model's learning dynamics. However, this approach is
computationally intractable when scaled to complex tasks. We propose a
distributed evolutionary meta-learning strategy using Tensor Processing Units
(TPUs) that is highly parallel and scalable to arbitrarily long tasks with no
increase in memory cost. Using a Prototypical Network trained with evolution
strategies on the Omniglot dataset, we achieved an accuracy of 98.4% on a
5-shot classification problem. Our algorithm used as much as 40 times less
memory than automatic differentiation to compute the gradient, with the
resulting model achieving accuracy within 1.3% of a backpropagation-trained
equivalent (99.6%). We observed classification accuracy as high as 99.1%
with larger population configurations. We further experimentally validate the
stability and performance of ES-ProtoNet across a variety of training
conditions (varying population size, model size, number of workers, shot, way,
ES hyperparameters, etc.). Our contributions are twofold: we provide the first
assessment of evolutionary meta-learning in a supervised setting, and create a
general framework for distributed evolution strategies on TPUs.
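No code accompanies this listing, so the snippet below is only a rough JAX sketch of the core idea described in the abstract: an antithetic evolution-strategies gradient estimate obtained from forward passes alone, averaged across TPU cores with `jax.pmap`/`pmean`, so memory stays flat regardless of task length. The `task_loss`, population size per core, sigma, and learning rate are illustrative placeholders, not values or functions from the paper.

```python
import jax
import jax.numpy as jnp
from functools import partial

def task_loss(params, task_batch):
    # Placeholder: run the model (e.g. a Prototypical Network) over a whole
    # task and return a scalar loss; only forward passes are required.
    return jnp.mean((params * task_batch) ** 2)

def es_gradient(params, task_batch, key, sigma=0.1, pop_per_core=8):
    """Antithetic ES gradient estimate; no backprop through the task."""
    eps = sigma * jax.random.normal(key, (pop_per_core,) + params.shape)
    loss_pos = jax.vmap(lambda e: task_loss(params + e, task_batch))(eps)
    loss_neg = jax.vmap(lambda e: task_loss(params - e, task_batch))(eps)
    # grad ~= E[(L(theta + eps) - L(theta - eps)) * eps] / (2 * sigma^2)
    weights = (loss_pos - loss_neg) / (2.0 * sigma ** 2)
    return jnp.tensordot(weights, eps, axes=1) / pop_per_core

@partial(jax.pmap, axis_name="cores")
def distributed_es_step(params, task_batch, key):
    grad = es_gradient(params, task_batch, key)
    grad = jax.lax.pmean(grad, axis_name="cores")   # average estimates across cores
    return params - 1e-2 * grad

# Usage: replicate a toy parameter vector across the available devices.
n = jax.local_device_count()
params = jnp.zeros((n, 128))
batches = jnp.ones((n, 128))
keys = jax.random.split(jax.random.PRNGKey(0), n)
params = distributed_es_step(params, batches, keys)
```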
Related papers
- Scalability of Reinforcement Learning Methods for Dispatching in Semiconductor Frontend Fabs: A Comparison of Open-Source Models with Real Industry Datasets [40.434003972007744]
We compare open-source simulation models with a real industry dataset to evaluate how optimization methods scale with different levels of complexity. We show that our proposed Evolution Strategies-based method scales much better than a comparable policy-gradient-based approach. We observe a double-digit percentage improvement in tardiness and a single-digit percentage improvement in throughput by use of Evolution Strategies.
arXiv Detail & Related papers (2025-05-16T11:32:29Z) - Flow-GRPO: Training Flow Matching Models via Online RL [75.70017261794422]
We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original inference timestep number.
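Only the abstract is reproduced here, so the following is a hypothetical illustration of the ODE-to-SDE idea rather than Flow-GRPO's actual implementation: if an ODE dx/dt = v(x, t) transports the marginals p_t, then the SDE dx = [v + (sigma^2/2) * grad log p_t] dt + sigma dW shares those marginals, so a deterministic Euler step can be swapped for a stochastic Euler-Maruyama step. The `velocity` and `score` functions below are toy placeholders.

```python
import jax
import jax.numpy as jnp

def ode_step(x, t, dt, velocity):
    """Deterministic Euler step of the probability-flow ODE dx/dt = v(x, t)."""
    return x + dt * velocity(x, t)

def sde_step(x, t, dt, velocity, score, sigma, key):
    """Euler-Maruyama step of an SDE with the same marginals as the ODE:
    dx = [v(x, t) + (sigma^2 / 2) * grad log p_t(x)] dt + sigma dW."""
    drift = velocity(x, t) + 0.5 * sigma ** 2 * score(x, t)
    noise = sigma * jnp.sqrt(dt) * jax.random.normal(key, x.shape)
    return x + dt * drift + noise

# Toy stand-ins for a learned flow-matching velocity field and score model.
velocity = lambda x, t: -x
score = lambda x, t: -x

x = jnp.ones((4,))
x_det = ode_step(x, 0.5, 0.01, velocity)
x_sto = sde_step(x, 0.5, 0.01, velocity, score, sigma=0.3, key=jax.random.PRNGKey(0))
```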
arXiv Detail & Related papers (2025-05-08T17:58:45Z) - Multiscale Stochastic Gradient Descent: Efficiently Training Convolutional Neural Networks [6.805997961535213]
Multiscale Stochastic Gradient Descent (Multiscale-SGD) is a novel optimization approach that exploits coarse-to-fine training strategies to estimate the gradient at a fraction of the cost.
We introduce a new class of learnable, scale-independent Mesh-Free Convolutions (MFCs) that ensure consistent gradient behavior across resolutions.
Our results establish a new paradigm for the efficient training of deep networks, enabling practical scalability in high-resolution and multiscale learning tasks.
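The summary only names the coarse-to-fine idea; as a hedged stand-in (not the paper's Mesh-Free Convolutions), one crude way to realize it is to take most gradient steps on downsampled inputs and only occasional steps at full resolution. The `loss_fn`, resize factor, and `fine_every` schedule are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, images, labels):
    # Placeholder classifier: global-average-pooled features times a weight matrix.
    feats = images.mean(axis=(1, 2))                      # (batch, channels)
    logits = feats @ params                               # (batch, classes)
    onehot = jax.nn.one_hot(labels, logits.shape[-1])
    return -jnp.mean(jnp.sum(onehot * jax.nn.log_softmax(logits), axis=-1))

def multiscale_step(params, images, labels, step, lr=1e-2, fine_every=4):
    """Cheap coarse-resolution gradients most steps, full resolution occasionally."""
    if step % fine_every != 0:
        b, h, w, c = images.shape
        images = jax.image.resize(images, (b, h // 2, w // 2, c), method="bilinear")
    grads = jax.grad(loss_fn)(params, images, labels)
    return params - lr * grads

params = jnp.zeros((3, 10))                               # channels x classes
images = jnp.ones((8, 64, 64, 3))
labels = jnp.zeros((8,), dtype=jnp.int32)
for step in range(8):
    params = multiscale_step(params, images, labels, step)
```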
arXiv Detail & Related papers (2025-01-22T09:13:47Z) - What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
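The exact PDE procedure is not spelled out in this summary, so the schedule below is only a guess at the general shape of progressive data expansion: begin from a small group-balanced warm-up subset, then repeatedly add random chunks of the remaining data. The `warmup_per_group` and `add_per_round` knobs are hypothetical.

```python
import jax
import jax.numpy as jnp

def expansion_schedule(group_ids, key, warmup_per_group=64, add_per_round=256):
    """Yield growing index sets: a group-balanced warm-up subset, then random growth."""
    warmup = jnp.concatenate([
        jnp.where(group_ids == g)[0][:warmup_per_group]
        for g in jnp.unique(group_ids)
    ])
    rest = jax.random.permutation(
        key, jnp.setdiff1d(jnp.arange(group_ids.shape[0]), warmup))
    active = warmup
    yield active
    for start in range(0, rest.shape[0], add_per_round):
        active = jnp.concatenate([active, rest[start:start + add_per_round]])
        yield active

group_ids = jnp.array([0] * 900 + [1] * 100)    # heavily imbalanced groups
for idx in expansion_schedule(group_ids, jax.random.PRNGKey(0)):
    pass  # train for a few epochs on data[idx] before the next expansion
```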
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Supplementing Gradient-Based Reinforcement Learning with Simple
Evolutionary Ideas [4.873362301533824]
We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL).
The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space.
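As a rough sketch of the population-level operators named above (crossover as parameter mixing, mutation as Gaussian noise), with policies flattened to parameter vectors and the shared experience buffer left out for brevity; the selection rule and constants are illustrative assumptions, not the paper's.

```python
import jax
import jax.numpy as jnp

def crossover(key, parent_a, parent_b):
    """Child takes each parameter from one of the two parents at random."""
    mask = jax.random.bernoulli(key, 0.5, parent_a.shape)
    return jnp.where(mask, parent_a, parent_b)

def mutate(key, params, scale=0.02):
    """Small Gaussian perturbation of a policy's parameters."""
    return params + scale * jax.random.normal(key, params.shape)

def evolve_population(key, population, returns, n_replace=2):
    """Replace the worst agents with mutated crossovers of the two best agents.
    (All agents would keep RL training off a shared experience buffer in between.)"""
    order = jnp.argsort(returns)                 # ascending: worst agents first
    best_a, best_b = order[-1], order[-2]
    for idx in order[:n_replace]:
        key, k1, k2 = jax.random.split(key, 3)
        child = crossover(k1, population[best_a], population[best_b])
        population = population.at[idx].set(mutate(k2, child))
    return population

population = jax.random.normal(jax.random.PRNGKey(0), (8, 256))  # 8 agents
returns = jnp.arange(8.0)                                         # pretend returns
population = evolve_population(jax.random.PRNGKey(1), population, returns)
```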
arXiv Detail & Related papers (2023-05-10T09:46:53Z) - Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution
Strategies [50.10277748405355]
Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies methods.
We show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of steps across a variety of applications.
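A hedged sketch of the noise-reuse idea (not the authors' exact estimator): instead of drawing a fresh perturbation for every truncation window of an unrolled computation, the same antithetic perturbation is reused across all windows of the unroll. The toy `unroll` function and hyperparameters are placeholders.

```python
import jax
import jax.numpy as jnp

def unroll(theta, state, n_steps):
    """Placeholder unrolled computation: apply a parameterized update repeatedly."""
    for _ in range(n_steps):
        state = jnp.tanh(state + theta)
    return jnp.sum(state ** 2), state            # (loss of this window, carried state)

def noise_reuse_es(theta, state, key, sigma=0.05, n_windows=4, window=5):
    """Antithetic online ES in which one perturbation, drawn once, is reused
    across every truncation window of the unroll."""
    eps = sigma * jax.random.normal(key, theta.shape)     # reused, not redrawn
    grad = jnp.zeros_like(theta)
    s_pos = s_neg = state
    for _ in range(n_windows):
        loss_pos, s_pos = unroll(theta + eps, s_pos, window)
        loss_neg, s_neg = unroll(theta - eps, s_neg, window)
        grad = grad + (loss_pos - loss_neg) / (2.0 * sigma ** 2) * eps
    return grad / n_windows

grad = noise_reuse_es(jnp.ones((16,)), jnp.zeros((16,)), jax.random.PRNGKey(0))
```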
arXiv Detail & Related papers (2023-04-21T17:53:05Z) - Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of some established sequential supervised models, such as RNNs, LSTMs, or Transformers, with relatively less known alternative models based on reservoir computing.
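The benchmark's precise data-efficiency metric is not given in this summary; one simple, hypothetical metric in the same spirit is the number of training samples a model needs before its validation accuracy first crosses a target threshold.

```python
import jax.numpy as jnp

def samples_to_threshold(samples_seen, val_accuracy, threshold=0.9):
    """Smallest number of training samples at which validation accuracy first
    reaches the threshold; infinity if it never does."""
    hits = jnp.where(val_accuracy >= threshold)[0]
    return samples_seen[hits[0]] if hits.size > 0 else jnp.inf

samples_seen = jnp.array([100, 200, 400, 800, 1600])        # hypothetical curve
val_accuracy = jnp.array([0.52, 0.71, 0.86, 0.92, 0.95])
print(samples_to_threshold(samples_seen, val_accuracy))     # 800
```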
arXiv Detail & Related papers (2022-09-29T08:16:52Z) - Efficient Feature Transformations for Discriminative and Generative
Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
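A minimal sketch of the general idea of per-task feature-map transforms over a frozen backbone; the FiLM-style per-channel scale and shift below is an illustrative stand-in, not the paper's exact transformation.

```python
import jax
import jax.numpy as jnp

def init_task_transform(key, n_channels):
    """Tiny per-task parameters: a scale and a shift for each feature channel."""
    k1, k2 = jax.random.split(key)
    return {"scale": 1.0 + 0.01 * jax.random.normal(k1, (n_channels,)),
            "shift": 0.01 * jax.random.normal(k2, (n_channels,))}

def apply_task_transform(task_params, feature_map):
    """Adapt a frozen backbone's (batch, H, W, C) features for one task."""
    return feature_map * task_params["scale"] + task_params["shift"]

# One small parameter set per task; the shared backbone would stay fixed.
transforms = {t: init_task_transform(jax.random.PRNGKey(t), 64) for t in range(3)}
features = jnp.ones((8, 16, 16, 64))                # pretend backbone output
adapted = apply_task_transform(transforms[1], features)
```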
arXiv Detail & Related papers (2021-03-25T01:48:14Z) - Involution: Inverting the Inherence of Convolution for Visual
Recognition [72.88582255910835]
We present a novel atomic operation for deep neural networks by inverting the principles of convolution, coined as involution.
The proposed involution operator could be leveraged as fundamental bricks to build the new generation of neural networks for visual recognition.
Our involution-based models improve the performance of convolutional baselines using ResNet-50 by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU absolutely.
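A simplified, single-group sketch of the involution operation as commonly described: each pixel generates its own kernel from that pixel's features and applies it to its neighborhood, with the kernel shared across channels (the inverse of convolution's sharing pattern). The real operator generates kernels through a small bottleneck and uses channel groups, which are omitted here.

```python
import jax
import jax.numpy as jnp

def involution(x, w_gen, kernel_size=3):
    """Single-group involution sketch over NHWC input x."""
    b, h, w, c = x.shape
    k = kernel_size
    # Kernel generation: a per-pixel linear map from C features to k*k weights.
    kernels = jnp.einsum("bhwc,cn->bhwn", x, w_gen)          # (B, H, W, k*k)
    pad = k // 2
    xp = jnp.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    out = jnp.zeros_like(x)
    for idx in range(k * k):
        du, dv = idx // k, idx % k
        patch = xp[:, du:du + h, dv:dv + w, :]               # shifted neighborhood
        out = out + kernels[..., idx:idx + 1] * patch        # broadcast over channels
    return out

x = jnp.ones((2, 8, 8, 16))
w_gen = 0.1 * jax.random.normal(jax.random.PRNGKey(0), (16, 9))  # C -> k*k generator
y = involution(x, w_gen)                                          # (2, 8, 8, 16)
```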
arXiv Detail & Related papers (2021-03-10T18:40:46Z) - Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilated convolution is an important variant of the standard convolutional neural network, used to control effective receptive fields and handle the large scale variance of objects.
We propose a new variant of dilated convolution, namely inception (dilated) convolution, where the convolutions have independent dilations among different axes, channels and layers.
To fit the inception convolution to data in practice, we develop a simple yet effective dilation search algorithm (EDO) based on statistical optimization.
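A sketch of a convolution whose parallel branches use independent dilations along the two spatial axes; the dilation pattern below is hand-picked for illustration, whereas the paper searches it with EDO.

```python
import jax
import jax.numpy as jnp

def inception_dilated_conv(x, weights, dilations):
    """Parallel 3x3 branches, each with its own (dilation_h, dilation_w) pair,
    concatenated along the channel axis."""
    outs = []
    for w, (dh, dw) in zip(weights, dilations):
        outs.append(jax.lax.conv_general_dilated(
            x, w, window_strides=(1, 1), padding="SAME",
            rhs_dilation=(dh, dw),
            dimension_numbers=("NHWC", "HWIO", "NHWC")))
    return jnp.concatenate(outs, axis=-1)

x = jnp.ones((1, 32, 32, 16))
dilations = [(1, 1), (1, 2), (2, 1), (2, 2)]     # independent dilations per branch
weights = [0.1 * jax.random.normal(k, (3, 3, 16, 8))
           for k in jax.random.split(jax.random.PRNGKey(0), 4)]
y = inception_dilated_conv(x, weights, dilations)  # (1, 32, 32, 32)
```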
arXiv Detail & Related papers (2020-12-25T14:58:35Z) - Cross-Domain Few-Shot Learning with Meta Fine-Tuning [8.062394790518297]
We tackle the new Cross-Domain Few-Shot Learning benchmark proposed by the CVPR 2020 Challenge.
We build upon state-of-the-art methods in domain adaptation and few-shot learning to create a system that can be trained to perform both tasks.
arXiv Detail & Related papers (2020-05-21T09:55:26Z) - A Hybrid Method for Training Convolutional Neural Networks [3.172761915061083]
We propose a hybrid method that uses both backpropagation and evolutionary strategies to train Convolutional Neural Networks.
We show that the proposed hybrid method is capable of improving upon regular training in the task of image classification.
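A minimal sketch of one way such a hybrid could be structured (alternating an ordinary backpropagation step with an antithetic ES step); the placeholder loss and hyperparameters are assumptions, not the paper's setup.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Placeholder loss standing in for a CNN's training objective.
    return jnp.mean((params * batch) ** 2)

def hybrid_step(params, batch, key, step, lr=1e-2, sigma=0.05, pop=16):
    """Even steps: ordinary backpropagation. Odd steps: antithetic ES estimate."""
    if step % 2 == 0:
        grad = jax.grad(loss_fn)(params, batch)
    else:
        eps = sigma * jax.random.normal(key, (pop,) + params.shape)
        l_pos = jax.vmap(lambda e: loss_fn(params + e, batch))(eps)
        l_neg = jax.vmap(lambda e: loss_fn(params - e, batch))(eps)
        grad = jnp.tensordot(l_pos - l_neg, eps, axes=1) / (2 * sigma ** 2 * pop)
    return params - lr * grad

params, batch = jnp.ones((64,)), jnp.ones((64,))
for step in range(4):
    params = hybrid_step(params, batch, jax.random.PRNGKey(step), step)
```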
arXiv Detail & Related papers (2020-04-15T17:52:48Z)