EA4LLM: A Gradient-Free Approach to Large Language Model Optimization via Evolutionary Algorithms
- URL: http://arxiv.org/abs/2510.10603v2
- Date: Thu, 23 Oct 2025 04:20:30 GMT
- Title: EA4LLM: A Gradient-Free Approach to Large Language Model Optimization via Evolutionary Algorithms
- Authors: WenTao Liu, Siyu Song, Hao Hao, Aimin Zhou,
- Abstract summary: We propose EA4LLM, an evolutionary algorithm for optimizing large language models (LLMs)<n>We empirically verify full- parameter optimization from the pretraining stage across model sizes ranging from 0.5B to 32B.<n>Our work challenges the prevailing assumption that gradient-based optimization is the only viable approach for training neural networks.
- Score: 23.009274904878065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, large language models (LLMs) have made remarkable progress, with model optimization primarily relying on gradient-based optimizers such as Adam. However, these gradient-based methods impose stringent hardware requirements, demanding high-concurrency, high-memory GPUs. Moreover, they require all neural network operations to be differentiable, thereby excluding many promising non-differentiable architectures from practical use. To address these limitations, we propose EA4LLM, an evolutionary algorithm for optimizing LLMs, and, for the first time, empirically verify full-parameter optimization from the pretraining stage across model sizes ranging from 0.5B to 32B. We conduct extensive experiments and provide key insights into how evolutionary algorithms can effectively optimize neural networks. Our work challenges the prevailing assumption that gradient-based optimization is the only viable approach for training neural networks. It also holds significant potential to reduce the computational cost of training large language models, thereby enabling groups with limited computational resources to participate in deep learning research.
Related papers
- Learnable SMPLify: A Neural Solution for Optimization-Free Human Pose Inverse Kinematics [13.621560002904873]
Learnable SMPLify is a neural framework that replaces the iterative fitting process in SMPLify with a single-pass regression model.<n>It achieves nearly 200x faster runtime compared to SMPLify, generalizes well to unseen 3DPW and RICH, and operates as a model-agnostic manner when used as a plug-in tool on LucidAction.
arXiv Detail & Related papers (2025-08-19T06:53:57Z) - Reparameterized LLM Training via Orthogonal Equivalence Transformation [54.80172809738605]
We present POET, a novel training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons.<n>POET can stably optimize the objective function with improved generalization.<n>We develop efficient approximations that make POET flexible and scalable for training large-scale neural networks.
arXiv Detail & Related papers (2025-06-09T17:59:34Z) - Training Artificial Neural Networks by Coordinate Search Algorithm [0.20971479389679332]
We propose an efficient version of the gradient-free Coordinate Search (CS) algorithm for training neural networks.
The proposed algorithm can be used with non-differentiable activation functions and tailored to multi-objective/multi-loss problems.
Finding the optimal values for weights of ANNs is a large-scale optimization problem.
arXiv Detail & Related papers (2024-02-20T01:47:25Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback feedback (LFP) is a novel training principle for neural network-like predictors.<n>LFP decomposes a reward to individual neurons based on their respective contributions.<n>Our method then implements a greedy reinforcing approach helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatiles.
We train an ingest for deep learning which is itself a small neural network that ingests and outputs parameter updates.
We open source our learned, meta-training code, the associated train test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - EvoPruneDeepTL: An Evolutionary Pruning Model for Transfer Learning
based Deep Neural Networks [15.29595828816055]
We propose an evolutionary pruning model for Transfer Learning based Deep Neural Networks.
EvoPruneDeepTL replaces the last fully-connected layers with sparse layers optimized by a genetic algorithm.
Results show the contribution of EvoPruneDeepTL and feature selection to the overall computational efficiency of the network.
arXiv Detail & Related papers (2022-02-08T13:07:55Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Enhanced data efficiency using deep neural networks and Gaussian
processes for aerodynamic design optimization [0.0]
Adjoint-based optimization methods are attractive for aerodynamic shape design.
They can become prohibitively expensive when multiple optimization problems are being solved.
We propose a machine learning enabled, surrogate-based framework that replaces the expensive adjoint solver.
arXiv Detail & Related papers (2020-08-15T15:09:21Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 5 orders of magnitude compared with directly using methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.