Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization
- URL: http://arxiv.org/abs/2503.20286v4
- Date: Mon, 14 Apr 2025 03:30:58 GMT
- Title: Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization
- Authors: Zhenyu Liang, Hao Li, Naiwei Yu, Kebin Sun, Ran Cheng
- Abstract summary: Evolutionary multiobjective optimization (EMO) has made significant strides over the past two decades. Traditional EMO algorithms face substantial performance limitations due to insufficient parallelism and scalability. We propose to parallelize EMO algorithms on GPU via the tensorization methodology. Our experiments show that the tensorized EMO algorithms achieve speedups of up to 1113x compared to their CPU-based counterparts.
- Score: 11.508416084439443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evolutionary multiobjective optimization (EMO) has made significant strides over the past two decades. However, as problem scales and complexities increase, traditional EMO algorithms face substantial performance limitations due to insufficient parallelism and scalability. While most work has focused on algorithm design to address these challenges, little attention has been given to hardware acceleration, thereby leaving a clear gap between EMO algorithms and advanced computing devices, such as GPUs. To bridge the gap, we propose to parallelize EMO algorithms on GPUs via the tensorization methodology. By employing tensorization, the data structures and operations of EMO algorithms are transformed into concise tensor representations, which seamlessly enables automatic utilization of GPU computing. We demonstrate the effectiveness of our approach by applying it to three representative EMO algorithms: NSGA-III, MOEA/D, and HypE. To comprehensively assess our methodology, we introduce a multiobjective robot control benchmark using a GPU-accelerated physics engine. Our experiments show that the tensorized EMO algorithms achieve speedups of up to 1113x compared to their CPU-based counterparts, while maintaining solution quality and effectively scaling population sizes to hundreds of thousands. Furthermore, the tensorized EMO algorithms efficiently tackle complex multiobjective robot control tasks, producing high-quality solutions with diverse behaviors. Source codes are available at https://github.com/EMI-Group/evomo.
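The following is a minimal, hypothetical sketch of the tensorization idea described in the abstract, written in Python with JAX. It is not the authors' implementation from the evomo repository; the function names and parameters (polynomial_mutation, dominance_matrix, eta, prob) are illustrative assumptions. The population is held as one (N, D) decision tensor and one (N, M) objective tensor, so per-individual loops become single array operations that a GPU can execute in parallel.

```python
import jax
import jax.numpy as jnp


def polynomial_mutation(pop, key, eta=20.0, prob=0.1, low=0.0, high=1.0):
    """Mutate an entire (N, D) population in one vectorized step (illustrative)."""
    key_u, key_mask = jax.random.split(key)
    u = jax.random.uniform(key_u, pop.shape)
    # Simplified polynomial-mutation perturbation, applied elementwise.
    delta = jnp.where(
        u < 0.5,
        (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0,
        1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0)),
    )
    mask = jax.random.uniform(key_mask, pop.shape) < prob
    return jnp.clip(pop + mask * delta * (high - low), low, high)


def dominance_matrix(objs):
    """Pairwise Pareto dominance for an (N, M) objective tensor.

    Entry (i, j) is True iff solution i dominates solution j (minimization).
    """
    a = objs[:, None, :]  # (N, 1, M)
    b = objs[None, :, :]  # (1, N, M)
    return jnp.all(a <= b, axis=-1) & jnp.any(a < b, axis=-1)


@jax.jit  # compiled once, then the whole generation runs on the accelerator
def one_generation(pop, objs, key):
    offspring = polynomial_mutation(pop, key)
    n_dominating = dominance_matrix(objs).sum(axis=0)  # dominance count per solution
    return offspring, n_dominating


key = jax.random.PRNGKey(0)
k_pop, k_obj, k_mut = jax.random.split(key, 3)
pop = jax.random.uniform(k_pop, (4096, 30))  # 4096 solutions, 30 decision variables
objs = jax.random.uniform(k_obj, (4096, 3))  # 3 objectives (placeholder values)
offspring, n_dominating = one_generation(pop, objs, k_mut)
```

Under such a representation, variation and selection primitives reduce to broadcasted tensor operations, which is what lets jit-compiled generations exploit GPU parallelism and scale to very large populations.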
Related papers
- GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III [13.487945730611193]
We propose a fully tensorized implementation of NSGA-III for large-scale many-objective optimization.
The tensorized implementation maintains the exact selection and variation mechanisms of NSGA-III while achieving significant acceleration.
Results show that the tensorized NSGA-III achieves speedups of up to $3629\times$ over the CPU version of NSGA-III.
arXiv Detail & Related papers (2025-04-08T14:09:23Z) - AutoHete: An Automatic and Efficient Heterogeneous Training System for LLMs [68.99086112477565]
Transformer-based large language models (LLMs) have demonstrated exceptional capabilities in sequence modeling and text generation.
Existing heterogeneous training methods significantly expand the scale of trainable models but introduce substantial communication overheads and CPU workloads.
We propose AutoHete, an automatic and efficient heterogeneous training system compatible with both single-GPU and multi-GPU environments.
arXiv Detail & Related papers (2025-02-27T14:46:22Z) - Scaling Policy Gradient Quality-Diversity with Massive Parallelization via Behavioral Variations [4.787389127632926]
We introduce a fast, sample-efficient ME-based algorithm capable of scaling up with massive parallelization. Our experiments show that ASCII-ME can generate a diverse collection of high-performing deep neural network policies in less than 250 seconds on a single GPU.
arXiv Detail & Related papers (2025-01-30T19:56:04Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses the existing parallelism schemes. Our results demonstrate up to a 52.4% improvement in prefill throughput compared to existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z) - Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration [6.784939343811732]
The NeuroEvolution of Augmenting Topologies (NEAT) algorithm has received considerable recognition in the field of neuroevolution.
This paper introduces a tensorization method for the NEAT algorithm, enabling the transformation of its diverse network topologies and associated operations into uniformly shaped tensors for computation.
The resulting TensorNEAT library supports various benchmark environments, including Gym, Brax, and gymnax.
arXiv Detail & Related papers (2024-04-02T10:20:12Z) - GPU-accelerated Evolutionary Multiobjective Optimization Using Tensorized RVEA [13.319536515278191]
We introduce a large-scale tensorized Reference Vector Guided Evolutionary Algorithm (TensorRVEA) to harness the advancements of GPU acceleration.
In numerical benchmark tests involving large-scale populations and problem dimensions, TensorRVEA consistently demonstrates high computational performance, achieving speedups of up to over 1000$\times$ (a sketch of the reference-vector association step as a single tensor operation appears after this list).
arXiv Detail & Related papers (2024-04-01T15:04:24Z) - Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [51.89707241449435]
In this paper, we address the challenge of integrating multi-head self-attention into high-resolution representation CNNs efficiently.
We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features.
We present a series of models via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method that searches for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers.
arXiv Detail & Related papers (2024-03-15T15:47:54Z) - Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts [60.1586169973792]
M$^3$ViT is the latest multi-task ViT model that introduces mixture-of-experts (MoE).
MoE achieves better accuracy and over 80% reduction in computation, but leaves challenges for efficient deployment on FPGA.
Our work, dubbed Edge-MoE, solves the challenges to introduce the first end-to-end FPGA accelerator for multi-task ViT with a collection of architectural innovations.
arXiv Detail & Related papers (2023-05-30T02:24:03Z) - EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation [40.71953374838183]
EvoX is a computing framework tailored for automated, distributed, and heterogeneous execution of EC algorithms.
At the core of EvoX lies a unique programming model to streamline the development of parallelizable EC algorithms.
EvoX offers comprehensive support for a diverse set of benchmark problems, ranging from dozens of numerical test functions to hundreds of reinforcement learning tasks.
arXiv Detail & Related papers (2023-01-29T15:00:16Z) - Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing [12.90845054806193]
Pre-training a diverse set of neural network controllers in simulation has enabled robots to adapt online to damage in robot locomotion tasks.
CMA-MAE, an evolution strategies (ES)-based quality diversity algorithm, does not have these limitations and has achieved state-of-the-art performance on standard QD benchmarks.
We propose three new CMA-MAE variants that scale to high dimensions.
arXiv Detail & Related papers (2022-10-06T01:03:01Z) - Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z) - Towards making the most of NLP-based device mapping optimization for OpenCL kernels [5.6596607119831575]
We extend the work of Cummins et al., namely Deeptune, which tackles the problem of optimal device selection (CPU or GPU) for accelerated OpenCL kernels.
We propose four different models that provide enhanced contextual information of source codes.
Experimental results show that our proposed methodology surpasses that of Cummins et al., providing up to a 4% improvement in prediction accuracy.
arXiv Detail & Related papers (2022-08-30T10:20:55Z) - Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
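As referenced from the TensorRVEA entry above, here is a hypothetical illustration of the reference-vector association step common to RVEA- and NSGA-III-style selection, expressed as one batched tensor computation rather than a per-solution loop. It is not taken from any of the papers listed; the function name associate and its arguments are assumptions for illustration only.

```python
import jax
import jax.numpy as jnp


def associate(objs, ref_vecs, ideal):
    """Assign each solution to its closest reference vector by cosine similarity.

    objs:     (N, M) objective values (minimization assumed)
    ref_vecs: (R, M) unit-length reference vectors
    ideal:    (M,)   current ideal point
    Returns the (N,) index of the associated reference vector per solution.
    """
    translated = objs - ideal                                    # (N, M)
    norms = jnp.linalg.norm(translated, axis=1, keepdims=True)   # (N, 1)
    directions = translated / jnp.maximum(norms, 1e-12)          # (N, M)
    cosine = directions @ ref_vecs.T                             # (N, R) in one matmul
    return jnp.argmax(cosine, axis=1)


# Tiny usage example with placeholder data.
key = jax.random.PRNGKey(1)
objs = jax.random.uniform(key, (8, 3))
ref_vecs = jnp.eye(3)  # three axis-aligned unit reference vectors
print(associate(objs, ref_vecs, jnp.zeros(3)))
```

Because the association for all N solutions against all R reference vectors is a single matrix multiplication, the step maps directly onto GPU hardware, which is the kind of reformulation the tensorized algorithms above rely on.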