SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy
- URL: http://arxiv.org/abs/2508.12906v1
- Date: Mon, 18 Aug 2025 13:13:30 GMT
- Title: SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy
- Authors: Boran Zhao, Haiming Zhai, Zihang Yuan, Hetian Liu, Tian Xia, Wenzhe Zhao, Pengju Ren
- Abstract summary: The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse accelerators. Previous works focus solely on either mapping (i.e., tiling communication and computation in space and time) or sparse strategy. We propose an evolution strategy-based sparse accelerator optimization framework, called SparseMap.
- Score: 5.687126431324017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse tensor accelerators. However, most existing manually designed accelerators are limited to specific scenarios, and it is time-consuming and challenging to adjust a large number of design factors when scenarios change. Therefore, automating the design of SpTA accelerators is crucial. Nevertheless, previous works focus solely on either mapping (i.e., tiling communication and computation in space and time) or sparse strategy (i.e., bypassing zero elements for efficiency), leading to suboptimal designs due to the lack of comprehensive consideration of both. A unified framework that jointly optimizes both is urgently needed. However, integrating mapping and sparse strategies leads to a combinatorial explosion in the design space (e.g., as large as $O(10^{41})$ for the workload $P_{32 \times 64} \times Q_{64 \times 48} = Z_{32 \times 48}$). This vast search space renders most conventional optimization methods (e.g., particle swarm optimization, reinforcement learning, and Monte Carlo tree search) inefficient. To address this challenge, we propose an evolution strategy-based sparse tensor accelerator optimization framework, called SparseMap. SparseMap constructs a more comprehensive design space that considers both mapping and sparse strategy. We introduce a series of enhancements to genetic encoding and evolutionary operators, enabling SparseMap to efficiently explore the vast and diverse design space. We quantitatively compare SparseMap with prior works and classical optimization methods, demonstrating that SparseMap consistently finds superior solutions.
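To make the search procedure concrete, below is a minimal, hypothetical evolution-strategy sketch over a joint genome that encodes both a mapping (loop order and tile sizes) and a per-operand sparse strategy. The gene pools, mutation operator, and toy cost model are illustrative assumptions for the abstract's $P_{32 \times 64} \times Q_{64 \times 48}$ workload, not SparseMap's actual encoding or cost model.

```python
import random

LOOP_ORDERS = ["mnk", "mkn", "nmk", "nkm", "kmn", "knm"]  # loop permutations over M, N, K
TILE_SIZES = [1, 2, 4, 8, 16, 32]                         # candidate tile factors
SPARSE_MODES = ["dense", "bitmap", "csr", "skip-zero"]    # per-operand sparse strategies

def random_genome():
    # One individual = a mapping choice plus a sparse strategy per operand (P, Q, Z).
    return {
        "order": random.choice(LOOP_ORDERS),
        "tiles": [random.choice(TILE_SIZES) for _ in range(3)],
        "sparse": [random.choice(SPARSE_MODES) for _ in range(3)],
    }

def mutate(g):
    child = {"order": g["order"], "tiles": list(g["tiles"]), "sparse": list(g["sparse"])}
    gene = random.choice(["order", "tiles", "sparse"])
    if gene == "order":
        child["order"] = random.choice(LOOP_ORDERS)
    elif gene == "tiles":
        child["tiles"][random.randrange(3)] = random.choice(TILE_SIZES)
    else:
        child["sparse"][random.randrange(3)] = random.choice(SPARSE_MODES)
    return child

def fitness(g, density=0.3):
    # Toy cost model: skipping zeros cuts effective MACs, while tiny tiles
    # and bitmap metadata add overhead. A real framework would query an
    # analytical cost model or simulator here.
    macs = 32 * 64 * 48
    work = macs * (density if "skip-zero" in g["sparse"] else 1.0)
    overhead = sum(1.0 / t for t in g["tiles"]) + 0.1 * g["sparse"].count("bitmap")
    return -work * (1.0 + overhead)

def evolve(pop_size=32, generations=50):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 4]             # truncation selection
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=fitness)

print(evolve())
```

The combinatorial explosion follows from the product structure of such genomes: every added axis of choice (loop order, tile factor per level, sparse format per tensor) multiplies the design-space size, which is how joint mapping-plus-sparsity spaces reach magnitudes like $O(10^{41})$.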
Related papers
- FastMap: Revisiting Dense and Scalable Structure from Motion [26.930994695116198]
We propose FastMap, a new global structure from motion method focused on speed and simplicity. Previous methods like COLMAP and GLOMAP suffer from poor scalability when the number of matched keypoint pairs becomes large. We show that FastMap is faster than COLMAP and GLOMAP on large-scale scenes with comparable pose accuracy.
arXiv Detail & Related papers (2025-05-07T17:56:15Z)
- Integrated Hardware Architecture and Device Placement Search [7.620610652090732]
Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy.
This is the first work to explore the co-optimization of determining the optimal architecture and device placement strategy.
Our approach achieves higher throughput on large language models compared to the state-of-the-art TPUv4 and the Spotlight accelerator search framework.
arXiv Detail & Related papers (2024-07-18T04:02:35Z)
- ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout [8.99065455675796]
We propose ROAM, which operates at the graph level to derive a memory-efficient execution plan with optimized operator order and tensor memory layout.
Experiments show that ROAM achieves a substantial memory reduction of 35.7%, 13.3%, and 27.2% compared to PyTorch and two state-of-the-art methods, and offers a remarkable 53.7x speedup.
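As a concrete illustration of the objective such an execution plan optimizes (not ROAM's actual algorithm), the sketch below scores a topological operator order by its peak live tensor memory, freeing each output once its last consumer has run:

```python
def peak_memory(order, graph, size):
    """graph: op -> list of input ops; size: op -> output tensor bytes."""
    pending = {op: sum(1 for o in graph for i in graph[o] if i == op) for op in graph}
    cur = peak = 0
    for op in order:
        cur += size[op]                      # allocate this op's output
        peak = max(peak, cur)
        for src in graph[op]:
            pending[src] -= 1
            if pending[src] == 0:            # last consumer ran: free the input
                cur -= size[src]
    return peak

# Tiny diamond graph a -> {b, c} -> d with made-up tensor sizes.
graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
size = {"a": 4, "b": 2, "c": 2, "d": 1}
print(peak_memory(["a", "b", "c", "d"], graph, size))  # 8
```

Different topological orders of the same graph can change this peak, which is the leverage operator ordering exploits.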
arXiv Detail & Related papers (2023-10-30T06:29:21Z)
- Efficient Map Sparsification Based on 2D and 3D Discretized Grids [47.22997560184043]
As a map grows larger, more memory is required and localization becomes inefficient.
Previous map sparsification methods add a quadratic term in mixed-integer programming to enforce a uniform distribution of selected landmarks.
In this paper, we formulate map sparsification in an efficient linear form and select uniformly distributed landmarks based on 2D discretized grids.
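As a hedged sketch of the grid idea only (the paper's actual method is a linear, optimization-based formulation), bucketing landmarks into 2D cells and capping how many survive per cell yields a spatially uniform subset without any quadratic uniformity term:

```python
from collections import defaultdict

def sparsify(landmarks, cell=10.0, per_cell=2):
    """landmarks: list of (x, y, score); returns a uniformly spread subset."""
    cells = defaultdict(list)
    for x, y, s in landmarks:
        cells[(int(x // cell), int(y // cell))].append((s, x, y))
    kept = []
    for pts in cells.values():
        pts.sort(reverse=True)               # highest observation score first
        kept.extend((x, y) for s, x, y in pts[:per_cell])
    return kept
```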
arXiv Detail & Related papers (2023-03-20T05:49:14Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
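A toy model with made-up constants shows why: per-device compute shrinks as 1/n while communication cost grows with n, so throughput peaks at a finite device count rather than at maximum parallelisation:

```python
def throughput(n, compute=100.0, comm_per_device=2.0):
    # Hypothetical step-time model: compute divides across n devices,
    # communication overhead grows linearly with n.
    return 1.0 / (compute / n + comm_per_device * n)

best = max(range(1, 33), key=throughput)
print(best)   # 7, near sqrt(100 / 2); using all 32 devices would be slower
```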
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
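A toy sketch of that learned-optimizer idea is below; the per-parameter features, network size, and fixed random weights are illustrative assumptions, not VeLO's architecture or meta-training:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(2, 8)), np.zeros(8)   # toy weights; a real
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)   # learned optimizer meta-trains these

def learned_update(grad, momentum):
    feats = np.stack([grad, momentum], axis=-1)   # per-parameter input features
    h = np.tanh(feats @ W1 + b1)
    return (h @ W2 + b2)[..., 0]                  # per-parameter update step

theta, m = rng.normal(size=4), np.zeros(4)        # toy quadratic loss 0.5*||theta||^2
for _ in range(3):
    g = theta                                     # its gradient
    m = 0.9 * m + g
    theta = theta + 0.01 * learned_update(g, m)
```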
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
- GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting [121.42898228997538]
We propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization.
We leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph.
Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction.
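A minimal sketch of such a pair-wise relative encoding, assuming 2D positions with headings (the exact feature set is an assumption), shows why the features survive a rigid transform of the whole scene:

```python
import numpy as np

def relative_encoding(pos_i, yaw_i, pos_j, yaw_j):
    c, s = np.cos(yaw_i), np.sin(yaw_i)
    R_inv = np.array([[c, s], [-s, c]])     # rotate world coords into i's frame
    d = R_inv @ (np.asarray(pos_j) - np.asarray(pos_i))
    dyaw = yaw_j - yaw_i
    return np.array([d[0], d[1], np.cos(dyaw), np.sin(dyaw)])

rot = lambda a: np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
e1 = relative_encoding([0.0, 0.0], 0.0, [3.0, 4.0], 1.0)
t, a = np.array([10.0, 5.0]), 0.5           # rigidly shift and rotate the scene
e2 = relative_encoding(rot(a) @ [0.0, 0.0] + t, 0.0 + a,
                       rot(a) @ [3.0, 4.0] + t, 1.0 + a)
assert np.allclose(e1, e2)                  # encoding is viewpoint-invariant
```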
arXiv Detail & Related papers (2022-11-04T16:10:50Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
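For context on the objective, the quantity a PFSS solver minimizes is the makespan of a job permutation, given by the classic flow-shop recurrence C[j][k] = max(C[j-1][k], C[j][k-1]) + p[j][k]; the processing times below are made up:

```python
import itertools

def makespan(perm, p):
    """p[j][k]: processing time of job j on machine k; perm: job order."""
    m = len(p[0])
    C = [0.0] * m                        # completion of the previous job per machine
    for j in perm:
        prev = 0.0
        for k in range(m):
            prev = max(C[k], prev) + p[j][k]
            C[k] = prev
    return C[-1]

p = [[3, 2], [1, 4], [2, 2]]             # 3 jobs x 2 machines
best = min(itertools.permutations(range(3)), key=lambda s: makespan(s, p))
print(best, makespan(best, p))           # (1, 0, 2) with makespan 9
```

Exhaustive search is factorial in the number of jobs, which is why learned heuristics such as imitation-trained models are attractive.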
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- An Adaptive and Scalable ANN-based Model-Order-Reduction Method for Large-Scale TO Designs [22.35243726859667]
Topology Optimization (TO) provides a systematic approach for obtaining structural designs with optimal performance of interest.
Deep learning-based models have been developed to accelerate the process.
MapNet is a neural network which maps the field of interest from coarse-scale to fine-scale.
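A toy sketch of such a coarse-to-fine mapping is below; the patch shapes and two-layer network are assumptions for illustration, not MapNet's design:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(9, 16)), np.zeros(16)  # untrained toy weights
W2, b2 = rng.normal(scale=0.1, size=(16, 4)), np.zeros(4)

def refine(coarse_patch):
    """Map a 3x3 coarse-grid patch to the 2x2 fine-grid cells it covers."""
    h = np.tanh(coarse_patch.reshape(-1) @ W1 + b1)
    return (h @ W2 + b2).reshape(2, 2)

fine = refine(rng.normal(size=(3, 3)))   # in practice trained on simulation data
```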
arXiv Detail & Related papers (2022-03-20T10:12:24Z)
- An Optimal Transport Perspective on Unpaired Image Super-Resolution [97.24140709634203]
Real-world image super-resolution (SR) tasks often do not have paired datasets, which limits the application of supervised techniques. We investigate optimization problems which arise in such models and make two surprising observations. We prove and empirically show that the learned map is biased, i.e., it does not actually transform the distribution of low-resolution images to high-resolution ones.
arXiv Detail & Related papers (2022-02-02T16:21:20Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
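A minimal sketch of the fixed-point view, with random toy weights scaled so the iteration contracts (the paper's joint optimization over inputs is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))   # small scale keeps the map contractive
U = rng.normal(scale=0.5, size=(8, 4))

def forward(x, iters=50):
    z = np.zeros(8)
    for _ in range(iters):
        z = np.tanh(W @ z + U @ x)       # iterate one layer instead of stacking depth
    return z

x = rng.normal(size=4)
z_star = forward(x)
assert np.allclose(z_star, np.tanh(W @ z_star + U @ x), atol=1e-6)  # a fixed point
```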
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- Transferable Graph Optimizers for ML Compilers [18.353830282858834]
We propose an end-to-end, transferable deep reinforcement learning method for computational graph optimization (GO).
GO generates decisions on the entire graph rather than on each individual node autoregressively, drastically speeding up the search compared to prior methods.
GO achieves 21% improvement over human experts and 18% improvement over the prior state of the art with 15x faster convergence.
arXiv Detail & Related papers (2020-10-21T20:28:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.