Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix
Multiplication
- URL: http://arxiv.org/abs/2106.10499v1
- Date: Sat, 19 Jun 2021 13:53:58 GMT
- Title: Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix
Multiplication
- Authors: Gordon E. Moon, Hyoukjun Kwon, Geonhwa Jeong, Prasanth Chatarasi,
Sivasankaran Rajamanickam, Tushar Krishna
- Abstract summary: We develop a framework that finds optimized mappings for a tiled GEMM for a given spatial accelerator and workload combination.
Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance.
- Score: 4.878665155352402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing interest in custom spatial accelerators for machine
learning applications. These accelerators employ a spatial array of processing
elements (PEs) interacting via custom buffer hierarchies and networks-on-chip.
The efficiency of these accelerators comes from employing optimized dataflow
(i.e., spatial/temporal partitioning of data across the PEs and fine-grained
scheduling) strategies to optimize data reuse. The focus of this work is to
evaluate these accelerator architectures using a tiled general matrix-matrix
multiplication (GEMM) kernel. To do so, we develop a framework that finds
optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given
spatial accelerator and workload combination, leveraging an analytical cost
model for runtime and energy. Our evaluations over five spatial accelerators
demonstrate that the tiled GEMM mappings systematically generated by our
framework achieve high performance on various GEMM workloads and accelerators.
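As a rough illustration of what searching for a tiled GEMM mapping can involve, the sketch below pairs a plain tiled GEMM loop nest with a brute-force search over tile sizes. It is not the paper's framework or its cost model: `toy_traffic`, `search_tiles`, the single on-chip buffer, and the candidate tile sizes are hypothetical, and the cost counts only words moved to and from off-chip memory under the assumption that each output tile stays in the buffer until it is complete.

```python
from itertools import product


def tiled_gemm(A, B, M, N, K, tm, tn, tk):
    """Compute C = A @ B (A is M x K, B is K x N), one tile at a time."""
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, tm):            # tile loop over rows of C
        for j0 in range(0, N, tn):        # tile loop over columns of C
            for k0 in range(0, K, tk):    # tile loop over the reduction dim
                for i in range(i0, min(i0 + tm, M)):
                    for j in range(j0, min(j0 + tn, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tk, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


def ceil_div(a, b):
    return -(-a // b)


def footprint(tm, tn, tk):
    """Buffer words needed to hold one A tile, one B tile, and one C tile."""
    return tm * tk + tk * tn + tm * tn


def toy_traffic(M, N, K, tm, tn, tk):
    """Hypothetical cost: words moved to or from off-chip memory.

    Assumes the loop order above: A is re-read once per column tile of C,
    B is re-read once per row tile of C, and C is written back once.
    tk does not change this count; it only affects the buffer footprint.
    """
    a_words = M * K * ceil_div(N, tn)
    b_words = K * N * ceil_div(M, tm)
    c_words = M * N
    return a_words + b_words + c_words


def search_tiles(M, N, K, buffer_words, candidates=(4, 8, 16, 32)):
    """Return the (tm, tn, tk) with the lowest toy cost that fits the buffer."""
    feasible = [
        t for t in product(candidates, repeat=3)
        if footprint(*t) <= buffer_words
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda t: toy_traffic(M, N, K, *t))


if __name__ == "__main__":
    best = search_tiles(64, 64, 64, buffer_words=768)
    print("best tile sizes:", best)
    print("toy traffic:", toy_traffic(64, 64, 64, *best))
```

A real mapper would also choose the loop order and the spatial partitioning across PEs, and would model runtime and energy rather than a single traffic count; this sketch fixes the loop order to keep the search small.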
Related papers
- Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication [0.8363939984237685]
Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in scientific computing, graph analytics, and deep learning.
Traditional hardware accelerators are tailored for specific sparsity patterns with fixed dataflow schemes.
This paper presents a machine learning based approach for adaptively selecting the most appropriate dataflow scheme for SpGEMM tasks.
arXiv Detail & Related papers (2024-06-14T16:36:35Z)
- Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures [5.46396577345121]
The complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption.
We propose a novel memory arrangement strategy, governed by the hardware accelerator's kernel size, which effectively minimizes off-chip data access.
Our approach can achieve up to a 2.8x speed increase when executing inferences employing state-of-the-art transformers.
arXiv Detail & Related papers (2023-12-20T13:01:25Z)
- Incremental Multimodal Surface Mapping via Self-Organizing Gaussian Mixture Models [1.0878040851638]
This letter describes an incremental multimodal surface mapping methodology, which represents the environment as a continuous probabilistic model.
The strategy employed in this work utilizes Gaussian mixture models (GMMs) to represent the environment.
To bridge this gap, this letter introduces a spatial hash map for rapid GMM submap extraction combined with an approach to determine relevant and redundant data in a point cloud.
arXiv Detail & Related papers (2023-09-19T19:49:03Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D.
At the core of Ret3D are novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators [0.0]
Bifrost is an end-to-end framework for the evaluation and optimization of reconfigurable inference accelerators.
We discuss Bifrost's advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost.
arXiv Detail & Related papers (2022-04-26T16:22:24Z)
- Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z)
- Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators [4.055002321981825]
We present a HW-SW co-design ecosystem for spatial accelerators called Union.
Our framework allows exploring different algorithms and their mappings on several accelerator cost models.
We demonstrate the value of Union for the community with several case studies.
arXiv Detail & Related papers (2021-09-15T16:42:18Z)
- DHA: End-to-End Joint Optimization of Data Augmentation Policy, Hyper-parameter and Architecture [81.82173855071312]
We propose an end-to-end solution that integrates the AutoML components and returns a ready-to-use model at the end of the search.
DHA achieves state-of-the-art (SOTA) results on various datasets, notably 77.4% accuracy on ImageNet with a cell-based search space.
arXiv Detail & Related papers (2021-09-13T08:12:50Z)
- Learning Space Partitions for Path Planning [54.475949279050596]
PlaLaM outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima.
These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties on a 0-1 scale.
arXiv Detail & Related papers (2021-06-19T18:06:11Z)