Related papers: Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

URL: http://arxiv.org/abs/2406.10166v2
Date: Thu, 29 Aug 2024 16:44:17 GMT
Title: Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication
Authors: Sanjali Yadav, Bahar Asgari,
Abstract summary: Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in scientific computing, graph analytics, and deep learning. Traditional hardware accelerators are tailored for specific sparsity patterns with fixed dataflow schemes. This paper presents a machine learning based approach for adaptively selecting the most appropriate dataflow scheme for SpGEMM tasks.
Score: 0.8363939984237685
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and computational demands. However, the irregular structure of sparse matrices poses significant challenges for performance optimization. Traditional hardware accelerators are tailored for specific sparsity patterns with fixed dataflow schemes - inner, outer, and row-wise but often perform suboptimally when the actual sparsity deviates from these predetermined patterns. As the use of SpGEMM expands across various domains, each with distinct sparsity characteristics, the demand for hardware accelerators that can efficiently handle a range of sparsity patterns is increasing. This paper presents a machine learning based approach for adaptively selecting the most appropriate dataflow scheme for SpGEMM tasks with diverse sparsity patterns. By employing decision trees and deep reinforcement learning, we explore the potential of these techniques to surpass heuristic-based methods in identifying optimal dataflow schemes. We evaluate our models by comparing their performance with that of a heuristic, highlighting the strengths and weaknesses of each approach. Our findings suggest that using machine learning for dynamic dataflow selection in hardware accelerators can provide upto 28 times gains.

Related papers

MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation [0.0]
In practice, analyses are often complicated by missing data.<n>We propose MIBoost, a novel algorithm that employs a uniform variable-selection mechanism across imputed datasets.
arXiv Detail & Related papers (2025-07-29T13:42:38Z)
LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling [52.1366057696919]
LOP is an efficient neural pruning framework that learns optimal pruning strategies from the target pruning constraint.<n>LOP approach trains autoregressive neural networks (NNs) to directly predict layer-wise pruning strategies adaptive to the target pruning constraint.<n> Experimental results show that LOP outperforms state-of-the-art pruning methods in various metrics while achieving up to three orders of magnitude speedup.
arXiv Detail & Related papers (2025-06-15T12:14:16Z)
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints [14.341123057506827]
Large Language Models (LLMs) are indispensable in today's applications, but their inference procedure demands significant computational resources. This paper formulates LLM inference optimization as a multi-stage online scheduling problem. We develop a fluid dynamics approximation to provide a tractable benchmark that guides algorithm design.
arXiv Detail & Related papers (2025-04-15T16:00:21Z)
Efficient Model Selection for Time Series Forecasting via LLMs [52.31535714387368]
We propose to leverage Large Language Models (LLMs) as a lightweight alternative for model selection. Our method eliminates the need for explicit performance matrices by utilizing the inherent knowledge and reasoning capabilities of LLMs.
arXiv Detail & Related papers (2025-04-02T20:33:27Z)
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation [0.0]
Jacobian and Hessian matrices have many potential use cases in Machine Learning (ML)<n>This paper presents advances in sparsity detection, previously the performance bottleneck of Automatic Sparse Differentiation (ASD)<n>We show significant speed-ups of up to three orders of magnitude on real-world problems from scientific ML, graph neural networks and optimization.
arXiv Detail & Related papers (2025-01-29T16:21:54Z)
Learning Structured Compressed Sensing with Automatic Resource Allocation [10.298464166235272]
Multidimensional data acquisition often requires extensive time and poses significant challenges for hardware and software. We introduce Structured COmpressed Sensing with Automatic Resource Allocation (SCOSARA) with an information theory-based unsupervised learning strategy.
arXiv Detail & Related papers (2024-10-24T17:53:33Z)
Machine Learning Optimized Approach for Parameter Selection in MESHFREE Simulations [0.0]
Meshfree simulation methods are emerging as compelling alternatives to conventional mesh-based approaches. We provide a comprehensive overview of our research combining Machine Learning (ML) and Fraunhofer's MESHFREE software. We introduce a novel ML-optimized approach, using active learning, regression trees, and visualization on MESHFREE simulation data.
arXiv Detail & Related papers (2024-03-20T15:29:59Z)
Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization. We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z)
Enhancing Pattern Classification in Support Vector Machines through Matrix Formulation [0.0]
The reliance on vector-based formulations in existing SVM-based models poses limitations regarding flexibility and ease of incorporating additional terms to handle specific challenges. We introduce a matrix formulation for SVM that effectively addresses these constraints. Experimental evaluations on multilabel and multiclass datasets demonstrate that Matrix SVM achieves superior time efficacy.
arXiv Detail & Related papers (2023-07-18T15:56:39Z)
Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications. We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
Stochastic First-Order Learning for Large-Scale Flexibly Tied Gaussian Mixture Model [3.4546761246181696]
We propose a new optimization algorithm on the manifold of Gaussian Mixture Models (GMMs) We observe that methods can outperform the expectation-maximization algorithm in terms of attaining better likelihood, needing fewer epochs for convergence, and consuming less time per each epoch.
arXiv Detail & Related papers (2022-12-11T04:24:52Z)
Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks. Data augmentation induces these symmetries during training by applying multiple transformations to the input data. This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
A Hybrid Framework for Sequential Data Prediction with End-to-End Optimization [0.0]
We investigate nonlinear prediction in an online setting and introduce a hybrid model that effectively mitigates hand-designed features and manual model selection issues. We employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression. We demonstrate the learning behavior of our algorithm on synthetic data and the significant performance improvements over the conventional methods over various real life datasets.
arXiv Detail & Related papers (2022-03-25T17:13:08Z)
Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time. The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent surface (IRS) has been employed to reshape the wireless channels by controlling individual scattering elements' phase shifts. Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity. In this article, we focus on machine learning (ML) approaches for performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.