Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units
- URL: http://arxiv.org/abs/2404.10730v1
- Date: Tue, 16 Apr 2024 17:02:52 GMT
- Title: Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units
- Authors: Hieu Le, Zhenhua He, Mai Le, Dhruva K. Chakravorty, Lisa M. Perez, Akhil Chilumuru, Yan Yao, Jiefu Chen
- Abstract summary: Intelligence Processing Units (IPUs) offer a viable accelerator alternative to GPUs for machine learning (ML) applications.
We investigate the process of migrating a model from GPU to IPU and explore several optimization techniques, including pipelining and gradient accumulation.
We observe significantly improved performance with the Bow IPU when compared to its predecessor, the Colossus IPU.
- Score: 8.782847610934635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The findings in this paper show that Intelligence Processing Units (IPUs) offer a viable accelerator alternative to GPUs for machine learning (ML) applications within the fields of materials science and battery research. We investigate the process of migrating a model from GPU to IPU and explore several optimization techniques, including pipelining and gradient accumulation, aimed at enhancing the performance of IPU-based models. Furthermore, we have effectively migrated a specialized model to the IPU platform. This model is employed to predict effective conductivity, a parameter crucial to the ion transport processes that govern battery performance over repeated charge and discharge cycles. The model uses a Convolutional Neural Network (CNN) architecture for the effective-conductivity prediction task. The performance of this model on the IPU is found to be comparable to its execution on GPUs. We also analyze the utilization and performance of Graphcore's Bow IPU. Through benchmark tests, we observe significantly improved performance with the Bow IPU when compared to its predecessor, the Colossus IPU.
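The paper itself does not include code, but the two ideas it names (a CNN regressor for effective conductivity and gradient accumulation) are easy to illustrate. The sketch below is a minimal, hypothetical stand-in in plain PyTorch: the architecture, layer sizes, and names (`ConductivityCNN`, `train_with_accumulation`) are assumptions for illustration, not the authors' model.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the paper's CNN: a small regressor that maps a
# single-channel microstructure image to one scalar (effective conductivity).
class ConductivityCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


def train_with_accumulation(model, loader, accum_steps=8, lr=1e-3):
    """Gradient accumulation: sum gradients over `accum_steps` micro-batches
    before each optimizer step, emulating a larger effective batch size."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    optimizer.zero_grad()
    for i, (images, targets) in enumerate(loader):
        # Scale the loss so the summed gradients match one large batch.
        loss = loss_fn(model(images), targets) / accum_steps
        loss.backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()


if __name__ == "__main__":
    # Synthetic data only, to keep the sketch self-contained.
    images = torch.rand(64, 1, 64, 64)
    targets = torch.rand(64, 1)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(images, targets), batch_size=8)
    train_with_accumulation(ConductivityCNN(), loader)
```

On Graphcore hardware, gradient accumulation and pipelining are normally expressed as compile-time options of the PopTorch/Poplar toolchain (an accumulation count and a partitioning of layers across IPUs) rather than as a hand-written loop; the loop above only illustrates the numerical idea of accumulating gradients over several micro-batches before each weight update.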
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies [0.0]
The study introduces an analytical model-driven tuning methodology and a Machine Learning (ML)-based tuning methodology.
We evaluate the performance of the two tuning methodologies for different parallel prefix implementations of the BPLG library in an NVIDIA Jetson system.
arXiv Detail & Related papers (2023-10-24T22:09:03Z)
- Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units [0.0]
This paper introduces a novel, heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance.
We propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture.
arXiv Detail & Related papers (2023-04-18T19:44:56Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
The paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Conventional approaches, however, do not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper a unified deployment pipeline and freedom-to-operate approach is presented that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- The Preliminary Results on Analysis of TAIGA-IACT Images Using Convolutional Neural Networks [68.8204255655161]
The aim of the work is to study the applicability of machine learning to the tasks set for TAIGA-IACT.
The method of Convolutional Neural Networks (CNN) was applied to process and analyze Monte-Carlo events simulated with CORSIKA.
arXiv Detail & Related papers (2021-12-19T15:17:20Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Performance and Power Modeling and Prediction Using MuMMI and Ten Machine Learning Methods [0.13764085113103217]
We use the modeling and prediction tool MuMMI together with ten machine learning methods to model and predict performance and power.
Experiment results show that the prediction error rates in performance and power using MuMMI are less than 10% for most cases.
arXiv Detail & Related papers (2020-11-12T21:24:11Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
- Using Machine Learning to Emulate Agent-Based Simulations [0.0]
We evaluate the performance of multiple machine-learning methods as statistical emulators for use in the analysis of agent-based models (ABMs).
We propose that agent-based modelling would benefit from using machine-learning methods for emulation, as this can facilitate more robust sensitivity analyses for the models.
arXiv Detail & Related papers (2020-05-05T11:48:36Z)
- Graphcore C2 Card performance for image-based deep learning application: A Report [0.3149883354098941]
Graphcore has introduced an IPU Processor for accelerating machine learning applications.
We report on a benchmark in which we have evaluated the performance of IPU processors on deep neural networks for inference.
arXiv Detail & Related papers (2020-02-26T17:58:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.