Related papers: MATADOR: Automated System-on-Chip Tsetlin Machine Design Generation for Edge Applications

MATADOR: Automated System-on-Chip Tsetlin Machine Design Generation for Edge Applications

URL: http://arxiv.org/abs/2403.10538v1
Date: Sun, 3 Mar 2024 10:31:46 GMT
Title: MATADOR: Automated System-on-Chip Tsetlin Machine Design Generation for Edge Applications
Authors: Tousif Rahman, Gang Mao, Sidharth Maheshwari, Rishad Shafik, Alex Yakovlev,
Abstract summary: This paper presents MATADOR, an automated-to-silicon tool with GUI interface capable of optimized accelerator design for inference at the edge. It offers automation of the full development pipeline: model training, system level design generation, design verification and deployment. MATADOR accelerator designs are shown to be up to 13.4x faster, up to 7x more resource frugal and up to 2x more power efficient when compared to state-of-the-art Quantized and Binary Deep Neural Network implementations.
Score: 0.2663045001864042
License: http://creativecommons.org/licenses/by/4.0/
Abstract: System-on-Chip Field-Programmable Gate Arrays (SoC-FPGAs) offer significant throughput gains for machine learning (ML) edge inference applications via the design of co-processor accelerator systems. However, the design effort for training and translating ML models into SoC-FPGA solutions can be substantial and requires specialist knowledge aware trade-offs between model performance, power consumption, latency and resource utilization. Contrary to other ML algorithms, Tsetlin Machine (TM) performs classification by forming logic proposition between boolean actions from the Tsetlin Automata (the learning elements) and boolean input features. A trained TM model, usually, exhibits high sparsity and considerable overlapping of these logic propositions both within and among the classes. The model, thus, can be translated to RTL-level design using a miniscule number of AND and NOT gates. This paper presents MATADOR, an automated boolean-to-silicon tool with GUI interface capable of implementing optimized accelerator design of the TM model onto SoC-FPGA for inference at the edge. It offers automation of the full development pipeline: model training, system level design generation, design verification and deployment. It makes use of the logic sharing that ensues from propositional overlap and creates a compact design by effectively utilizing the TM model's sparsity. MATADOR accelerator designs are shown to be up to 13.4x faster, up to 7x more resource frugal and up to 2x more power efficient when compared to the state-of-the-art Quantized and Binary Deep Neural Network implementations.

Related papers

Dynamic Tsetlin Machine Accelerators for On-Chip Training at the Edge using FPGAs [0.3440236962613469]
This paper presents a Dynamic Tsetlin Machine (DTM) training accelerator as an alternative to Deep Neural Networks (DNNs) DTM trains with fewer multiply-accumulates, devoid of derivative computation. The proposed accelerator offers 2.54x more Giga operations per second per Watt (GOP/s per W) and uses 6x less power than the next-best comparable design.
arXiv Detail & Related papers (2025-04-28T13:38:53Z)
Runtime Tunable Tsetlin Machines for Edge Inference on eFPGAs [0.2294388534633318]
eFPGAs allow for the design of hardware accelerators of edge Machine Learning (ML) applications at a lower power budget. The limited eFPGA logic and memory significantly constrain compute capabilities and model size. The proposed eFPGA accelerator focuses on minimizing resource usage and allowing flexibility for on-field recalibration over throughput.
arXiv Detail & Related papers (2025-02-10T12:49:22Z)
KALAM: toolKit for Automating high-Level synthesis of Analog computing systeMs [5.090653251547252]
This paper introduces KALAM, which leverages factor graphs as the foundational paradigm for MP-based analog computing systems. Using Python scripting language, the KALAM automation flow translates an input factor graph to its equivalent SPICE-compatible circuit netlist. We demonstrate KALAM's versatility for tasks such as Bayesian inference, Low-Density Parity Check (LDPC) decoding, and Artificial Neural Networks (ANN)
arXiv Detail & Related papers (2024-10-30T12:04:22Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs [3.302913401404089]
Sliding window-based static sparse attention mitigates the problem by limiting the attention scope of the input tokens. We propose a dataflow-aware FPGA-based accelerator design, SWAT, that efficiently leverages the sparsity to achieve scalable performance for long input.
arXiv Detail & Related papers (2024-05-27T10:25:08Z)
Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem. We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid such adversarial examples. In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
CktGNN: Circuit Graph Neural Network for Electronic Design Automation [67.29634073660239]
This paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing. We introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers. Our work paves the way toward a learning-based open-sourced design automation for analog circuits.
arXiv Detail & Related papers (2023-08-31T02:20:25Z)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
Hybrid Graph Models for Logic Optimization via Spatio-Temporal Information [15.850413267830522]
Two major concerns that may impede production-ready ML applications in EDA are accuracy requirements and generalization capability. We propose hybrid graph neural network (GNN) based approaches towards highly accurate quality-of-result (QoR) estimations. Evaluation on 3.3 million data points shows that the testing mean absolute percentage error (MAPE) on designs seen unseen during training are no more than 1.2% and 3.1%.
arXiv Detail & Related papers (2022-01-20T21:12:22Z)
HALF: Holistic Auto Machine Learning for FPGAs [1.9146960682777232]
Deep Neural Networks (DNNs) are capable of solving complex problems in domains related to embedded systems, such as image and natural language processing. To efficiently implement DNNs on a specific FPGA platform for a given cost criterion, e.g. energy efficiency, an enormous amount of design parameters has to be considered. An automatic, holistic design approach can improve the quality of DNN implementations on FPGA significantly.
arXiv Detail & Related papers (2021-06-28T14:45:47Z)
NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms. This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.