OpTC -- A Toolchain for Deployment of Neural Networks on AURIX TC3xx Microcontrollers
- URL: http://arxiv.org/abs/2404.15833v1
- Date: Wed, 24 Apr 2024 12:11:33 GMT
- Title: OpTC -- A Toolchain for Deployment of Neural Networks on AURIX TC3xx Microcontrollers
- Authors: Christian Heidorn, Frank Hannig, Dominik Riedelbauch, Christoph Strohmeyer, Jürgen Teich
- Abstract summary: AURIX 2xx and 3xx families of TriCore microcontrollers are widely used in the automotive industry.
OpTC is an end-to-end toolchain for automatic compression, conversion, code generation, and deployment of neural networks on TC3xx microcontrollers.
- Score: 2.051709733623628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The AURIX 2xx and 3xx families of TriCore microcontrollers are widely used in the automotive industry and, recently, also in applications that involve machine learning tasks. Yet, these applications are mainly engineered manually, and only little tool support exists for bringing neural networks to TriCore microcontrollers. Thus, we propose OpTC, an end-to-end toolchain for automatic compression, conversion, code generation, and deployment of neural networks on TC3xx microcontrollers. OpTC supports various types of neural networks and provides compression using layer-wise pruning based on sensitivity analysis for a given neural network. The flexibility in supporting different types of neural networks, such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), and recurrent neural networks (RNN), is shown in case studies for a TC387 microcontroller. Automotive applications for predicting the temperature in electric motors and detecting anomalies are thereby used to demonstrate the effectiveness and the wide range of applications supported by OpTC.
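The layer-wise pruning via sensitivity analysis mentioned in the abstract can be illustrated with a small sketch (hypothetical function names; OpTC's actual implementation is not described in detail here): each layer is pruned in isolation at several ratios, and the measured accuracy drop indicates how aggressively that layer can be compressed.

```python
def prune_smallest(weights, ratio):
    """Zero out the fraction `ratio` of weights with smallest magnitude."""
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * ratio)
    threshold = flat[k] if k < len(flat) else float("inf")
    return [0.0 if abs(w) < threshold else w for w in weights]

def sensitivity_analysis(layers, evaluate, ratios=(0.1, 0.3, 0.5, 0.7)):
    """For each layer, record the accuracy drop when only that layer
    is pruned at each candidate ratio (all other layers untouched)."""
    baseline = evaluate(layers)
    sensitivity = {}
    for name, weights in layers.items():
        drops = {}
        for r in ratios:
            pruned = dict(layers)
            pruned[name] = prune_smallest(weights, r)
            drops[r] = baseline - evaluate(pruned)
        sensitivity[name] = drops
    return sensitivity
```

A compression pass would then pick, per layer, the largest ratio whose drop stays under a user-set tolerance.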
Related papers
- AI-ANNE: (A) (N)eural (N)et for (E)xploration: Transferring Deep Learning Models onto Microcontrollers and Embedded Systems [0.0]
This working paper explores the integration of neural networks onto resource-constrained embedded systems like a Raspberry Pi Pico / Raspberry Pi Pico 2.
A TinyML approach transfers neural networks directly onto these microcontrollers, enabling real-time, low-latency, and energy-efficient inference.
Two different neural networks on microcontrollers are presented for an example of data classification.
arXiv Detail & Related papers (2025-01-01T10:29:55Z) - Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.
Embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.
Split computing, where an SNN is partitioned across two devices, is a promising solution.
This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z) - Adaptive Robotic Arm Control with a Spiking Recurrent Neural Network on a Digital Accelerator [41.60361484397962]
We present an overview of the system and a Python framework to use it on a Pynq ZU platform.
We show how the simulated accuracy is preserved with a peak performance of 3.8M events processed per second.
arXiv Detail & Related papers (2024-05-21T14:59:39Z) - NNCTC: Physical Layer Cross-Technology Communication via Neural Networks [5.316403200445237]
Cross-technology communication enables seamless interactions between diverse wireless technologies.
We present NNCTC, a Neural-Network-based Cross-Technology Communication framework inspired by the adaptability of trainable neural models in wireless communications.
arXiv Detail & Related papers (2024-03-15T04:36:44Z) - OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators [57.145175475579315]
This topic spans various techniques, from structured pruning to neural architecture search, encompassing the perspectives of both pruning and erasing operators.
We introduce the third-generation Only-Train-Once (OTOv3), which first automatically trains and compresses a general DNN through pruning and erasing operations.
Our empirical results demonstrate the efficacy of OTOv3 across various benchmarks in structured pruning and neural architecture search.
arXiv Detail & Related papers (2023-12-15T00:22:55Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
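The {-1, +1} decomposition at the heart of such a scheme can be sketched as follows (a minimal illustration, not the paper's code): any odd integer weight with |q| <= 2**M - 1 can be written as a sum of M terms 2**i * b_i with b_i in {-1, +1}, so a quantized layer splits into M binary branches recombined with power-of-two scalings.

```python
def decompose(q, num_bits):
    """Greedily write an odd integer q with |q| <= 2**num_bits - 1 as
    q = sum_i 2**i * b_i, where every b_i is -1 or +1."""
    assert q % 2 != 0, "only odd values are exactly representable"
    bits = []
    for i in reversed(range(num_bits)):  # most significant term first
        b = 1 if q > 0 else -1
        bits.append(b)
        q -= b * (1 << i)
    assert q == 0  # the greedy choice always terminates exactly
    return list(reversed(bits))  # bits[i] multiplies 2**i

def recompose(bits):
    """Inverse: sum the binary branches with power-of-two scalings."""
    return sum(b << i for i, b in enumerate(bits))
```

Each binary branch can then use cheap XNOR/popcount kernels, which is what enables the acceleration.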
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - A Microarchitecture Implementation Framework for Online Learning with Temporal Neural Networks [1.4530235554268331]
Temporal Neural Networks (TNNs) are spiking neural networks that use time as a resource to represent and process information.
This work proposes a microarchitecture framework for implementing TNNs using standard CMOS.
arXiv Detail & Related papers (2021-05-27T15:59:54Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Tensor train decompositions on recurrent networks [60.334946204107446]
Matrix product state (MPS) tensor trains have more attractive features than matrix product operators (MPOs) in terms of storage reduction and computing time at inference.
We show that MPS tensor trains should be at the forefront of LSTM network compression, through a theoretical analysis and practical experiments on NLP tasks.
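Why tensor trains compress LSTM weights can be seen from a parameter count (illustrative shapes and ranks, not numbers from the paper): a weight matrix whose row and column dimensions factor into small modes is replaced by a chain of cores of shape r_k x m_k x n_k x r_{k+1}, so storage scales with the TT ranks rather than with the full matrix size.

```python
def tt_param_count(in_modes, out_modes, ranks):
    """Number of parameters in a tensor-train (MPS-style) factorization of a
    weight matrix of shape (prod(in_modes), prod(out_modes)).
    `ranks` has length len(in_modes) + 1 with boundary ranks of 1."""
    assert len(in_modes) == len(out_modes) == len(ranks) - 1
    assert ranks[0] == ranks[-1] == 1
    return sum(ranks[k] * m * n * ranks[k + 1]
               for k, (m, n) in enumerate(zip(in_modes, out_modes)))

# Example: a 256 x 1024 matrix factored as (4*8*8) x (8*16*8), all TT ranks 4
full = (4 * 8 * 8) * (8 * 16 * 8)                          # 262144 parameters
tt = tt_param_count([4, 8, 8], [8, 16, 8], [1, 4, 4, 1])   # 2432 parameters
```

Here the factorized form is roughly 100x smaller than the dense matrix, at the cost of a rank-dependent approximation error.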
arXiv Detail & Related papers (2020-06-09T18:25:39Z) - Convolutional-Recurrent Neural Networks on Low-Power Wearable Platforms for Cardiac Arrhythmia Detection [0.18459705687628122]
We focus on the inference of neural networks running on microcontrollers and low-power processors.
We adapted an existing convolutional-recurrent neural network to detect and classify cardiac arrhythmias.
We show our implementation in fixed-point precision, using the CMSIS-NN libraries, with a memory footprint of 195.6 KB and a throughput of 33.98 MOps/s.
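The fixed-point arithmetic underlying CMSIS-NN-style inference can be sketched in a few lines (a generic Q15 illustration, not the paper's implementation): floats in [-1, 1) map to 16-bit integers with 15 fractional bits, and products are rescaled by shifting right 15 bits.

```python
Q = 15  # Q15 format: 1 sign bit, 15 fractional bits

def to_q15(x):
    """Quantize a float in [-1, 1) to a saturating Q15 integer."""
    q = int(round(x * (1 << Q)))
    return max(-(1 << Q), min((1 << Q) - 1, q))

def q15_mul(a, b):
    """The product of two Q15 values is Q30; shift right by 15 to
    return to Q15 (truncating the extra fractional bits)."""
    return (a * b) >> Q

def from_q15(q):
    """Recover the approximate float value."""
    return q / (1 << Q)
```

On a Cortex-M class core, these shifts and saturations map to single instructions, which is why fixed-point inference is so much cheaper than floating point there.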
arXiv Detail & Related papers (2020-01-08T10:35:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.