Related papers: Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing

Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing

URL: http://arxiv.org/abs/2601.22151v1
Date: Thu, 29 Jan 2026 18:59:50 GMT
Title: Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing
Authors: Daniel Stein, Shaoyi Huang, Rolf Drechsler, Bing Li, Grace Li Zhang,
Abstract summary: State-of-the-art research still focuses on efficiently executing enormous numbers of multiply-accumulate (MAC) operations.<n>We propose to convert neural networks into logic flows for execution.<n>Results demonstrate that the latency can be reduced by up to 14.9 % on a simulated RISC-V CPU.
Score: 8.89228491380837
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Neural networks have been successfully applied in various resource-constrained edge devices, where usually central processing units (CPUs) instead of graphics processing units exist due to limited power availability. State-of-the-art research still focuses on efficiently executing enormous numbers of multiply-accumulate (MAC) operations. However, CPUs themselves are not good at executing such mathematical operations on a large scale, since they are more suited to execute control flow logic, i.e., computer algorithms. To enhance the computation efficiency of neural networks on CPUs, in this paper, we propose to convert them into logic flows for execution. Specifically, neural networks are first converted into equivalent decision trees, from which decision paths with constant leaves are then selected and compressed into logic flows. Such logic flows consist of if and else structures and a reduced number of MAC operations. Experimental results demonstrate that the latency can be reduced by up to 14.9 % on a simulated RISC-V CPU without any accuracy degradation. The code is open source at https://github.com/TUDa-HWAI/NN2Logic

Related papers

Benchmarking Edge AI Platforms for High-Performance ML Inference [0.0]
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads can vary significantly. We compare the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions.
arXiv Detail & Related papers (2024-09-23T08:27:27Z)
EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration [7.694043781601237]
We propose a novel digital multiply-accumulate (MAC) design based on encoding. In this new design, the multipliers are replaced by simple logic gates to represent the results with a wide bit representation. Since the multiplication function is replaced by a simple logic representation, the critical paths in the resulting circuits become much shorter.
arXiv Detail & Related papers (2024-02-25T09:35:30Z)
Logic Design of Neural Networks for High-Throughput and Low-Power Applications [4.964773661192363]
We propose to flatten and implement all the operations at neurons, e.g., MAC and ReLU, in a neural network with their corresponding logic circuits. The weight values are embedded into the MAC units to simplify the logic, which can reduce the delay of the MAC units and the power consumption incurred by weight movement. In addition, we propose a hardware-aware training method to reduce the area of logic designs of neural networks.
arXiv Detail & Related papers (2023-09-19T10:45:46Z)
INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient. We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel. We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
Fast matrix multiplication for binary and ternary CNNs on ARM CPU [0.9135092203041721]
We propose fast algorithms of ternary, ternary-binary, and binary matrix multiplication for mobile devices with ARM architecture. Our algorithms can be used to implement inference of convolutional and fully connected layers of TNNs, TBNs, and BNNs. We evaluate them experimentally on ARM Cortex-A73 CPU and compare their inference speed to efficient implementations of full-precision, 8-bit, and 4-bit quantized matrix multiplications.
arXiv Detail & Related papers (2022-05-18T14:52:34Z)
AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving"-temporal graphs. AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z)
Neural network relief: a pruning algorithm based on neural activity [47.57448823030151]
We propose a simple importance-score metric that deactivates unimportant connections. We achieve comparable performance for LeNet architectures on MNIST. The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations.
arXiv Detail & Related papers (2021-09-22T15:33:49Z)
Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks. We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the decline is due to activation quantization. Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs [6.035819238203187]
We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance. We also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM.
arXiv Detail & Related papers (2020-02-20T12:07:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.