Model-to-Circuit Cross-Approximation For Printed Machine Learning Classifiers
- URL: http://arxiv.org/abs/2303.08255v1
- Date: Tue, 14 Mar 2023 22:11:34 GMT
- Title: Model-to-Circuit Cross-Approximation For Printed Machine Learning Classifiers
- Authors: Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Mehdi B. Tahoori, Jörg Henkel
- Abstract summary: Printed electronics (PE) promises on-demand fabrication, low non-recurring engineering costs, and sub-cent fabrication costs.
Large feature sizes prohibit the realization of complex ML models in PE, even with bespoke architectures.
We present an automated, cross-layer approximation framework tailored to bespoke architectures that enables complex ML models in PE.
- Score: 4.865819809855699
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Printed electronics (PE) promises on-demand fabrication, low non-recurring
engineering costs, and sub-cent fabrication costs. It also allows a degree of
customization that would be infeasible in silicon, and bespoke architectures
prevail to improve the efficiency of emerging PE machine learning (ML)
applications. Nevertheless, the large feature sizes in PE prohibit the realization
of complex ML models, even with bespoke architectures. In this work, we
present an automated, cross-layer approximation framework tailored to bespoke
architectures that enables complex ML models, such as Multi-Layer Perceptrons
(MLPs) and Support Vector Machines (SVMs), in PE. Our framework cooperatively
applies hardware-driven coefficient approximation of the ML model at the
algorithmic level, netlist pruning at the logic level, and voltage over-scaling
at the circuit level. An extensive experimental evaluation on 12 MLPs, 12 SVMs,
and more than 6,000 approximate and exact designs demonstrates that our
model-to-circuit cross-approximation delivers power- and area-optimal designs
that, compared to state-of-the-art exact designs, feature on average 51% area
and 66% power reduction for less than 5% accuracy loss. Finally, we show that
our framework enables 80% of the examined classifiers to be battery-powered
with almost identical accuracy to their exact counterparts, thus paving the
way towards smart, complex printed applications.
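To make the algorithmic-level step concrete, below is a minimal Python sketch of hardware-driven coefficient approximation, under the assumption that hardware cost can be proxied by the number of non-zero bits in a coefficient's fixed-point representation (in a bespoke architecture, each coefficient is hardwired into its own multiplier, so fewer non-zero bits mean fewer adders). The paper's actual cost model is derived from circuit synthesis; all names here are illustrative.

```python
import numpy as np

def nonzero_bits(q: int) -> int:
    """Non-zero bits in the magnitude of a fixed-point coefficient:
    a rough proxy for the area of the bespoke multiplier it hardwires."""
    return bin(abs(q)).count("1")

def approximate_coefficient(w: float, frac_bits: int = 8, max_bits: int = 2) -> float:
    """Round w to the nearest fixed-point value whose representation uses
    at most `max_bits` non-zero bits (illustrative cost model only)."""
    q = int(round(w * (1 << frac_bits)))
    if nonzero_bits(q) <= max_bits:
        return q / (1 << frac_bits)
    for delta in range(1, 1 << frac_bits):   # search outward for a cheap neighbour
        for cand in (q - delta, q + delta):
            if nonzero_bits(cand) <= max_bits:
                return cand / (1 << frac_bits)
    return 0.0

# Approximate every weight of a (hypothetical) trained MLP layer, then
# re-validate accuracy before committing the cheaper coefficients.
W = np.array([[0.73, -0.12], [0.05, 0.91]])
W_cheap = np.vectorize(approximate_coefficient)(W)
```

In the framework itself, coefficient selection is applied cooperatively with the logic-level pruning and circuit-level voltage over-scaling, jointly tuned against the accuracy budget rather than in isolation as above.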
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR reduces the computational cost of the MLLM by 5.2-6.5x and its GPU memory footprint by 2-6x without compromising performance.
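As a rough illustration of the early-exit idea (the confidence-threshold criterion and all names are assumptions, not DeeR's actual exit policy):

```python
import torch

def early_exit_forward(blocks, exit_heads, x, threshold=0.9):
    """Run model blocks one at a time and stop as soon as an intermediate
    head is confident enough, so easy inputs activate a smaller model."""
    h = x
    probs = None
    for block, head in zip(blocks, exit_heads):
        h = block(h)
        probs = torch.softmax(head(h), dim=-1)
        if probs.max().item() >= threshold:  # confident: skip remaining blocks
            break
    return probs
```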
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with 1B to 4B parameters that achieve 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find that MAD (mechanistic architecture design) synthetic tasks correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs [1.6052247221616553]
Printed Electronics (PE) enables stretchable, conformal, and non-toxic hardware.
However, PE is constrained by large feature sizes, making it challenging to implement complex circuits such as machine learning (ML) circuits.
In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the training process.
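A minimal sketch of that idea, assuming the approximate hardware is a truncated fixed-point multiplier; the genetic algorithm then scores candidate weight vectors with the same arithmetic the printed circuit will use (function names are illustrative):

```python
def approx_mul(a: int, b: int, drop: int = 4) -> int:
    """Truncated fixed-point multiply: discard the `drop` low bits of the
    product, mimicking a pruned printed multiplier (illustrative)."""
    return ((a * b) >> drop) << drop

def fitness(weights, dataset, frac_bits=6):
    """Evaluate a GA individual with the approximate arithmetic itself, so
    training absorbs the hardware error instead of discovering it later."""
    scale = 1 << frac_bits
    correct = 0
    for x, label in dataset:
        acc = sum(approx_mul(int(w * scale), int(xi * scale))
                  for w, xi in zip(weights, x))
        correct += int((acc > 0) == label)
    return correct / len(dataset)
```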
arXiv Detail & Related papers (2024-02-05T11:52:23Z)
- Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons [0.8768075668637361]
Printed Electronics (PE) offers unparalleled features such as low non-recurring engineering costs, ultra-low manufacturing costs, and on-demand fabrication.
However, PE faces limitations due to its large feature sizes, which impede the realization of complex circuits.
We propose an automated framework for designing ultra-low power Multilayer Perceptron (MLP) classifiers.
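On the activation side, such a framework might replace an exact sigmoid with a piecewise-linear surrogate that synthesizes to a comparator and a shift-add; a hedged sketch (the paper's actual approximations are selected automatically, not fixed like this):

```python
def hard_sigmoid(x: float) -> float:
    """Piecewise-linear stand-in for the sigmoid: costs only a comparison
    and a scaled add in hardware instead of an exponential (illustrative)."""
    return min(1.0, max(0.0, 0.25 * x + 0.5))
```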
arXiv Detail & Related papers (2023-12-29T14:16:11Z)
- Modular Quantization-Aware Training for 6D Object Pose Estimation [52.9436648014338]
Edge applications demand efficient 6D object pose estimation on resource-constrained embedded platforms.
We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy.
MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
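A generic sketch of the underlying building block, module-wise fake quantization at a chosen precision (MQAT's gradated schedule and bit-width selection are not reproduced here; names are illustrative):

```python
import torch

def fake_quantize_module(module: torch.nn.Module, bits: int) -> None:
    """Round a module's weights to `bits`-bit symmetric integers in place:
    a generic QAT step applied here with per-module precision."""
    for p in module.parameters():
        if p.dim() < 2:                      # skip biases and norm parameters
            continue
        qmax = 2 ** (bits - 1) - 1
        scale = p.detach().abs().max() / qmax
        p.data = torch.clamp(torch.round(p / scale), -qmax - 1, qmax) * scale

# e.g. keep a backbone at 8 bits but squeeze a pose head to 4:
# fake_quantize_module(model.backbone, bits=8)
# fake_quantize_module(model.pose_head, bits=4)
```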
arXiv Detail & Related papers (2023-03-12T21:01:54Z)
- Co-Design of Approximate Multilayer Perceptron for Ultra-Resource Constrained Printed Circuits [4.865819809855699]
Large feature sizes in Printed Electronics (PE) prohibit the realization of complex printed machine learning circuits.
We present, for the first time, an automated printed-aware software/hardware co-design framework that exploits approximate computing principles to enable ultra-resource constrained printed multilayer perceptrons (MLPs).
Our evaluation demonstrates that, compared to the state-of-the-art baseline, our circuits feature on average 6x (5.7x) lower area (power) and less than 1% accuracy loss.
arXiv Detail & Related papers (2023-02-28T13:55:19Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between the self-supervised learning (SSL) and dynamic computation (DC) paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [99.349598600887]
Conformer is the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture.
We propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes.
arXiv Detail & Related papers (2022-06-02T06:06:29Z)
- Cross-Layer Approximation For Printed Machine Learning Circuits [4.865819809855699]
We propose and implement a cross-layer approximation, tailored for bespoke machine learning (ML) architectures in printed electronics (PE).
Our results demonstrate that our cross-layer approximation delivers optimal designs that, compared to state-of-the-art exact designs, feature 47% and 44% average area and power reduction, respectively, with less than 1% accuracy loss.
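The logic-level pruning step of such a cross-layer flow can be approximated at the model level by zeroing the least significant coefficients, since in a bespoke circuit each zeroed weight removes its hardwired multiplier; a sketch under that assumption (the paper prunes the synthesized netlist itself, not the weight matrix):

```python
import numpy as np

def prune_low_magnitude(W: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Zero the smallest-magnitude coefficients; each zeroed weight deletes
    its hardwired multiplier from the circuit (model-level proxy only)."""
    flat = np.abs(W).ravel()
    k = int(flat.size * (1.0 - keep_ratio))   # number of weights to drop
    if k == 0:
        return W.copy()
    cutoff = np.partition(flat, k - 1)[k - 1]
    Wp = W.copy()
    Wp[np.abs(Wp) <= cutoff] = 0.0
    return Wp
```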
arXiv Detail & Related papers (2022-03-11T13:41:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.