Cross-Layer Approximation For Printed Machine Learning Circuits
- URL: http://arxiv.org/abs/2203.05915v1
- Date: Fri, 11 Mar 2022 13:41:15 GMT
- Title: Cross-Layer Approximation For Printed Machine Learning Circuits
- Authors: Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Mehdi B.
Tahoori, Jörg Henkel
- Abstract summary: We propose and implement a cross-layer approximation, tailored for bespoke machine learning (ML) architectures in printed electronics (PE).
Our results demonstrate that our cross-layer approximation delivers Pareto-optimal designs that, compared to the state-of-the-art exact designs, feature 47% and 44% average area and power reduction, respectively, with less than 1% accuracy loss.
- Score: 4.865819809855699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Printed electronics (PE) feature low non-recurring engineering costs and low
per-unit-area fabrication costs, thus enabling extremely low-cost and on-demand
hardware. Such low-cost fabrication allows for a degree of customization that would be
infeasible in silicon, and bespoke architectures prevail to improve the
efficiency of emerging PE machine learning (ML) applications. However, even
with bespoke architectures, the large feature sizes in PE constrain the
complexity of the ML models that can be implemented. In this work, we bring
together, for the first time, approximate computing and PE design to
enable complex ML models, such as Multi-Layer Perceptrons (MLPs) and Support
Vector Machines (SVMs), in PE. To this end, we propose and implement a
cross-layer approximation tailored for bespoke ML architectures: at the
algorithmic level we apply a hardware-driven coefficient approximation of the
ML model, and at the circuit level we apply netlist pruning through a
full-search exploration. In our extensive experimental evaluation we consider 14
MLPs and SVMs and evaluate more than 4300 approximate and exact designs. Our
results demonstrate that our cross-layer approximation delivers Pareto-optimal
designs that, compared to the state-of-the-art exact designs, feature 47% and
44% average area and power reduction, respectively, with less than 1% accuracy
loss.
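To make the two levels of the proposed approximation concrete, here is a minimal, hypothetical Python sketch of hardware-driven coefficient approximation: trained coefficients are rounded to a small fixed-point bitwidth so that the bespoke multipliers shrink, and the accuracy of the rounded model is checked against the exact one. The helper names and the toy one-layer classifier are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def approximate_coefficients(weights, bitwidth):
    """Round weights to the nearest fixed-point value at the given
    bitwidth; coarser coefficients mean cheaper printed multipliers."""
    levels = 2 ** (bitwidth - 1) - 1          # symmetric signed range
    w_max = float(np.max(np.abs(weights)))
    return np.round(weights / w_max * levels) / levels * w_max

def accuracy(weights, X, y):
    """Accuracy of a toy single-layer classifier: argmax(X @ W)."""
    return float(np.mean(np.argmax(X @ weights, axis=1) == y))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # toy inputs
W = rng.normal(size=(8, 3))                   # exact coefficients
y = np.argmax(X @ W, axis=1)                  # labels from the exact model

for bits in (8, 6, 4, 3, 2):
    print(bits, "bits ->", accuracy(approximate_coefficients(W, bits), X, y))
```

The exploration described in the abstract then keeps only the Pareto-optimal points among the evaluated exact and approximate designs. A simple dominance filter over (area, accuracy-loss) pairs, again only a sketch under assumed dictionary keys, could look like this:

```python
def pareto_front(designs):
    """Keep designs not dominated in both area and accuracy loss
    (smaller is better for both)."""
    return [d for d in designs
            if not any(o["area"] <= d["area"] and o["loss"] <= d["loss"]
                       and (o["area"] < d["area"] or o["loss"] < d["loss"])
                       for o in designs)]

candidates = [{"area": 1.00, "loss": 0.000},  # exact baseline
              {"area": 0.53, "loss": 0.008},  # approximate variants
              {"area": 0.60, "loss": 0.020},  # dominated: dropped
              {"area": 0.45, "loss": 0.030}]
print(pareto_front(candidates))
```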
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR reduces the computational cost of the LLM by 5.2-6.5x and its GPU memory footprint by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with 1B to 4B parameters, achieving 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z)
- A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z)
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs [1.6052247221616553]
Printed Electronics (PE) enables stretchable, conformal, and non-toxic hardware.
PE is constrained by larger feature sizes, making it challenging to implement complex circuits such as machine learning (ML)-aware circuits.
In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the training process.
arXiv Detail & Related papers (2024-02-05T11:52:23Z)
- Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons [0.8768075668637361]
Printed Electronics (PE) offer unparalleled features such as low non-recurring engineering costs, ultra-low manufacturing costs, and on-demand fabrication.
PE faces certain limitations due to its large feature sizes, which impede the realization of complex circuits.
We propose an automated framework for designing ultra-low power Multilayer Perceptron (MLP) classifiers.
arXiv Detail & Related papers (2023-12-29T14:16:11Z)
- Model-to-Circuit Cross-Approximation For Printed Machine Learning Classifiers [4.865819809855699]
Printed electronics (PE) promises on-demand fabrication, low non-recurring engineering costs, and sub-cent fabrication costs.
Large feature sizes in PE prohibit the realization of complex ML models in PE, even with bespoke architectures.
We present an automated, cross-layer approximation framework, tailored to bespoke architectures, that enables complex ML models in PE.
arXiv Detail & Related papers (2023-03-14T22:11:34Z)
- Co-Design of Approximate Multilayer Perceptron for Ultra-Resource Constrained Printed Circuits [4.865819809855699]
Large feature sizes in Printed Electronics (PE) prohibit the realization of complex printed machine learning circuits.
We present, for the first time, an automated printed-aware software/hardware co-design framework that exploits approximate computing principles to enable ultra-resource constrained printed multilayer perceptrons (MLPs).
Our evaluation demonstrates that, compared to the state-of-the-art baseline, our circuits feature on average 6x (5.7x) lower area (power) and less than 1% accuracy loss.
arXiv Detail & Related papers (2023-02-28T13:55:19Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate (a toy sketch of this effect follows the list).
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
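As a toy illustration of that last claim (my own assumption, not the paper's actual model): if per-device compute shrinks as a job is split across more devices while coordination overhead grows with the partition count, completion time has an interior optimum well below the maximum device count.

```python
def completion_time(work, devices, overhead=0.05):
    """Toy cost model: compute shrinks with more devices,
    coordination overhead grows linearly with them."""
    return work / devices + overhead * devices

times = {k: completion_time(100.0, k) for k in range(1, 65)}
best = min(times, key=times.get)
print(f"best partition count: {best} ({times[best]:.2f})")
print(f"maximum parallelism:  64 ({times[64]:.2f})")  # slower than best
```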
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.