Related papers: Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs

URL: http://arxiv.org/abs/2402.02930v1
Date: Mon, 5 Feb 2024 11:52:23 GMT
Title: Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs
Authors: Florentia Afentaki, Michael Hefenbrock, Georgios Zervakis, Mehdi B. Tahoori
Abstract summary: Printed Electronics (PE) enables stretchable, conformal, and non-toxic hardware. PE are constrained by larger feature sizes, making it challenging to implement complex circuits such as machine learning (ML) classifiers. In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the training process.
Score: 1.6052247221616553
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Printed Electronics (PE) stands out as a promising technology for widespread computing due to its distinct attributes, such as low costs and flexible manufacturing. Unlike traditional silicon-based technologies, PE enables stretchable, conformal, and non-toxic hardware. However, PE are constrained by larger feature sizes, making it challenging to implement complex circuits such as machine learning (ML) classifiers. Approximate computing has been proven to reduce the hardware cost of ML circuits such as Multilayer Perceptrons (MLPs). In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the MLP training process. Due to the discrete nature of hardware approximation, we propose and implement a genetic-based, approximate, hardware-aware training approach specifically designed for printed MLPs. For a 5% accuracy loss, our MLPs achieve over 5x area and power reduction compared to the baseline while outperforming state-of-the-art approximate and stochastic printed MLPs.

Related papers

PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing [48.30406812516552]
We introduce the PLM, a Peripheral Language Model, developed through a co-design process that jointly optimizes model architecture and edge system constraints. PLM employs a Multi-head Latent Attention mechanism and the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint. Evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data.
arXiv Detail & Related papers (2025-03-15T15:11:17Z)
Compact Yet Highly Accurate Printed Classifiers Using Sequential Support Vector Machine Circuits [0.6670927729669428]
We introduce the first sequential Support Vector Machine (SVM) classifiers. Our SVMs yield on average 6x lower area and 4.6% higher accuracy compared to the printed state of the art.
arXiv Detail & Related papers (2025-02-03T16:30:27Z)
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms. We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM. DeeR demonstrates significant reductions in computational costs of LLM by 5.2-6.5x and GPU memory of LLM by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
Progressive Mixed-Precision Decoding for Efficient LLM Inference [49.05448842542558]
We introduce Progressive Mixed-Precision Decoding (PMPD) to address the memory-boundedness of decoding. PMPD achieves 1.4-12.2x speedup in matrix-vector multiplications over fp16 models. Our approach delivers a throughput gain of 3.8-8.0x over fp16 models and up to 1.54x over uniform quantization approaches.
arXiv Detail & Related papers (2024-10-17T11:46:33Z)
SA-MLP: A Low-Power Multiplication-Free Deep Network for 3D Point Cloud Classification in Resource-Constrained Environments [46.266960248570086]
Point cloud classification plays a crucial role in the processing and analysis of data from 3D sensors such as LiDAR. Traditional neural networks, which rely heavily on multiplication operations, often face challenges in terms of high computational costs and energy consumption. This study presents a novel family of efficient multiplication-based architectures designed to improve the computational efficiency of point cloud classification tasks.
arXiv Detail & Related papers (2024-09-03T15:43:44Z)
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric [57.3330687266266]
We find that using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance. Using the Module-wise Pruning Error (MoPE) metric, we introduce a unified pruning framework applicable to both pre-training and task-specific fine-tuning compression stages.
arXiv Detail & Related papers (2024-03-12T17:24:26Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problems. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons [0.8274768545559366]
Printed Electronics (PE) offer unparalleled features such as non-recurring engineering costs, ultra-low manufacturing costs, and on-demand fabrication. PE face certain limitations due to their large feature sizes, which impede the realization of complex circuits. We propose an automated framework for designing ultra-low power Multilayer Perceptron (MLP) classifiers.
arXiv Detail & Related papers (2023-12-29T14:16:11Z)
Model-to-Circuit Cross-Approximation For Printed Machine Learning Classifiers [4.865819809855699]
Printed electronics (PE) promises on-demand fabrication, low non-recurring engineering costs, and sub-cent fabrication costs. Large feature sizes in PE prohibit the realization of complex ML models in PE, even with bespoke architectures. We present an automated, cross-layer approximation framework tailored to bespoke architectures that enable complex ML models in PE.
arXiv Detail & Related papers (2023-03-14T22:11:34Z)
Co-Design of Approximate Multilayer Perceptron for Ultra-Resource Constrained Printed Circuits [4.865819809855699]
Large feature sizes in Printed Electronics (PE) prohibit the realization of complex printed machine learning circuits. We present, for the first time, an automated printed-aware software/hardware co-design framework that exploits approximate computing principles to enable ultra-resource constrained printed multilayer perceptrons (MLPs). Our evaluation demonstrates that, compared to the state-of-the-art baseline, our circuits feature on average 6x (5.7x) lower area (power) and less than 1% accuracy loss.
arXiv Detail & Related papers (2023-02-28T13:55:19Z)
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers [59.87030906486969]
This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse. We show that sparsity is a prevalent phenomenon that occurs for both natural language processing and vision tasks. We discuss how sparsity immediately implies a way to significantly reduce the FLOP count and improve efficiency for Transformers.
arXiv Detail & Related papers (2022-10-12T15:25:19Z)
Approximate Decision Trees For Machine Learning Classification on Tiny Printed Circuits [0.7349727826230862]
Printed Electronics (PE) cannot compete with silicon-based systems in conventional evaluation metrics. PE offers attractive properties such as on-demand ultra-low-cost fabrication, flexibility and non-toxicity. Despite the attractive characteristics of PE, the large feature sizes in PE prohibit the realization of complex printed circuits.
arXiv Detail & Related papers (2022-03-15T15:47:59Z)
Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks. We propose sparse all-MLPs with mixture-of-experts (MoEs) in both feature and input (tokens). We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
arXiv Detail & Related papers (2022-03-14T04:32:19Z)
Cross-Layer Approximation For Printed Machine Learning Circuits [4.865819809855699]
We propose and implement a cross-layer approximation, tailored for bespoke machine learning (ML) architectures in printed electronics (PE). Our results demonstrate that our cross approximation delivers optimal designs that, compared to the state-of-the-art exact designs, feature 47% and 44% average area and power reduction, respectively, and less than 1% accuracy loss.
arXiv Detail & Related papers (2022-03-11T13:41:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.