Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
- URL: http://arxiv.org/abs/2502.16627v3
- Date: Sat, 15 Mar 2025 03:46:53 GMT
- Title: Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
- Authors: Arshia Kermani, Ehsan Zeraatkar, Habib Irani
- Abstract summary: This study presents a systematic investigation of optimization techniques, focusing on structured pruning and quantization methods for transformer architectures. Our experimental results demonstrate that static quantization reduces energy consumption by 29.14% while maintaining classification performance, and L1 pruning achieves a 63% improvement in inference speed with minimal accuracy degradation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing computational demands of transformer models in time series classification necessitate effective optimization strategies for energy-efficient deployment. Our study presents a systematic investigation of optimization techniques, focusing on structured pruning and quantization methods for transformer architectures. Through extensive experimentation on three distinct datasets (RefrigerationDevices, ElectricDevices, and PLAID), we quantitatively evaluate model performance and energy efficiency across different transformer configurations. Our experimental results demonstrate that static quantization reduces energy consumption by 29.14% while maintaining classification performance, and L1 pruning achieves a 63% improvement in inference speed with minimal accuracy degradation. Our findings provide valuable insights into the effectiveness of optimization strategies for transformer-based time series classification, establishing a foundation for efficient model deployment in resource-constrained environments.
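The two techniques the abstract highlights, L1 (magnitude-based) pruning and post-training static quantization, map onto standard PyTorch tooling. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the tiny classifier, layer sizes, pruning ratio, and calibration batch are hypothetical stand-ins, and the paper evaluates the two optimizations as separate configurations rather than stacked as shown here.

```python
# Minimal sketch (not the authors' code) of L1 pruning and post-training static
# quantization with standard PyTorch APIs. The model, shapes, and amounts are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert
from torch.nn.utils import prune


class TinyTSClassifier(nn.Module):
    """Hypothetical stand-in for a transformer-style time series classifier."""
    def __init__(self, seq_len=96, d_model=64, n_classes=3):
        super().__init__()
        self.quant = QuantStub()      # marks the float -> int8 boundary
        self.dequant = DeQuantStub()  # marks the int8 -> float boundary
        self.backbone = nn.Sequential(
            nn.Linear(seq_len, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model), nn.ReLU(),
        )
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):             # x: (batch, seq_len)
        x = self.quant(x)
        x = self.backbone(x)
        x = self.head(x)
        return self.dequant(x)


model = TinyTSClassifier().eval()

# --- L1 pruning: zero out the smallest-magnitude 50% of weights in each Linear ---
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")        # bake the sparsity into the weight tensor

# --- Post-training static quantization: calibrate activation observers, convert ---
model.qconfig = get_default_qconfig("fbgemm")  # x86 backend; use "qnnpack" on ARM
prepared = prepare(model)
calibration_data = torch.randn(32, 96)         # placeholder for real calibration batches
prepared(calibration_data)
quantized = convert(prepared)

print(quantized(torch.randn(4, 96)).shape)     # int8 inference path, float logits out
```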
Related papers
- Pruning-Based TinyML Optimization of Machine Learning Models for Anomaly Detection in Electric Vehicle Charging Infrastructure [8.29566258132752]
This paper investigates a pruning method for anomaly detection in resource-constrained environments, specifically targeting EVCI.
Optimized models achieved significant reductions in model size and inference times, with only a marginal impact on their performance.
Notably, our findings indicate that, in the context of EVCI, pruning and feature selection (FS) can enhance computational efficiency while retaining critical anomaly detection capabilities.
arXiv Detail & Related papers (2025-03-19T00:18:37Z) - Transformer^-1: Input-Adaptive Computation for Resource-Constrained Deployment [3.6219999155937113]
This paper proposes a Transformer$^{-1}$ architecture to address the resource waste caused by fixed computation paradigms in deep learning models under dynamic scenarios. In a benchmark test, our method reduces FLOPs by 42.7% and peak memory usage by 3% compared to the standard Transformer. We also conducted experiments on several natural language processing tasks and achieved significant improvements in resource efficiency.
arXiv Detail & Related papers (2025-01-26T15:31:45Z) - Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models [0.0]
This study explores optimization techniques, including quantization, knowledge distillation (KD), and pruning. 4-bit quantization significantly reduces energy use with minimal accuracy loss (a simulated 4-bit weight-quantization sketch appears after this list). Hybrid approaches, such as NVIDIA's Minitron, which combines KD with structured pruning, further demonstrate promising trade-offs between size reduction and accuracy retention.
arXiv Detail & Related papers (2025-01-16T08:54:44Z) - A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.
Deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.
This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms based on low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework [58.474610046294856]
Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime.
This paper introduces an integrated framework that leverages Transformer-based neural networks and deep reinforcement learning (DRL) algorithms to optimize system maintenance actions.
arXiv Detail & Related papers (2023-09-29T02:27:54Z) - Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers [7.89533262149443]
Self-attention in Transformers comes with a high computational cost because of its quadratic complexity. Our benchmark shows that using a larger model is, in general, more efficient than using higher-resolution images.
arXiv Detail & Related papers (2023-08-18T08:06:49Z) - Break a Lag: Triple Exponential Moving Average for Enhanced Optimization [2.0199251985015434]
We introduce Fast Adaptive Moment Estimation (FAME), a novel optimization technique that leverages the power of the Triple Exponential Moving Average (TEMA). FAME enhances responsiveness to data dynamics, mitigates trend-identification lag, and optimizes learning efficiency; a minimal TEMA sketch appears after this list. Our comprehensive evaluation encompasses different computer vision tasks, including image classification, object detection, and semantic segmentation, integrating FAME into 30 distinct architectures.
arXiv Detail & Related papers (2023-06-02T10:29:33Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - Effective Pre-Training Objectives for Transformer-based Autoencoders [97.99741848756302]
We study trade-offs between efficiency, cost and accuracy of Transformer encoders.
We combine features of common objectives and create new effective pre-training approaches.
arXiv Detail & Related papers (2022-10-24T18:39:44Z) - Optimizing Inference Performance of Transformers on CPUs [0.0]
Transformers-based models (e.g., BERT) power many important Web services, such as search, translation, question-answering, etc.
This paper presents an empirical analysis of scalability and performance of inferencing a Transformer-based model on CPUs.
arXiv Detail & Related papers (2021-02-12T17:01:35Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools for maximizing the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for the parameterized ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
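For the 4-bit quantization result cited in the "Optimization Strategies for Enhancing Resource Efficiency" entry above, a simple way to see what 4-bit weight precision does to a model is simulated (fake) quantization: weights are rounded to 16 signed levels and dequantized back to float. This sketch is an illustrative assumption, not that paper's pipeline, which would rely on real low-bit kernels rather than simulation.

```python
# Simulated (fake) symmetric 4-bit weight quantization: an illustrative sketch only.
import torch
import torch.nn as nn


def fake_quantize_4bit(w: torch.Tensor) -> torch.Tensor:
    """Round weights to signed 4-bit levels per output channel, then dequantize."""
    # Per-output-channel scale so the largest weight maps to integer level 7.
    max_abs = w.abs().amax(dim=list(range(1, w.dim())), keepdim=True).clamp(min=1e-8)
    scale = max_abs / 7.0                         # signed 4-bit levels: -8 .. 7
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q * scale


@torch.no_grad()
def quantize_linear_weights(model: nn.Module) -> None:
    """Replace every Linear weight with its 4-bit fake-quantized version in place."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            module.weight.copy_(fake_quantize_4bit(module.weight))


# Toy usage: measure how much the logits move after 4-bit weight rounding.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5)).eval()
x = torch.randn(8, 32)
ref = model(x)
quantize_linear_weights(model)
drift = (model(x) - ref).abs().mean().item()
print(f"mean logit drift after 4-bit weight quantization: {drift:.4f}")
```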
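The "Break a Lag" entry above attributes FAME's behavior to the triple exponential moving average (TEMA). The sketch below only illustrates the TEMA signal itself and a plain gradient step driven by it; it is not the authors' optimizer, and the interface, beta value, and the use of TEMA as a stand-in for the first-moment estimate are assumptions for illustration.

```python
# Minimal TEMA sketch (not the FAME authors' implementation).
import torch


class TEMASmoother:
    """Tracks EMA, EMA-of-EMA, and EMA-of-EMA-of-EMA of a tensor and combines
    them as TEMA = 3*e1 - 3*e2 + e3, which follows trends with less lag than a
    single EMA."""

    def __init__(self, beta=0.9):
        self.beta = beta
        self.e1 = self.e2 = self.e3 = None

    def update(self, x):
        if self.e1 is None:
            # Initialize all three averages at the first observation.
            self.e1, self.e2, self.e3 = x.clone(), x.clone(), x.clone()
        else:
            b = self.beta
            self.e1.mul_(b).add_(x, alpha=1 - b)        # EMA of the signal
            self.e2.mul_(b).add_(self.e1, alpha=1 - b)  # EMA of e1
            self.e3.mul_(b).add_(self.e2, alpha=1 - b)  # EMA of e2
        return 3 * self.e1 - 3 * self.e2 + self.e3


# Toy usage: smooth a noisy gradient stream and take plain gradient steps with it.
param = torch.zeros(10)
smoother = TEMASmoother(beta=0.9)
for step in range(100):
    noisy_grad = torch.randn(10) * 0.1 + 1.0   # stand-in for a real gradient
    smoothed = smoother.update(noisy_grad)
    param -= 0.01 * smoothed                   # SGD-like step on the TEMA signal
```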