Related papers: On Importance of Pruning and Distillation for Efficient Low Resource NLP

On Importance of Pruning and Distillation for Efficient Low Resource NLP

URL: http://arxiv.org/abs/2409.14162v1
Date: Sat, 21 Sep 2024 14:58:12 GMT
Title: On Importance of Pruning and Distillation for Efficient Low Resource NLP
Authors: Aishwarya Mirashi, Purva Lingayat, Srushti Sonavane, Tejas Padhiyar, Raviraj Joshi, Geetanjali Kale,
Abstract summary: Large transformer models have revolutionized Natural Language Processing, leading to significant advances in tasks like text classification. Efforts have been made to downsize and accelerate English models, but research in this area is scarce for low-resource languages. In this study, we explore the case of the low-resource-topic-all-docv2 model as our baseline, we implement optimization techniques to reduce computation time and memory usage.
Score: 0.3958317527488535
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rise of large transformer models has revolutionized Natural Language Processing, leading to significant advances in tasks like text classification. However, this progress demands substantial computational resources, escalating training duration, and expenses with larger model sizes. Efforts have been made to downsize and accelerate English models (e.g., Distilbert, MobileBert). Yet, research in this area is scarce for low-resource languages. In this study, we explore the case of the low-resource Indic language Marathi. Leveraging the marathi-topic-all-doc-v2 model as our baseline, we implement optimization techniques to reduce computation time and memory usage. Our focus is on enhancing the efficiency of Marathi transformer models while maintaining top-tier accuracy and reducing computational demands. Using the MahaNews document classification dataset and the marathi-topic-all-doc-v2 model from L3Cube, we apply Block Movement Pruning, Knowledge Distillation, and Mixed Precision methods individually and in combination to boost efficiency. We demonstrate the importance of strategic pruning levels in achieving desired efficiency gains. Furthermore, we analyze the balance between efficiency improvements and environmental impact, highlighting how optimized model architectures can contribute to a more sustainable computational ecosystem. Implementing these techniques on a single GPU system, we determine that the optimal configuration is 25\% pruning + knowledge distillation. This approach yielded a 2.56x speedup in computation time while maintaining baseline accuracy levels.

Related papers

Automated Tomato Maturity Estimation Using an Optimized Residual Model with Pruning and Quantization Techniques [1.123910458133809]
Tomato maturity plays a pivotal role in optimizing harvest timing and ensuring product quality. Existing deep learning approaches, while accurate, are often too computationally demanding for practical use in resource-constrained agricultural settings. This study aims to develop a computationally efficient tomato classification model using the ResNet-18 architecture optimized through transfer learning, pruning, and quantization techniques.
arXiv Detail & Related papers (2025-03-13T22:56:19Z)
Transformer^-1: Input-Adaptive Computation for Resource-Constrained Deployment [3.6219999155937113]
This paper proposes a Transformer$-1$ architecture to address the resource waste caused by fixed computation paradigms in deep learning models under dynamic scenarios. In a benchmark test, our method reduces FLOPs by 42.7% and peak memory usage by 3% compared to the standard Transformer. We also conducted experiments on several natural language processing tasks and achieved significant improvements in resource efficiency.
arXiv Detail & Related papers (2025-01-26T15:31:45Z)
Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning. Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and modules, respectively. To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
arXiv Detail & Related papers (2024-12-17T01:09:23Z)
ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation [4.77407121905745]
Back-propagation (BP) is a major source of computational expense during training deep learning models. We propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture.
arXiv Detail & Related papers (2024-08-22T17:22:59Z)
Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning [88.78080749909665]
Current on-device training methods just focus on efficient training without considering the catastrophic forgetting. This paper proposes a simple but effective edge-friendly incremental learning framework. Our method achieves average accuracy boost of 38.08% with even less memory and approximate computation.
arXiv Detail & Related papers (2024-06-13T05:49:29Z)
REP: Resource-Efficient Prompting for On-device Continual Learning [23.92661395403251]
On-device continual learning (CL) requires the co-optimization of model accuracy and resource efficiency to be practical. It is commonly believed that CNN-based CL excels in resource efficiency, whereas ViT-based CL is superior in model performance. We introduce REP, which improves resource efficiency specifically targeting prompt-based rehearsal-free methods.
arXiv Detail & Related papers (2024-06-07T09:17:33Z)
The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection [18.683805940232485]
We introduce a novel method that employs core subset selection for reweighting. By focusing on a strategically selected coreset, our approach offers a robust representation. The re-calibrated weights are then mapped back to and propagated across the entire dataset.
arXiv Detail & Related papers (2024-03-18T18:30:22Z)
Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation [6.576180048533476]
PaSeR (Parsimonious with Reinforcement Learning) is a non-cascading, cost-aware learning pipeline. We show that PaSeR achieves better accuracy while minimizing computational cost relative to cascaded models. We introduce a new metric IoU/GigaFlop to evaluate the balance between cost and performance.
arXiv Detail & Related papers (2024-02-19T01:17:52Z)
Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning [0.0]
We present an innovative approach to self-supervised learning for Vision Transformers (ViTs) This method focuses on enhancing the efficiency and speed of initial layer training in ViTs. Our approach employs a novel multi-scale reconstruction process that fosters efficient learning in initial layers.
arXiv Detail & Related papers (2023-12-02T11:10:09Z)
Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance. Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
FedDUAP: Federated Learning with Dynamic Update and Adaptive Pruning Using Shared Data on the Server [64.94942635929284]
Federated Learning (FL) suffers from two critical challenges, i.e., limited computational resources and low training efficiency. We propose a novel FL framework, FedDUAP, to exploit the insensitive data on the server and the decentralized data in edge devices. By integrating the two original techniques together, our proposed FL model, FedDUAP, significantly outperforms baseline approaches in terms of accuracy (up to 4.8% higher), efficiency (up to 2.8 times faster), and computational cost (up to 61.9% smaller)
arXiv Detail & Related papers (2022-04-25T10:00:00Z)
Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples. We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment. We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight- parameterisation for neural networks that leads to inherently sparse models. Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
Towards Practical Lipreading with Distilled and Efficient Models [57.41253104365274]
Lipreading has witnessed a lot of progress due to the resurgence of neural networks. Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization. There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios. We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000 to 88.5% and 46.6%, respectively using self-distillation.
arXiv Detail & Related papers (2020-07-13T16:56:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.