SparseOptimizer: Sparsify Language Models through Moreau-Yosida
Regularization and Accelerate via Compiler Co-design
- URL: http://arxiv.org/abs/2306.15656v3
- Date: Tue, 18 Jul 2023 17:52:28 GMT
- Title: SparseOptimizer: Sparsify Language Models through Moreau-Yosida
Regularization and Accelerate via Compiler Co-design
- Authors: Fu-Ming Guo
- Abstract summary: This paper introduces SparseOptimizer, a novel deep learning optimizer that exploits Moreau-Yosida regularization to induce sparsity in large language models such as BERT, ALBERT and GPT.
SparseOptimizer's plug-and-play functionality eradicates the need for code modifications, making it a universally adaptable tool for a wide array of large language models.
Empirical evaluations on benchmark datasets such as GLUE, RACE, SQuAD1, and SQuAD2 confirm that SparseBERT and SparseALBERT, when sparsified using SparseOptimizer, achieve performance comparable to their dense counterparts, BERT and ALBERT, while significantly reducing their parameter count.
- Score: 0.685316573653194
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces SparseOptimizer, a novel deep learning optimizer that
exploits Moreau-Yosida regularization to naturally induce sparsity in large
language models such as BERT, ALBERT and GPT. Key to the design of
SparseOptimizer is an embedded shrinkage operator, which imparts sparsity
directly within the optimization process. This operator, backed by a sound
theoretical framework, includes an analytical solution, thereby reinforcing the
optimizer's robustness and efficacy. Crucially, SparseOptimizer's plug-and-play
functionality eradicates the need for code modifications, making it a
universally adaptable tool for a wide array of large language models. Empirical
evaluations on benchmark datasets such as GLUE, RACE, SQuAD1, and SQuAD2
confirm that SparseBERT and SparseALBERT, when sparsified using
SparseOptimizer, achieve performance comparable to their dense counterparts,
BERT and ALBERT, while significantly reducing their parameter count. Further,
this work proposes an innovative optimizer-compiler co-design strategy,
demonstrating the potential of inference acceleration (3.37x, 6.30x, and 7.15x
in comparison with PyTorch, TensorFlow, and generic LLVM compilation,
respectively) in SparseBERT when paired with an
appropriately designed compiler. This study represents a significant step
forward in the evolution of efficient, scalable, and high-performing large
language models, setting a precedent for future exploration and optimization in
this domain. The SparseOptimizer code and SparseALBERT model will be publicly
available upon paper acceptance.
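The abstract does not include an implementation, but the embedded shrinkage operator it describes corresponds, in the standard Moreau-Yosida/proximal framework, to the closed-form soft-thresholding map sign(w) * max(|w| - threshold, 0). The sketch below is a minimal, hypothetical PyTorch illustration of applying such a shrinkage step after an ordinary optimizer update; the function names, the threshold value, and the exact placement of the step are assumptions for illustration, not the released SparseOptimizer code.

```python
import torch

def shrink(param: torch.Tensor, threshold: float) -> torch.Tensor:
    # Soft-thresholding: the closed-form proximal map of threshold * ||w||_1
    # (assumed form of the "analytical solution" mentioned in the abstract).
    return torch.sign(param) * torch.clamp(param.abs() - threshold, min=0.0)

@torch.no_grad()
def apply_shrinkage(optimizer: torch.optim.Optimizer, threshold: float = 1e-4) -> None:
    # Hypothetical wrapper: shrink every trainable parameter in-place after the
    # ordinary gradient step, zeroing weights whose magnitude falls below threshold.
    for group in optimizer.param_groups:
        for p in group["params"]:
            if p.requires_grad:
                p.copy_(shrink(p, threshold))

# Usage sketch: wraps any stock optimizer ("plug-and-play" in spirit).
model = torch.nn.Linear(768, 768)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 768)
loss = model(x).pow(2).mean()
loss.backward()
opt.step()               # dense update
apply_shrinkage(opt)     # proximal shrinkage imparts sparsity in-place
opt.zero_grad()
```

Because the shrinkage is applied in-place after step(), this pattern requires no changes to the model code, which is consistent with the plug-and-play claim in the abstract.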
Related papers
- DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs [56.24431208419858]
We introduce Direct Preference Learning with Only Self-Generated Tests and Code (DSTC).
DSTC uses only self-generated code snippets and tests to construct reliable preference pairs.
arXiv Detail & Related papers (2024-11-20T02:03:16Z) - RTLRewriter: Methodologies for Large Models aided RTL Code Optimization [21.61206887869307]
This paper introduces RTLRewriter, an innovative framework that leverages large models to optimize RTL code.
A circuit partition pipeline is utilized for fast synthesis and efficient rewriting.
A specialized search engine is designed to identify useful optimization guides, algorithms, and code snippets.
arXiv Detail & Related papers (2024-09-04T09:59:37Z) - Should AI Optimize Your Code? A Comparative Study of Current Large Language Models Versus Classical Optimizing Compilers [0.0]
Large Language Models (LLMs) raise intriguing questions about the potential for AI-driven approaches to revolutionize code optimization methodologies.
This paper presents a comparative analysis between two state-of-the-art Large Language Models, GPT-4.0 and CodeLlama-70B, and traditional optimizing compilers.
arXiv Detail & Related papers (2024-06-17T23:26:41Z) - CoLLiE: Collaborative Training of Large Language Models in an Efficient
Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models.
With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Large Language Models for Compiler Optimization [22.52765975286403]
We present a transformer model trained from scratch to optimize LLVM assembly for code size.
We ask the model to predict the instruction counts before and after optimization, and the optimized code itself.
Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler.
arXiv Detail & Related papers (2023-09-11T22:11:46Z) - Robust Prompt Optimization for Large Language Models Against
Distribution Shifts [80.6757997074956]
Large Language Model (LLM) has demonstrated significant ability in various Natural Language Processing tasks.
We propose a new problem of robust prompt optimization for LLMs against distribution shifts.
This problem requires the prompt optimized over the labeled source group to simultaneously generalize to an unlabeled target group.
arXiv Detail & Related papers (2023-05-23T11:30:43Z) - Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z) - LinEasyBO: Scalable Bayesian Optimization Approach for Analog Circuit
Synthesis via One-Dimensional Subspaces [11.64233949999656]
We propose a fast and robust Bayesian optimization approach via one-dimensional subspaces for analog circuit synthesis.
Our proposed algorithm can accelerate the optimization procedure by up to 9x and 38x compared to LP-EI and REMBOpBO respectively when the batch size is 15.
arXiv Detail & Related papers (2021-09-01T21:25:25Z) - Additive Tree-Structured Conditional Parameter Spaces in Bayesian
Optimization: A Novel Covariance Function and a Fast Implementation [34.89735938765757]
We generalize the additive assumption to tree-structured functions, showing improved sample-efficiency, wider applicability and greater flexibility.
By incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function.
We demonstrate our method on an optimization benchmark function, on pruning pre-trained VGG16 and ResNet50 models, as well as on searching activation functions of ResNet20.
arXiv Detail & Related papers (2020-10-06T16:08:58Z) - Additive Tree-Structured Covariance Function for Conditional Parameter
Spaces in Bayesian Optimization [34.89735938765757]
We generalize the additive assumption to tree-structured functions.
By incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function.
arXiv Detail & Related papers (2020-06-21T11:21:55Z) - On the Encoder-Decoder Incompatibility in Variational Text Modeling and
Beyond [82.18770740564642]
Variational autoencoders (VAEs) combine latent variables with amortized variational inference.
We observe the encoder-decoder incompatibility that leads to poor parameterizations of the data manifold.
We propose Coupled-VAE, which couples a VAE model with a deterministic autoencoder with the same structure.
arXiv Detail & Related papers (2020-04-20T10:34:10Z)