SparseOptimizer: Sparsify Language Models through Moreau-Yosida
Regularization and Accelerate via Compiler Co-design
- URL: http://arxiv.org/abs/2306.15656v3
- Date: Tue, 18 Jul 2023 17:52:28 GMT
- Title: SparseOptimizer: Sparsify Language Models through Moreau-Yosida
Regularization and Accelerate via Compiler Co-design
- Authors: Fu-Ming Guo
- Abstract summary: This paper introduces SparseOptimizer, a novel deep learning optimizer that exploits Moreau-Yosida regularization to induce sparsity in large language models such as BERT, ALBERT and GPT.
SparseOptimizer's plug-and-play functionality eradicates the need for code modifications, making it a universally adaptable tool for a wide array of large language models.
Empirical evaluations on benchmark datasets such as GLUE, RACE, SQuAD1, and SQuAD2 confirm that SparseBERT and SparseALBERT, when sparsified using SparseOptimizer, achieve performance comparable to their dense counterparts, BERT and ALBERT, while significantly reducing their parameter count.
- Score: 0.685316573653194
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces SparseOptimizer, a novel deep learning optimizer that
exploits Moreau-Yosida regularization to naturally induce sparsity in large
language models such as BERT, ALBERT and GPT. Key to the design of
SparseOptimizer is an embedded shrinkage operator, which imparts sparsity
directly within the optimization process. This operator, backed by a sound
theoretical framework, includes an analytical solution, thereby reinforcing the
optimizer's robustness and efficacy. Crucially, SparseOptimizer's plug-and-play
functionality eradicates the need for code modifications, making it a
universally adaptable tool for a wide array of large language models. Empirical
evaluations on benchmark datasets such as GLUE, RACE, SQuAD1, and SQuAD2
confirm that SparseBERT and SparseALBERT, when sparsified using
SparseOptimizer, achieve performance comparable to their dense counterparts,
BERT and ALBERT, while significantly reducing their parameter count. Further,
this work proposes an innovative optimizer-compiler co-design strategy,
demonstrating the potential of inference acceleration (3.37x, 6.30x, and 7.15x
in comparison with PyTorch, TensorFlow, and generic LLVM compilation,
respectively) in SparseBERT when paired with an
appropriately designed compiler. This study represents a significant step
forward in the evolution of efficient, scalable, and high-performing large
language models, setting a precedent for future exploration and optimization in
this domain. The SparseOptimizer code and SparseALBERT model will be publicly
available upon paper acceptance.
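The abstract does not include an implementation, but the embedded shrinkage operator it describes corresponds, in the standard Moreau-Yosida/proximal framework, to the closed-form soft-thresholding map sign(w) * max(|w| - threshold, 0). The sketch below is a minimal, hypothetical PyTorch illustration of applying such a shrinkage step after an ordinary optimizer update; the function names, the threshold value, and the exact placement of the step are assumptions for illustration, not the released SparseOptimizer code.

```python
import torch

def shrink(param: torch.Tensor, threshold: float) -> torch.Tensor:
    # Soft-thresholding: the closed-form proximal map of threshold * ||w||_1
    # (assumed form of the "analytical solution" mentioned in the abstract).
    return torch.sign(param) * torch.clamp(param.abs() - threshold, min=0.0)

@torch.no_grad()
def apply_shrinkage(optimizer: torch.optim.Optimizer, threshold: float = 1e-4) -> None:
    # Hypothetical wrapper: shrink every trainable parameter in-place after the
    # ordinary gradient step, zeroing weights whose magnitude falls below threshold.
    for group in optimizer.param_groups:
        for p in group["params"]:
            if p.requires_grad:
                p.copy_(shrink(p, threshold))

# Usage sketch: wraps any stock optimizer ("plug-and-play" in spirit).
model = torch.nn.Linear(768, 768)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 768)
loss = model(x).pow(2).mean()
loss.backward()
opt.step()               # dense update
apply_shrinkage(opt)     # proximal shrinkage imparts sparsity in-place
opt.zero_grad()
```

Because the shrinkage is applied in-place after step(), this pattern requires no changes to the model code, which is consistent with the plug-and-play claim in the abstract.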
Related papers
- DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs [56.24431208419858]
We introduce Direct Preference Learning with Only Self-Generated Tests and Code (DSTC).
DSTC uses only self-generated code snippets and tests to construct reliable preference pairs.
arXiv Detail & Related papers (2024-11-20T02:03:16Z) - RTLRewriter: Methodologies for Large Models aided RTL Code Optimization [21.61206887869307]
This paper introduces RTLRewriter, an innovative framework that leverages large models to optimize RTL code.
A circuit partition pipeline is utilized for fast synthesis and efficient rewriting.
A specialized search engine is designed to identify useful optimization guides, algorithms, and code snippets.
arXiv Detail & Related papers (2024-09-04T09:59:37Z) - Should AI Optimize Your Code? A Comparative Study of Current Large Language Models Versus Classical Optimizing Compilers [0.0]
Large Language Models (LLMs) raise intriguing questions about the potential for AI-driven approaches to revolutionize code optimization methodologies.
This paper presents a comparative analysis between two state-of-the-art Large Language Models, GPT-4.0 and CodeLlama-70B, and traditional optimizing compilers.
arXiv Detail & Related papers (2024-06-17T23:26:41Z) - CoLLiE: Collaborative Training of Large Language Models in an Efficient
Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models.
With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Large Language Models for Compiler Optimization [22.52765975286403]
We present a transformer model trained from scratch to optimize LLVM assembly for code size.
We ask the model to predict the instruction counts before and after optimization, and the optimized code itself.
Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler.
arXiv Detail & Related papers (2023-09-11T22:11:46Z) - Robust Prompt Optimization for Large Language Models Against
Distribution Shifts [80.6757997074956]
Large Language Model (LLM) has demonstrated significant ability in various Natural Language Processing tasks.
We propose a new problem of robust prompt optimization for LLMs against distribution shifts.
This problem requires the prompt optimized over the labeled source group to simultaneously generalize to an unlabeled target group.
arXiv Detail & Related papers (2023-05-23T11:30:43Z) - Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z) - LinEasyBO: Scalable Bayesian Optimization Approach for Analog Circuit
Synthesis via One-Dimensional Subspaces [11.64233949999656]
We propose a fast and robust Bayesian optimization approach via one-dimensional subspaces for analog circuit synthesis.
Our proposed algorithm can accelerate the optimization procedure by up to 9x and 38x compared to LP-EI and REMBOpBO respectively when the batch size is 15.
arXiv Detail & Related papers (2021-09-01T21:25:25Z) - Additive Tree-Structured Conditional Parameter Spaces in Bayesian
Optimization: A Novel Covariance Function and a Fast Implementation [34.89735938765757]
We generalize the additive assumption to tree-structured functions, showing improved sample-efficiency, wider applicability and greater flexibility.
By incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function.
We demonstrate our method on an optimization benchmark function, on pruning pre-trained VGG16 and ResNet50 models, as well as on searching activation functions of ResNet20.
arXiv Detail & Related papers (2020-10-06T16:08:58Z) - Additive Tree-Structured Covariance Function for Conditional Parameter
Spaces in Bayesian Optimization [34.89735938765757]
We generalize the additive assumption to tree-structured functions.
By incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function.
arXiv Detail & Related papers (2020-06-21T11:21:55Z) - On the Encoder-Decoder Incompatibility in Variational Text Modeling and
Beyond [82.18770740564642]
Variational autoencoders (VAEs) combine latent variables with amortized variational inference.
We observe the encoder-decoder incompatibility that leads to poor parameterizations of the data manifold.
We propose Coupled-VAE, which couples a VAE model with a deterministic autoencoder with the same structure.
arXiv Detail & Related papers (2020-04-20T10:34:10Z)