Chemical transformer compression for accelerating both training and
inference of molecular modeling
- URL: http://arxiv.org/abs/2205.07582v1
- Date: Mon, 16 May 2022 11:38:31 GMT
- Title: Chemical transformer compression for accelerating both training and
inference of molecular modeling
- Authors: Yi Yu and Karl Borjesson
- Abstract summary: Transformer models have been developed in molecular science with excellent performance in applications including quantitative structure-activity relationship (QSAR) and virtual screening (VS).
In this work, cross-layer parameter sharing (CLPS) and knowledge distillation (KD) are used to reduce the sizes of transformers in molecular science.
By integrating CLPS and KD into a two-state chemical network, we introduce a new deep lite chemical transformer model, DeLiCaTe.
- Score: 6.98497133151762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models have been developed in molecular science with
excellent performance in applications including quantitative
structure-activity relationship (QSAR) and virtual screening (VS). Compared
with other types of models, however, they are large, which imposes high
hardware requirements to keep both training and inference times short. In
this work, cross-layer parameter sharing (CLPS) and knowledge distillation
(KD) are used to reduce the sizes of transformers in molecular science. Both
methods not only achieve QSAR predictive performance competitive with the
original BERT model but are also more parameter-efficient. Furthermore, by
integrating CLPS and KD into a two-state chemical network, we introduce a new
deep lite chemical transformer model, DeLiCaTe. DeLiCaTe captures
general-domain as well as task-specific knowledge, which leads to a 4x faster
rate of both training and inference thanks to 10- and 3-fold reductions in
the number of parameters and layers, respectively. Meanwhile, it achieves
comparable performance in QSAR and VS modeling. Moreover, we anticipate that
the model compression strategy provides a pathway to the creation of
effective generative transformer models for organic drug and material design.
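
The two compression techniques named in the abstract are standard tools; as a
minimal sketch, the PyTorch code below illustrates cross-layer parameter
sharing (a few unique blocks cycled to emulate a deeper stack) and a
knowledge-distillation loss. It is not the DeLiCaTe implementation, and the
names and hyperparameters (CLPSEncoder, kd_loss, d_model=256, four shared
blocks, temperature 2.0) are illustrative assumptions.

```python
# Minimal PyTorch sketch of the two compression ideas named in the abstract.
# This is NOT the DeLiCaTe code; hyperparameters and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CLPSEncoder(nn.Module):
    """Cross-layer parameter sharing: a few unique blocks are cycled to emulate
    a deeper stack, so the parameter count scales with n_unique, not depth."""

    def __init__(self, d_model=256, n_heads=4, depth=12, n_unique=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_unique)
        )
        self.depth = depth

    def forward(self, x):  # x: (batch, seq_len, d_model), e.g. embedded SMILES tokens
        for i in range(self.depth):
            x = self.blocks[i % len(self.blocks)](x)  # reuse shared parameters
        return x


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Knowledge distillation: the small student matches the softened output
    distribution of a large pre-trained teacher."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


# Example shapes: a batch of 8 token sequences of length 64.
h = CLPSEncoder()(torch.randn(8, 64, 256))  # 12 layer applications, 4 parameter sets
```

In this sketch, cycling four unique blocks through twelve layer applications
keeps the effective depth while storing only a third of the layer parameters,
which is the flavour of the layer and parameter reduction described above.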
Related papers
- Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning [54.584665518334035]
Hybrid architectures that combine Attention and State Space Models (SSMs) achieve state-of-the-art accuracy and runtime performance.
Recent work has demonstrated that applying compression and distillation to Attention-only models yields smaller, more accurate models at a fraction of the training cost.
We introduce a novel group-aware pruning strategy that preserves the structural integrity of SSM blocks and their sequence modeling capabilities.
arXiv Detail & Related papers (2025-04-15T17:26:29Z) - PearSAN: A Machine Learning Method for Inverse Design using Pearson Correlated Surrogate Annealing [66.27103948750306]
PearSAN is a machine learning-assisted optimization algorithm applicable to inverse design problems with large design spaces.
It uses a Pearson correlated surrogate model to predict the figure of merit of the true design metric.
It achieves a state-of-the-art maximum design efficiency of 97%, and is at least an order of magnitude faster than previous methods.
arXiv Detail & Related papers (2024-12-26T17:02:19Z) - Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models [0.0]
Transformer Layer Injection (TLI) is a novel method for efficiently upscaling large language models (LLMs).
Our approach improves upon the conventional Depth Up-Scaling (DUS) technique by injecting new layers into every set of K layers.
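As a schematic reading of the layer-injection idea summarized above (one new
block inserted after every K existing blocks so that depth grows without
retraining from scratch), the short sketch below may help; it is not the TLI
paper's actual procedure, and cloning the preceding block to initialize the
injected layer is an assumption.

```python
# Schematic sketch of injecting a new layer after every K existing layers,
# as described in the TLI summary above; not the paper's implementation.
import copy
import torch.nn as nn


def inject_layers(layers: nn.ModuleList, K: int = 4) -> nn.ModuleList:
    """Insert one new block after every K original blocks (here a deep copy
    of the preceding block; the paper's initialization may differ)."""
    upscaled = []
    for i, block in enumerate(layers, start=1):
        upscaled.append(block)
        if i % K == 0:
            upscaled.append(copy.deepcopy(block))  # assumed initialization
    return nn.ModuleList(upscaled)
```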
arXiv Detail & Related papers (2024-10-15T14:41:44Z) - 2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution [83.09117439860607]
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment.
Low-bit quantization is known to degrade the accuracy of SR models compared to their full-precision (FP) counterparts.
We present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization.
arXiv Detail & Related papers (2024-06-10T06:06:11Z) - TerDiT: Ternary Diffusion Models with Transformers [83.94829676057692]
TerDiT is a quantization-aware training scheme for ternary diffusion models with transformers.
We focus on the ternarization of DiT networks and scale model sizes from 600M to 4.2B.
arXiv Detail & Related papers (2024-05-23T17:57:24Z) - Co-training and Co-distillation for Quality Improvement and Compression
of Language Models [88.94539115180919]
Knowledge Distillation (KD) compresses expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models.
Most smaller models fail to surpass the performance of the original larger model, so performance is sacrificed to improve inference speed.
We propose Co-Training and Co-Distillation (CTCD), a novel framework that improves performance and inference speed together by co-training two models.
arXiv Detail & Related papers (2023-11-06T03:29:00Z) - TransCODE: Co-design of Transformers and Accelerators for Efficient
Training and Inference [6.0093441900032465]
We propose a framework that simulates transformer inference and training on a design space of accelerators.
We use this simulator in conjunction with the proposed co-design technique, called TransCODE, to obtain the best-performing models.
The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair.
arXiv Detail & Related papers (2023-03-27T02:45:18Z) - Online Model Compression for Federated Learning with Large Models [8.48327410170884]
Online Model Compression (OMC) is a framework that stores model parameters in a compressed format and decompresses them only when needed.
OMC can reduce memory usage and communication cost of model parameters by up to 59% while attaining comparable accuracy and training speed when compared with full-precision training.
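As a rough illustration of the idea summarized above, the sketch below keeps a
parameter in a compressed at-rest format (plain fp16 here) and materializes a
full-precision copy only when it is needed; the actual OMC framework and its
compression schemes are not reproduced.

```python
# Illustration only: parameters stored compressed (fp16) and decompressed
# to fp32 on demand, in the spirit of the OMC summary above.
import torch


class CompressedParam:
    def __init__(self, tensor: torch.Tensor):
        self._stored = tensor.to(torch.float16)   # compressed at-rest format

    def decompress(self) -> torch.Tensor:
        return self._stored.to(torch.float32)     # full precision only when used


w = CompressedParam(torch.randn(1024, 1024))
y = w.decompress() @ torch.randn(1024, 8)         # decompress just for this op
```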
arXiv Detail & Related papers (2022-05-06T22:43:03Z) - Toward Development of Machine Learned Techniques for Production of
Compact Kinetic Models [0.0]
Chemical kinetic models are an essential component in the development and optimisation of combustion devices.
We present a novel automated compute intensification methodology to produce overly-reduced and optimised chemical kinetic models.
arXiv Detail & Related papers (2022-02-16T12:31:24Z) - Ensemble Transformer for Efficient and Accurate Ranking Tasks: an
Application to Question Answering Systems [99.13795374152997]
We propose a neural network designed to distill an ensemble of large transformers into a single smaller model.
An MHS model consists of two components: a stack of transformer layers that is used to encode inputs, and a set of ranking heads.
Unlike traditional distillation techniques, our approach leverages individual models in ensemble as teachers in a way that preserves the diversity of the ensemble members.
arXiv Detail & Related papers (2022-01-15T06:21:01Z) - Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output in the loss computation are essential choices to improve performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z) - Efficient Transformers in Reinforcement Learning using Actor-Learner
Distillation [91.05073136215886]
"Actor-Learner Distillation" transfers learning progress from a large capacity learner model to a small capacity actor model.
We demonstrate in several challenging memory environments that using Actor-Learner Distillation recovers the clear sample-efficiency gains of the transformer learner model.
arXiv Detail & Related papers (2021-04-04T17:56:34Z) - Compressing Large-Scale Transformer-Based Models: A Case Study on BERT [41.04066537294312]
Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.
These models often have billions of parameters, and, thus, are too resource-hungry and computation-intensive to suit low-capability devices or applications.
One potential remedy for this is model compression, which has attracted a lot of research attention.
arXiv Detail & Related papers (2020-02-27T09:20:31Z)