On the Compression of Language Models for Code: An Empirical Study on CodeBERT
- URL: http://arxiv.org/abs/2412.13737v1
- Date: Wed, 18 Dec 2024 11:14:30 GMT
- Title: On the Compression of Language Models for Code: An Empirical Study on CodeBERT
- Authors: Giordano d'Aloisio, Luca Traini, Federica Sarro, Antinisca Di Marco
- Abstract summary: We investigate the impact of three well-known compression strategies -- knowledge distillation, quantization, and pruning -- across three different classes of software engineering tasks. Our findings reveal that the impact of these strategies varies greatly depending on the task and the specific compression method employed.
- Score: 9.574645433491225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models have proven successful across a wide range of software engineering tasks, but their significant computational costs often hinder their practical adoption. To address this challenge, researchers have begun applying various compression strategies to improve the efficiency of language models for code. These strategies aim to optimize inference latency and memory usage, though often at the cost of reduced model effectiveness. However, there is still a significant gap in understanding how these strategies influence the efficiency and effectiveness of language models for code. Here, we empirically investigate the impact of three well-known compression strategies -- knowledge distillation, quantization, and pruning -- across three different classes of software engineering tasks: vulnerability detection, code summarization, and code search. Our findings reveal that the impact of these strategies varies greatly depending on the task and the specific compression method employed. Practitioners and researchers can use these insights to make informed decisions when selecting the most appropriate compression strategy, balancing both efficiency and effectiveness based on their specific needs.
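For context, the sketch below illustrates, in broad strokes, what the three compression strategies named in the abstract look like when applied to a CodeBERT checkpoint. It is a minimal, illustrative sketch and not the authors' experimental setup: the checkpoint name, pruning ratio, distillation temperature, and loss weighting are assumptions chosen purely for illustration.

```python
# Illustrative sketch (not the paper's exact pipeline): the three compression
# strategies studied in the paper, applied independently to a CodeBERT model.
# Checkpoint name and all hyperparameters below are assumptions.
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune
from transformers import AutoModel

model = AutoModel.from_pretrained("microsoft/codebert-base")

# 1) Post-training dynamic quantization: Linear layers converted to int8.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# 2) Unstructured magnitude pruning: zero out the 30% smallest weights in
#    every Linear layer (the 30% sparsity level is an arbitrary example).
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# 3) Knowledge distillation: a typical training loss that blends the hard
#    task loss with a KL term pulling student logits toward teacher logits.
def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In the paper's setting, each compressed variant would then be evaluated on vulnerability detection, code summarization, and code search to measure the resulting efficiency/effectiveness trade-off.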
Related papers
- Efficient Strategy for Improving Large Language Model (LLM) Capabilities [0.0]
Large Language Models (LLMs) have become a milestone in the field of artificial intelligence and natural language processing. Their large-scale deployment remains constrained by the need for significant computational resources. This work proposes starting from a base model to explore and combine data processing and careful data selection techniques.
arXiv Detail & Related papers (2025-08-06T04:08:26Z) - Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code [11.16693333878553]
Transformer-based language models for code have shown remarkable performance in various software analytics tasks. Their adoption is hindered by high computational costs, slow inference speeds, and substantial environmental impact. Model compression techniques such as pruning, quantization, and knowledge distillation have gained traction in addressing these challenges.
arXiv Detail & Related papers (2025-08-05T22:32:32Z) - On the Scaling of Robustness and Effectiveness in Dense Retrieval [111.58315434849047]
Robustness and effectiveness are critical aspects of developing dense retrieval models for real-world applications. Recent work has addressed scaling laws of effectiveness in dense retrieval, revealing a power-law relationship between effectiveness and the size of models and data. We find that robustness and effectiveness exhibit different scaling patterns, leading to significant resource costs when jointly improving both.
arXiv Detail & Related papers (2025-05-30T06:57:27Z) - Efficient Reasoning Models: A Survey [52.96232442322824]
This survey aims to provide a comprehensive overview of recent advances in efficient reasoning.
It categorizes existing works into three key directions: (1) shorter - compressing lengthy CoTs into concise yet effective reasoning chains; (2) smaller - developing compact language models with strong reasoning capabilities; and (3) faster.
arXiv Detail & Related papers (2025-04-15T06:28:00Z) - Investigating Execution-Aware Language Models for Code Optimization [7.62248558265865]
This study investigates how incorporating code execution information into language models affects their ability to optimize code.
Our results indicate that execution-aware models provide limited benefits compared to the standard CodeT5+ model in optimizing code.
arXiv Detail & Related papers (2025-03-11T09:46:07Z) - TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning [11.167198972934736]
Large language models (LLMs) such as GPT-4 have led to a surge in the size of prompts required for optimal performance. We propose a novel and efficient reinforcement learning (RL) based task-aware prompt compression method. We demonstrate that our RL-guided compression method improves the task performance by 8% - 189% over state-of-the-art compression techniques.
arXiv Detail & Related papers (2024-09-19T18:11:59Z) - Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks [6.596361762662328]
The internal structure and operation mechanisms of large-scale language models are analyzed theoretically.
We evaluate the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies.
arXiv Detail & Related papers (2024-05-20T00:10:00Z) - Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models [95.96734086126469]
Large language models (LLMs) can serve as assistants that help users accomplish their jobs, and also support the development of advanced applications.
For the wide application of LLMs, inference efficiency is an essential concern, which has been widely studied in existing work.
We perform a detailed coarse-to-fine analysis of the inference performance of various code libraries.
arXiv Detail & Related papers (2024-04-17T15:57:50Z) - What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models [2.2871867623460216]
This paper investigates the effectiveness of pruning, knowledge distillation, and quantization on an exclusively low-resourced, small-data language model, AfriBERTa.
Through a battery of experiments, we assess the effects of compression on performance across several metrics beyond accuracy.
arXiv Detail & Related papers (2024-04-06T23:52:53Z) - A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z) - The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models [7.542276054279341]
Transformer language models achieve outstanding results in many natural language processing (NLP) tasks.
Their enormous size often makes them impractical on memory-constrained devices, requiring practitioners to compress them to smaller networks.
In this paper, we explore offline compression methods, meaning computationally-cheap approaches that do not require further fine-tuning of the compressed model.
arXiv Detail & Related papers (2023-02-08T13:36:06Z) - An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)