SparseCoder: Advancing Source Code Analysis with Sparse Attention and
Learned Token Pruning
- URL: http://arxiv.org/abs/2310.07109v1
- Date: Wed, 11 Oct 2023 01:11:30 GMT
- Title: SparseCoder: Advancing Source Code Analysis with Sparse Attention and
Learned Token Pruning
- Authors: Xueqi Yang, Mariusz Jakubowski, Kelly Kang, Haojie Yu and Tim Menzies
- Abstract summary: Transformer-based approaches, though achieving remarkable performance, struggle with long code sequences due to their self-attention mechanism.
This paper introduces SparseCoder, an innovative approach incorporating sparse attention and learned token pruning.
Extensive experiments carried out on a large-scale dataset for vulnerability detection demonstrate the effectiveness and efficiency of SparseCoder.
- Score: 9.770054863791808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As software projects rapidly evolve, software artifacts become more complex
and the defects hidden behind them become harder to identify. The emerging Transformer-based
approaches, though achieving remarkable performance, struggle with long code
sequences due to their self-attention mechanism, which scales quadratically
with the sequence length. This paper introduces SparseCoder, an innovative
approach incorporating sparse attention and learned token pruning (LTP) method
(adapted from natural language processing) to address this limitation.
Extensive experiments carried out on a large-scale dataset for vulnerability
detection demonstrate the effectiveness and efficiency of SparseCoder, whose cost
scales linearly rather than quadratically with sequence length on long code
sequences, in comparison to CodeBERT and RoBERTa. We further achieve a 50%
reduction in FLOPs with a negligible performance drop of less than 1% compared to a
Transformer leveraging sparse attention. Moreover, SparseCoder goes beyond making "black-box" decisions by
elucidating the rationale behind those decisions. Code segments that contribute
to the final decision can be highlighted with importance scores, offering an
interpretable, transparent analysis tool for the software engineering
landscape.
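To make the abstract's two efficiency ideas concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' released implementation): a sliding-window form of sparse attention, which becomes linear in sequence length once only the banded scores are computed, and an attention-based token-pruning step in the spirit of the NLP learned token pruning (LTP) method, which also yields the per-token importance scores used for interpretation. All function names, tensor shapes, and the thresholding rule below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Illustrative sparse attention: each query attends only to keys within
    `window` positions. A production kernel would materialize only the banded
    scores, giving O(n * window) cost instead of O(n^2); the dense mask here
    is purely for readability. q, k, v: (batch, seq_len, dim)."""
    b, n, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5             # (b, n, n)
    pos = torch.arange(n, device=q.device)
    outside = (pos[None, :] - pos[:, None]).abs() > window  # True = outside the local window
    scores = scores.masked_fill(outside, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def prune_tokens(hidden, attn_probs, threshold):
    """LTP-style pruning step (assumed form): a token's importance is the
    average attention it receives over heads and query positions; tokens whose
    importance falls below a learned threshold are dropped before the next
    layer, which is where the FLOPs savings come from.
    hidden: (batch, seq_len, dim); attn_probs: (batch, heads, seq_len, seq_len)."""
    importance = attn_probs.mean(dim=(1, 2))     # (batch, seq_len)
    keep = importance >= threshold               # boolean keep-mask per token
    pruned = [h[m] for h, m in zip(hidden, keep)]  # ragged list, one tensor per example
    return pruned, importance
```

In this sketch the returned importance vector doubles as the explanation signal: code tokens that survive pruning with high scores can be highlighted as the segments that drive the final decision, matching the interpretability claim above.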
Related papers
- FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely "hidden transfer", which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks [53.550782959908524]
We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and decomposable tasks.
Our method, prompt-in-decoder (PiD), encodes the input once and decodes the output in parallel, boosting both training and inference efficiency.
arXiv Detail & Related papers (2024-03-19T19:27:23Z)
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words.
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
arXiv Detail & Related papers (2024-02-24T08:10:39Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- SPEED: Speculative Pipelined Execution for Efficient Decoding [35.45955948053644]
We propose SPEED, which improves inference efficiency by speculatively executing multiple future tokens in parallel with the current token.
For Transformer decoders that employ parameter sharing, the memory operations for the tokens executing in parallel can be amortized.
We demonstrate the efficiency of our method in terms of latency reduction relative to model accuracy and demonstrate how speculation allows for training deeper decoders with parameter sharing with minimal runtime overhead.
arXiv Detail & Related papers (2023-10-18T16:07:01Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which instead optimizes task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one query to the pre-trained model per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model [37.24203191658052]
Large-scale Transformer models bring significant improvements for various downstream vision language tasks with a unified architecture.
Performance improvements come with increasing model size, resulting in slow inference speed and increased cost for serving.
We propose a novel early exiting strategy for unified visual language models, which allows dynamically skipping layers in the encoder and decoder simultaneously.
arXiv Detail & Related papers (2022-11-21T02:32:25Z)
- Pruning Neural Belief Propagation Decoders [77.237958592189]
We introduce a method to tailor an overcomplete parity-check matrix to (neural) BP decoding using machine learning.
We achieve performance within 0.27 dB and 1.5 dB of the ML performance while reducing the complexity of the decoder.
arXiv Detail & Related papers (2020-01-21T12:05:46Z)