Fugu-MT Paper Translation (Abstract): SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization

Paper overview: SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization

arxiv url: http://arxiv.org/abs/2401.14727v1
Date: Fri, 26 Jan 2024 09:23:27 GMT
Status: Translation complete
Last updated in system: 2024-01-29 15:24:28.163500
Title: SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization
Title (reference translation): SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization
Authors: Yanlin Wang, Yanxian Huang, Daya Guo, Hongyu Zhang and Zibin Zheng
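Abstract summary: This paper studies file-level code summarization, which helps programmers understand and maintain large source code projects. It proposes SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
Reference score (independently computed attention score): 51.67317895094664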
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Code summarization aims to generate natural language descriptions of source code, helping programmers understand and maintain it quickly. While previous code summarization efforts have predominantly focused on the method level, this paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects. Unlike method-level code summarization, file-level code summarization typically involves long source code within a single file, which makes it challenging for Transformer-based models to understand the code semantics: because computational complexity scales quadratically with input sequence length, the maximum input length of these models is difficult to set large enough to handle long code input well. To address this challenge, we propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences. Specifically, SparseCoder employs a sliding window mechanism for self-attention to model short-term dependencies and leverages the structural information of code to capture long-term dependencies among source code identifiers by introducing two types of sparse attention patterns, named global attention and identifier attention. To evaluate the performance of SparseCoder, we construct a new dataset, FILE-CS, for file-level code summarization in Python. Experimental results show that our SparseCoder model achieves state-of-the-art performance compared with other pre-trained models, including full self-attention and sparse models. Additionally, our model has low memory overhead and achieves performance comparable to models using a full self-attention mechanism.
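Abstract (reference translation): Code summarization aims to generate natural language descriptions of source code. Previous code summarization efforts have mainly focused on the method level, whereas this paper studies file-level code summarization, which supports the understanding and maintenance of large source code projects. Unlike method-level code summarization, file-level code summarization usually involves long source code within a single file, so it is difficult for Transformer-based models to understand the code semantics, because the quadratic scaling of computational complexity with input sequence length limits the maximum input length of these models. To address this challenge, we propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences. Specifically, SparseCoder employs a sliding window mechanism for self-attention to model short-term dependencies, and uses the structural information of code to capture long-term dependencies among source code identifiers by introducing two types of sparse attention patterns, global attention and identifier attention. To evaluate the performance of SparseCoder, we construct FILE-CS, a new dataset for file-level code summarization in Python. Experimental results show that the SparseCoder model achieves state-of-the-art performance compared with other pre-trained models. In addition, the model has low memory overhead and achieves performance comparable to models using a full self-attention mechanism.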
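
The three attention patterns named in the abstract (sliding window, global, identifier) can be pictured as one boolean attention mask. The following is a minimal sketch only, not the authors' implementation: the abstract does not specify which tokens receive global attention or exactly how identifier tokens are connected, so the function name `build_sparse_mask`, the choice of position 0 as the global token, and the rule that occurrences of the same identifier attend to each other are all assumptions made for illustration.

```python
def build_sparse_mask(n_tokens, window, global_positions, identifier_groups):
    """Return an n_tokens x n_tokens boolean mask (list of lists).

    mask[i][j] is True when token i is allowed to attend to token j.
    All parameter names here are hypothetical, chosen for this sketch.
    """
    mask = [[False] * n_tokens for _ in range(n_tokens)]

    # Sliding window: each token attends to neighbours within `window` positions.
    for i in range(n_tokens):
        for j in range(max(0, i - window), min(n_tokens, i + window + 1)):
            mask[i][j] = True

    # Global attention (assumed form): the chosen positions attend to every
    # token, and every token attends to them.
    for g in global_positions:
        for j in range(n_tokens):
            mask[g][j] = True
            mask[j][g] = True

    # Identifier attention (assumed form): occurrences of the same identifier
    # attend to each other regardless of distance.
    for positions in identifier_groups.values():
        for i in positions:
            for j in positions:
                mask[i][j] = True

    return mask


def count_allowed(mask):
    """Number of (query, key) pairs the mask permits."""
    return sum(sum(row) for row in mask)


# Toy token sequence for: def load(path): data = read(path) return data
tokens = ["def", "load", "(", "path", ")", ":", "data", "=",
          "read", "(", "path", ")", "return", "data"]
identifier_groups = {"path": [3, 10], "data": [6, 13]}

mask = build_sparse_mask(len(tokens), window=1,
                         global_positions=[0],
                         identifier_groups=identifier_groups)

full = len(tokens) ** 2
print(f"allowed pairs: {count_allowed(mask)} of {full} in full self-attention")
```

With a fixed window and a small number of global and identifier tokens, the number of allowed pairs grows roughly linearly in the sequence length rather than quadratically, which is the motivation the abstract gives for sparse attention on long files.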


Related papers