Fugu-MT Paper Translation (Summary): GraphCodeBERT: Pre-training Code Representations with Data Flow

Paper overview: GraphCodeBERT: Pre-training Code Representations with Data Flow

arxiv url: http://arxiv.org/abs/2009.08366v4
Date: Mon, 13 Sep 2021 05:48:51 GMT
Status: Translation complete
Last updated in system: 2022-10-17 11:57:08.034020
Title: GraphCodeBERT: Pre-training Code Representations with Data Flow
Authors: Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou
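Abstract summary: This paper presents GraphCodeBERT, a pre-trained model for programming languages that takes the inherent structure of code into account. The model uses data flow, a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables. The model is evaluated on four tasks: code search, clone detection, code translation, and code refinement.
Reference score (attention score computed independently by this site): 97.00641522327699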
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pre-trained models for programming languages have achieved dramatic empirical improvements on a variety of code-related tasks such as code search, code completion, and code summarization. However, existing pre-trained models regard a code snippet as a sequence of tokens and ignore the inherent structure of code, which provides crucial code semantics and would enhance the code understanding process. We present GraphCodeBERT, a pre-trained model for programming languages that considers the inherent structure of code. Instead of taking a syntactic-level structure of code such as the abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables. Such a semantic-level structure is neat and does not bring the unnecessarily deep hierarchy of an AST, a property that makes the model more efficient. We develop GraphCodeBERT based on Transformer. In addition to using the task of masked language modeling, we introduce two structure-aware pre-training tasks. One is to predict code structure edges, and the other is to align representations between source code and code structure. We implement the model in an efficient way with a graph-guided masked attention function to incorporate the code structure. We evaluate our model on four tasks: code search, clone detection, code translation, and code refinement. Results show that code structure and the newly introduced pre-training tasks improve GraphCodeBERT, which achieves state-of-the-art performance on the four downstream tasks. We further show that the model prefers structure-level attentions over token-level attentions in the task of code search.
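To make the "where-the-value-comes-from" relation concrete, the following is a minimal sketch in Python. It is not the authors' implementation: it assumes straight-line code consisting only of simple assignments, uses Python's standard `ast` module, and the function names `data_flow_edges` and `node_attention_mask` are hypothetical. The mask function is a loose illustration of restricting attention to data-flow edges between variable nodes; the links between code tokens and variable nodes that a full model would need are omitted.

```python
import ast


def data_flow_edges(source):
    """Collect variable occurrences and value-flow edges.

    Assumption: only top-level assignments such as `z = x + y` are handled.
    Returns (variables, edges) where variables is a list of (name, line)
    in visit order, and each edge (src, dst) means that the value of
    variables[dst] comes from variables[src].
    """
    tree = ast.parse(source)
    variables = []
    edges = []
    last_def = {}  # variable name -> index of its most recent definition

    for stmt in tree.body:
        if not isinstance(stmt, ast.Assign):
            continue  # simplification: every other statement kind is skipped

        # Variables read on the right-hand side.
        # Note: a called function name would also appear here as a Name node.
        read_indices = []
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name):
                variables.append((node.id, node.lineno))
                idx = len(variables) - 1
                read_indices.append(idx)
                if node.id in last_def:
                    # The value read here comes from the latest definition.
                    edges.append((last_def[node.id], idx))

        # Variables written on the left-hand side.
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                variables.append((target.id, target.lineno))
                idx = len(variables) - 1
                for read_idx in read_indices:
                    edges.append((read_idx, idx))
                last_def[target.id] = idx

    return variables, edges


def node_attention_mask(num_nodes, edges):
    """Boolean mask: allowed[i][j] is True if node i may attend to node j.

    Assumption: a node attends to itself and to nodes with a direct edge
    into it; everything else is masked out.
    """
    allowed = [[i == j for j in range(num_nodes)] for i in range(num_nodes)]
    for src, dst in edges:
        allowed[dst][src] = True
    return allowed


variables, edges = data_flow_edges("x = 1\ny = x\nz = x + y\n")
for src, dst in edges:
    print(variables[src], "->", variables[dst])
```

For the three-line snippet, the final `z` receives edges from the `x` and `y` read on the same line, and each of those reads points back to the most recent definition of that variable. The edge list is far flatter than the corresponding AST, which is the efficiency argument the abstract makes.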

Related papers

Code Execution with Pre-trained Language Models [88.04688617516827]
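Most pre-trained models for code intelligence ignore execution traces and rely only on source code and syntactic structure. The authors develop a mutation-based data augmentation technique to create a large-scale, realistic Python dataset and task for code execution. They then present CodeExecutor, a Transformer model that leverages code-execution pre-training and curriculum learning to strengthen its semantic understanding.
Paper reference translation (metadata) (2023-05-08T10:00:05Z)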
Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities [34.27541293716398]
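The authors conduct an extensive analysis of seven code models to investigate how they represent code syntax and semantics. They develop four probing tasks to evaluate the models' ability to learn code syntax and semantics. The results highlight the strengths and weaknesses of the various code models in learning code syntax and semantics.
Paper reference translation (metadata) (2022-12-20T06:15:17Z)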
Soft-Labeled Contrastive Pre-training for Function-level Code Representation [127.71430696347174]
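SCodeR is a soft-labeled contrastive pre-training framework with two positive sample construction methods. By considering the relevance between code snippets in a large-scale code corpus, the soft-labeled contrastive pre-training can obtain fine-grained soft labels. SCodeR achieves new state-of-the-art performance on four code-related tasks over seven datasets.
Paper reference translation (metadata) (2022-10-18T05:17:37Z)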
UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
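The authors present UniXcoder, a cross-modal pre-trained model for programming languages. They propose a one-to-one mapping method that transforms an AST into a sequence structure retaining all structural information from the tree. UniXcoder is evaluated on five code-related tasks over nine datasets.
Paper reference translation (metadata) (2022-03-08T04:48:07Z)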
What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code [32.345301158791045]
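Pre-trained language models for source code have been proposed to model the context of code. These models rely on masked pre-training and the Transformer architecture. It remains unclear why these models work and which feature correlations they can capture.
Paper reference translation (metadata) (2022-02-14T16:22:10Z)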
CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
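The authors propose CodeRetriever, a model that combines unimodal and bimodal contrastive learning to train function-level semantic representations of code. For unimodal contrastive learning, they design a semantic-guided method that builds positive code pairs based on documentation and function names. For bimodal contrastive learning, they leverage code documentation and inline comments to build text-code pairs.
Paper reference translation (metadata) (2022-01-26T10:54:30Z)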
Contrastive Learning for Source Code with Structural and Functional Properties [66.10710134948478]
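This paper presents BOOST, a new self-supervised model that focuses pre-training on the characteristics of source code. The authors employ automated, structure-guided code transformation algorithms that generate functionally equivalent code. They train the model with a contrastive learning objective so that functionally equivalent code is brought closer together and distinct code is pushed further apart.
Paper reference translation (metadata) (2021-10-08T02:56:43Z)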
What do pre-trained code models know about code? [9.60966128833701]
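The authors use diagnostic tasks called probes to investigate pre-trained code models. They examine BERT (pre-trained on English), CodeBERT and CodeBERTa (pre-trained on source code and natural-language documentation), and GraphCodeBERT (pre-trained on source code with data flow).
Paper reference translation (metadata) (2021-08-25T16:20:17Z)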
CLSEBERT: Contrastive Learning for Syntax Enhanced Code Pre-Trained Model [23.947178895479464]
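CLSEBERT is a contrastive learning framework for a syntax-enhanced code pre-trained model. In the pre-training stage, it considers the code syntax and hierarchy contained in the abstract syntax tree (AST). Of its pre-training objectives, one predicts the edges between nodes in the abstract syntax tree, and the other predicts the types of code tokens.
Paper reference translation (metadata) (2021-08-10T10:08:21Z)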

The related papers list is generated automatically from the titles and abstracts of papers on this site.
