Fugu-MT 論文翻訳(概要): Unveiling the potential of large language models in generating semantic and cross-language clones

論文の概要: Unveiling the potential of large language models in generating semantic and cross-language clones

arxiv url: http://arxiv.org/abs/2309.06424v1
Date: Tue, 12 Sep 2023 17:40:49 GMT
ステータス: 翻訳完了
システム内更新日: 2023-09-13 11:52:22.397100
Title: Unveiling the potential of large language models in generating semantic and cross-language clones
Title（参考訳）: 意味的・言語横断的クローン生成における大規模言語モデルの可能性
Authors: Palash R. Roy, Ajmain I. Alam, Farouq Al-omari, Banani Roy, Chanchal K. Roy, Kevin A. Schneider
Abstract要約: OpenAIのGPTモデルは、テキスト生成に使用されるGPTのようなクローン生成の可能性を秘めている。セマンティッククローンの分野では、GPT-3の精度は62.14%と0.55 BLEUで、数発のプロンプトエンジニアリングによって達成されている。
参考スコア（独自算出の注目度）: 8.791710193028905
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Semantic and Cross-language code clone generation may be useful for code reuse, code comprehension, refactoring and benchmarking. OpenAI's GPT model has potential in such clone generation as GPT is used for text generation. When developers copy/paste codes from Stack Overflow (SO) or within a system, there might be inconsistent changes leading to unexpected behaviours. Similarly, if someone possesses a code snippet in a particular programming language but seeks equivalent functionality in a different language, a semantic cross-language code clone generation approach could provide valuable assistance.In this study, using SemanticCloneBench as a vehicle, we evaluated how well the GPT-3 model could help generate semantic and cross-language clone variants for a given fragment.We have comprised a diverse set of code fragments and assessed GPT-3s performance in generating code variants.Through extensive experimentation and analysis, where 9 judges spent 158 hours to validate, we investigate the model's ability to produce accurate and semantically correct variants. Our findings shed light on GPT-3's strengths in code generation, offering insights into the potential applications and challenges of using advanced language models in software development. Our quantitative analysis yields compelling results. In the realm of semantic clones, GPT-3 attains an impressive accuracy of 62.14% and 0.55 BLEU score, achieved through few-shot prompt engineering. Furthermore, the model shines in transcending linguistic confines, boasting an exceptional 91.25% accuracy in generating cross-language clones
Abstract（参考訳）: セマンティックおよびクロス言語コードクローン生成は、コードの再利用、コードの理解、リファクタリング、ベンチマークに有用である。 OpenAIのGPTモデルは、テキスト生成に使用されるGPTのようなクローン生成の可能性を秘めている。開発者がStack Overflow(SO)あるいはシステム内でコードをコピー/ペーストする場合、予期しない動作につながる一貫性のない変更が発生する可能性がある。 Similarly, if someone possesses a code snippet in a particular programming language but seeks equivalent functionality in a different language, a semantic cross-language code clone generation approach could provide valuable assistance.In this study, using SemanticCloneBench as a vehicle, we evaluated how well the GPT-3 model could help generate semantic and cross-language clone variants for a given fragment.We have comprised a diverse set of code fragments and assessed GPT-3s performance in generating code variants.Through extensive experimentation and analysis, where 9 judges spent 158 hours to validate, we investigate the model's ability to produce accurate and semantically correct variants. 我々の発見は、コード生成におけるGPT-3の強みに光を当て、ソフトウェア開発で高度な言語モデルを使用することの潜在的な応用と課題に関する洞察を与えました。我々の定量分析は説得力のある結果をもたらす。セマンティッククローンの分野では、GPT-3の精度は62.14%と0.55 BLEUで、数発のプロンプトエンジニアリングによって達成されている。さらに、このモデルは超越する言語圏において輝き、言語間クローンの生成において例外的な91.25%の精度を誇っている。

関連論文リスト

Development and Benchmarking of Multilingual Code Clone Detector [2.253851493296371]
多言語コードクローン検出器は、ターゲット言語のみの構文情報を提供することで、新しい言語のサポートを追加しやすくする。 ANTLR生成に基づく多言語コードブロック抽出法を提案し、多言語コードクローン検出器(MSCCD)を実装した。最先端の10の検出器と比較して、MSCCDは平均レベルで動作し、さらに多くの言語をサポートしている。
論文参考訳（メタデータ） (2024-09-10T03:08:33Z)
Large Language Models for cross-language code clone detection [3.5202378300682162]
言語間のコードクローン検出は、ソフトウェアエンジニアリングコミュニティで注目を集めている。機械学習の大幅な進歩にインスパイアされた本論文では、言語間コードクローン検出を再考する。
論文参考訳（メタデータ） (2024-08-08T12:57:14Z)
Assessing the Code Clone Detection Capability of Large Language Models [0.0]
評価には、さまざまなクローンタイプのコードペアと類似度のレベルでモデルをテストすることが含まれる。 GPT-4はすべてのクローンタイプでGPT-3.5を一貫して上回っている。
論文参考訳（メタデータ） (2024-07-02T16:20:44Z)
Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
大規模言語モデルは、自然言語の理解と生成において例外的な能力を示した。しかし、それらの生成速度は、その復号過程の本質的にシーケンシャルな性質によって制限される。本稿では,データ駆動方式で実装された新しいデコーディング手法であるLexical Unit Decodingを紹介する。
論文参考訳（メタデータ） (2024-05-24T04:35:13Z)
AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection [69.79627042058048]
AdaCCDは、その言語でアノテーションを使わずに、新しい言語のクローンコードを検出する新しい言語間適応手法である。 5つのプログラミング言語からなる多言語コードクローン検出ベンチマークを構築し,AdaCCDの言語間適応性を評価する。
論文参考訳（メタデータ） (2023-11-13T12:20:48Z)
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model [58.127534002232096]
本稿では,オープンソースの事前学習型LLMであるCodeFuse-13Bを紹介する。英語と中国語の両方のプロンプトによるコード関連のタスク用に特別に設計されている。 CodeFuseは、高品質な事前トレーニングデータセットを利用することで、その効果を達成する。
論文参考訳（メタデータ） (2023-10-10T02:38:44Z)
GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench [1.8687918300580921]
本稿では,SemanticCloneBenchとOpenAIのGPT-3モデルを利用して,包括的セマンティッククローンと言語間クローンベンチマークGPTCloneBenchを提案する。 GPT-3出力の79,928個のクローンペアから、37,149個の真のセマンティッククローンペア、19,288個の偽セマンティックペア(Type-1/Type-2)、および4言語(Java、C、C#、Python)にわたる20,770個のクロス言語クローンのベンチマークを作成しました。
論文参考訳（メタデータ） (2023-08-26T21:50:34Z)
ZC3: Zero-Shot Cross-Language Code Clone Detection [79.53514630357876]
ゼロショットクロスランゲージコードクローン検出のためのZC3という新しい手法を提案する。 ZC3は、異なるプログラミング言語間で同型表現空間を形成するために、対照的なスニペット予測を設計する。これに基づいて、ZC3はドメイン認識学習とサイクル一貫性学習を利用して、異なる言語間で整合した表現を生成する。
論文参考訳（メタデータ） (2023-08-26T03:48:10Z)
Multi-lingual Evaluation of Code Generation Models [82.7357812992118]
本稿では,MBXPとMultilingual HumanEval,MathQA-Xという,評価コード生成モデルに関する新しいベンチマークを提案する。これらのデータセットは10以上のプログラミング言語をカバーする。コード生成モデルの性能を多言語で評価することができる。
論文参考訳（メタデータ） (2022-10-26T17:17:06Z)
Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
7つのテスト言語に対する並列データを持たないゼロショット問題として,言語間セマンティックパーシングについて検討した。英文論理形式ペアデータのみを用いて解析知識を付加言語に転送するマルチタスクエンコーダデコーダモデルを提案する。このシステムは、ゼロショット解析を潜時空間アライメント問題としてフレーム化し、事前訓練されたモデルを改善し、最小のクロスリンガル転送ペナルティで論理形式を生成することができる。
論文参考訳（メタデータ） (2021-04-15T16:08:43Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。