Fugu-MT paper translation (summary): Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity

Paper summary: Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity

arxiv url: http://arxiv.org/abs/2508.21634v1
Date: Fri, 29 Aug 2025 13:51:28 GMT
Status: Translation complete
Last updated in system: 2025-09-01 19:45:11.06315
Title: Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity
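Authors: Domenico Cotroneo, Cristina Improta, Pietro Liguori
Abstract summary: This paper compares code written by human developers with code generated by three state-of-the-art LLMs: ChatGPT, DeepSeek-Coder, and Qwen-Coder. The evaluation spans over 500k code samples in two widely used languages, Python and Java, classifying defects via Orthogonal Defect Classification and security vulnerabilities via the Common Weakness Enumeration. AI-generated code is generally simpler, yet more prone to unused constructs and hardcoded debugging, while human-written code exhibits greater structural complexity and a higher concentration of maintainability issues.
Reference score (attention score computed by this site): 4.478789600295493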
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As AI code assistants become increasingly integrated into software development workflows, understanding how their code compares to human-written programs is critical for ensuring reliability, maintainability, and security. In this paper, we present a large-scale comparison of code authored by human developers and three state-of-the-art LLMs, i.e., ChatGPT, DeepSeek-Coder, and Qwen-Coder, on multiple dimensions of software quality: code defects, security vulnerabilities, and structural complexity. Our evaluation spans over 500k code samples in two widely used languages, Python and Java, classifying defects via Orthogonal Defect Classification and security vulnerabilities using the Common Weakness Enumeration. We find that AI-generated code is generally simpler and more repetitive, yet more prone to unused constructs and hardcoded debugging, while human-written code exhibits greater structural complexity and a higher concentration of maintainability issues. Notably, AI-generated code also contains more high-risk security vulnerabilities. These findings highlight the distinct defect profiles of AI- and human-authored code and underscore the need for specialized quality assurance practices in AI-assisted programming.
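The abstract names two defect patterns that are more common in AI-generated code: unused constructs and hardcoded debugging. The sketch below is a rough illustration of what these look like in Python. It flags unused imports and leftover print() calls using the standard-library ast module.

This is not the authors' tooling. The function name find_unused_imports_and_prints is hypothetical, and treating print() as a proxy for hardcoded debugging is an assumption made only for this example.

```python
import ast


def find_unused_imports_and_prints(source: str) -> dict:
    """Hypothetical helper: report imported names never referenced and
    the line numbers of print() calls (used here as a naive proxy for
    hardcoded debugging output)."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line of the import statement
    used = set()   # every bare name referenced anywhere in the module
    prints = []    # line numbers of print(...) calls

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os"
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be checked by name
                imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Name):
            # Covers plain uses, annotations, and the "os" in os.getcwd()
            used.add(node.id)
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id == "print"):
            prints.append(node.lineno)

    unused = sorted(name for name in imported if name not in used)
    return {"unused_imports": unused, "print_calls": sorted(prints)}


sample = '''
import os
import sys
from typing import List

def total(xs: List[int]) -> int:
    print("debug:", xs)
    return sum(xs)
'''

report = find_unused_imports_and_prints(sample)
print(report)  # {'unused_imports': ['os', 'sys'], 'print_calls': [7]}
```

The check is deliberately naive. It ignores name shadowing, names exported via __all__, and legitimate uses of print(). Measurements at the scale of the study would call for an established linter such as Pylint rather than a hand-rolled pass like this.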

Related papers

Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights [0.0]
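This paper proposes a new scoring mechanism, the SBC score, based on a reverse generation technique that leverages the natural-language generation capabilities of large language models. Unlike direct code analysis, the approach reconstructs system requirements from AI-generated code and compares them with the original specification.
Paper reference translation (metadata) (2025-02-11T01:12:11Z)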
SOK: Exploring Hallucinations and Security Risks in AI-Assisted Software Development with Insights for LLM Deployment [0.0]
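Tools powered by large language models (LLMs), such as GitHub Copilot, ChatGPT, Cursor AI, and Codeium AI, have revolutionized the world of coding. This paper provides a comprehensive analysis of the benefits and risks of AI-powered coding tools.
Paper reference translation (metadata) (2025-01-31T06:00:27Z)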
Comparing Human and LLM Generated Code: The Jury is Still Out! [8.456554883523472]
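This study compares the effectiveness of large language models (LLMs) and human programmers at writing Python code. It uses a range of benchmarks, including static analysis with Pylint, Radon, and Bandit, as well as test cases. Security flaws are observed in code written by both humans and GPT-4, but the GPT-4 code contained more severe outliers.
Paper reference translation (metadata) (2025-01-28T11:11:36Z)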
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
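Large language models (LLMs) have made remarkable progress in code generation. CodeIP is a new multi-bit watermarking technique that inserts additional information to preserve provenance details. Experiments on real-world datasets spanning five programming languages demonstrate CodeIP's effectiveness.
Paper reference translation (metadata) (2024-04-24T04:25:04Z)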
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
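This paper introduces CodeAttack, a framework that transforms natural-language inputs into code inputs. The study reveals a new and universal safety vulnerability of these models with respect to code inputs. The larger the distribution gap between CodeAttack inputs and natural language, the weaker the models' safety generalization.
Paper reference translation (metadata) (2024-03-12T17:55:38Z)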
LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [3.729516018513228]
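This paper introduces SecRepair, a multipurpose code vulnerability analysis system powered by the large language model CodeGen2. It also proposes an instruction-based dataset tailored to LLM-based vulnerability analysis. The authors identify zero-day and N-day vulnerabilities in six open-source IoT operating systems on GitHub.
Paper reference translation (metadata) (2024-01-07T02:46:39Z)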
Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
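This study examines whether communicating information about uncertainty helps programmers produce code more quickly and accurately. Highlighting the tokens most likely to be edited led to faster task completion and more targeted edits.
Paper reference translation (metadata) (2023-02-14T18:43:34Z)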

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

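Information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).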