Fugu-MT Paper Translation (Abstract): Code Digital Twin: Empowering LLMs with Tacit Knowledge for Complex Software Maintenance


arxiv url: http://arxiv.org/abs/2503.07967v1
Date: Tue, 11 Mar 2025 01:46:58 GMT
Status: Translation complete
Last updated in system: 2025-03-12 22:35:51.506693
Title: Code Digital Twin: Empowering LLMs with Tacit Knowledge for Complex Software Maintenance
Authors: Xin Peng, Chong Wang, Mingwei Liu, Yiling Lou, Yijian Wu
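Abstract summary: We introduce the concept and framework of Code Digital Twin, a conceptual representation of tacit knowledge. A code digital twin is constructed using a methodology that combines knowledge extraction from both structured and unstructured sources.
Reference score (site-computed attention score): 9.603528792596348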
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While large language models (LLMs) have demonstrated promise in software engineering tasks like code completion and generation, their support for the maintenance of complex software systems remains limited. These models often struggle with understanding the tacit knowledge embedded in systems, such as responsibility allocation and collaboration across different modules. To address this gap, we introduce the concept and framework of Code Digital Twin, a conceptual representation of tacit knowledge that captures the concepts, functionalities, and design rationales behind code elements, co-evolving with the software. A code digital twin is constructed using a methodology that combines knowledge extraction from both structured and unstructured sources (such as source code, documentation, and change histories), leveraging LLMs, static analysis tools, and human expertise. This framework can empower LLMs for software maintenance tasks such as issue localization and repository-level code generation by providing tacit knowledge as contexts. Based on the proposed methodology, we explore the key challenges and opportunities involved in the continuous construction and refinement of code digital twin.

Related papers

An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding [50.17907898478795]
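This study proposes a benchmark for evaluating the effectiveness of large language models (LLMs) in real-world reverse engineering scenarios. The evaluation shows that existing LLMs can understand binary code to some extent, thereby improving the efficiency of binary code analysis.
Reference translation of the paper (metadata) (2025-04-30T17:02:06Z)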
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs [53.00384299879513]
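In large language models (LLMs), code and reasoning reinforce each other. Code provides verifiable execution paths, enforces logical decomposition, and enables runtime validation. We identify key challenges and propose future research directions to strengthen this synergy.
Reference translation of the paper (metadata) (2025-02-26T18:55:42Z)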
Boost, Disentangle, and Customize: A Robust System2-to-System1 Pipeline for Code Generation [58.799397354312596]
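Large language models (LLMs) have demonstrated remarkable capabilities across various domains, particularly in System 1 tasks. Research on System2-to-System1 methods has surged recently, exploring the System 2 reasoning knowledge obtained through inference-time computation. This paper focuses on code generation, a representative System 2 task, and identifies two primary challenges.
Reference translation of the paper (metadata) (2025-02-18T03:20:50Z)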
Specifications: The missing link to making the development of LLM systems an engineering discipline [65.10077876035417]
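We discuss the progress the field has made so far, including structured outputs, process supervision, and test-time compute. We outline future research directions toward developing modular and reliable LLM systems.
Reference translation of the paper (metadata) (2024-11-25T07:48:31Z)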
Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights [9.414198519543564]
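codellm-devkit (hereafter "CLDK") is an open-source library that significantly simplifies the process of performing program analysis. CLDK offers developers an intuitive and user-friendly interface.
Reference translation of the paper (metadata) (2024-10-16T20:05:59Z)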
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs [9.649864680130781]
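We propose CodeMMLU, a benchmark for evaluating the depth of software and code understanding in CodeLLMs. CodeMMLU includes over 10,000 questions sourced from diverse domains, covering tasks such as code analysis, defect detection, and software engineering principles. The evaluation shows that even state-of-the-art models face significant challenges on CodeMMLU.
Reference translation of the paper (metadata) (2024-10-02T20:04:02Z)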
Code-Survey: An LLM-Driven Methodology for Analyzing Large-Scale Codebases [3.8153349016958074]
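We introduce Code-Survey, the first LLM-driven methodology designed for exploring and analyzing large-scale codebases. Through careful survey design, Code-Survey transforms unstructured data such as commits and emails into structured, analyzable datasets. This enables quantitative analysis of the evolution of complex software and reveals valuable insights into design, implementation, maintenance, reliability, and security.
Reference translation of the paper (metadata) (2024-09-24T17:08:29Z)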
How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
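We propose a benchmark for evaluating the effectiveness of large language models (LLMs) in binary code understanding. The evaluation shows that existing LLMs can understand binary code to some extent, thereby improving the efficiency of binary code analysis.
Reference translation of the paper (metadata) (2024-04-15T14:44:08Z)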
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
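We introduce CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. Our library supports a collection of pretrained Code LLM models and popular code benchmarks. We hope CodeTF can bridge the gap between machine learning/generative AI and software engineering.
Reference translation of the paper (metadata) (2023-05-31T05:24:48Z)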

The related papers list is generated automatically from the titles and abstracts of papers on this site.
