Fugu-MT Paper Translation (Abstract): Synergistic Enhancement of Requirement-to-Code Traceability: A Framework Combining Large Language Model based Data Augmentation and an Advanced Encoder

Paper overview: Synergistic Enhancement of Requirement-to-Code Traceability: A Framework Combining Large Language Model based Data Augmentation and an Advanced Encoder

arxiv url: http://arxiv.org/abs/2509.20149v2
Date: Sun, 19 Oct 2025 14:48:22 GMT
Status: Translation complete
Last updated in system: 2025-10-25 00:56:38.585196
Title: Synergistic Enhancement of Requirement-to-Code Traceability: A Framework Combining Large Language Model based Data Augmentation and an Advanced Encoder
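Title (reference translation): Synergistic Enhancement of Requirement-to-Code Traceability: A Framework Combining Large Language Model-Based Data Augmentation and an Advanced Encoder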
Authors: Jianzhang Zhang, Jialong Zhou, Nan Niu, Jinping Hua, Chuang Liu,
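Abstract summary: This paper proposes and validates a framework that integrates large language model (LLM)-driven data augmentation with an advanced encoder. It first demonstrates that data augmentation, optimized through a systematic evaluation of bi-directional and zero/few-shot prompting strategies, is highly effective. It then further enhances an established method based on a state-of-the-art pre-trained language model by incorporating an encoder distinguished by a broader pre-training corpus and an extended context window.
Reference score (independently computed attention measure): 5.241456612683375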
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated requirement-to-code traceability link recovery, essential for industrial system quality and safety, is critically hindered by the scarcity of labeled data. To address this bottleneck, this paper proposes and validates a synergistic framework that integrates large language model (LLM)-driven data augmentation with an advanced encoder. We first demonstrate that data augmentation, optimized through a systematic evaluation of bi-directional and zero/few-shot prompting strategies, is highly effective, while the choice among leading LLMs is not a significant performance factor. Building on the augmented data, we further enhance an established, state-of-the-art pre-trained language model based method by incorporating an encoder distinguished by a broader pre-training corpus and an extended context window. Our experiments on four public datasets quantify the distinct contributions of our framework's components: on its own, data augmentation consistently improves the baseline method, providing substantial performance gains of up to 26.66%; incorporating the advanced encoder provides an additional lift of 2.21% to 11.25%. This synergy culminates in a fully optimized framework with maximum gains of up to 28.59% on $F_1$ score and 28.9% on $F_2$ score over the established baseline, decisively outperforming ten established baselines from three dominant paradigms. This work contributes a pragmatic and scalable methodology to overcome the data scarcity bottleneck, paving the way for broader industrial adoption of data-driven requirement-to-code traceability.
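Abstract (reference translation): Automated requirement-to-code traceability link recovery, which is essential to the quality and safety of industrial systems, is severely hindered by the scarcity of labeled data. To address this bottleneck, we propose and validate a synergistic framework that integrates large language model (LLM)-driven data augmentation with an advanced encoder. We first show that data augmentation, optimized through a systematic evaluation of bi-directional and zero/few-shot prompting strategies, is highly effective, while the choice among leading LLMs is not a significant performance factor. Building on the augmented data, we further enhance an established method based on a state-of-the-art pre-trained language model by incorporating an encoder distinguished by a broader pre-training corpus and an extended context window. Our experiments on four public datasets quantify the distinct contributions of the framework's components: data augmentation alone consistently improves the baseline method, with performance gains of up to 26.66%, and incorporating the advanced encoder adds a further 2.21% to 11.25%. This synergy culminates in a fully optimized framework with maximum gains of up to 28.59% on $F_1$ score and 28.9% on $F_2$ score over the established baseline, decisively outperforming ten established baselines from three dominant paradigms. This work contributes a pragmatic and scalable methodology for overcoming the data scarcity bottleneck and for enabling broader industrial adoption of data-driven requirement-to-code traceability.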
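
The abstract reports gains on $F_1$ and $F_2$ scores without defining them. As a minimal sketch, assuming the standard F-beta definition (the weighted harmonic mean of precision and recall, where beta = 2 weights recall more heavily), the two metrics could be computed for a set of recovered trace links as follows. The function names, requirement IDs, and file names are hypothetical illustrations and do not come from the paper.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (requirement id, code artifact), hypothetical identifiers


def precision_recall(predicted: Set[Link], actual: Set[Link]) -> Tuple[float, float]:
    """Precision and recall of predicted trace links against the true links."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 emphasizes recall over precision."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Made-up example data, for illustration only.
predicted_links = {("R1", "A.java"), ("R1", "B.java"), ("R2", "C.java"), ("R3", "D.java")}
true_links = {("R1", "A.java"), ("R2", "C.java"), ("R2", "E.java")}

p, r = precision_recall(predicted_links, true_links)
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} F2={f2:.3f}")
```

In this toy case recall exceeds precision, so $F_2$ comes out higher than $F_1$. Reporting both is common in traceability work, where a missed link is often considered more costly than a spurious one.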
