Fugu-MT Paper Translation (Abstract): Evaluating perturbation robustness of generative systems that use COBOL code inputs

Paper summary: Evaluating perturbation robustness of generative systems that use COBOL code inputs

arxiv url: http://arxiv.org/abs/2511.18488v1
Date: Sun, 23 Nov 2025 15:16:08 GMT
Status: Translation complete
Last updated in system: 2025-11-25 18:34:24.863506
Title: Evaluating perturbation robustness of generative systems that use COBOL code inputs
Title (reference translation): Evaluating the perturbation robustness of generative systems that use COBOL code input
Authors: Samuel Ackerman, Wesam Ibraheem, Orna Raz, Marcel Zalmanovici
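Abstract summary: Systems that incorporate large language models (LLMs) as a component are known to be sensitive to small input variations. This paper proposes a framework for evaluating the robustness of systems that use code as input.
Reference score (independently computed attention metric): 1.3327839779221817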
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Systems incorporating large language models (LLMs) as a component are known to be sensitive (i.e., non-robust) to minor input variations that do not change the meaning of the input; such sensitivity may reduce the system's usefulness. Here, we present a framework to evaluate robustness of systems using COBOL code as input; our application is translation between COBOL and Java programming languages, but the approach extends to other tasks such as code generation or explanation. Targeting robustness of systems with COBOL as input is essential yet challenging. Many business-critical applications are written in COBOL, yet these are typically proprietary legacy applications and their code is unavailable to LLMs for training. We develop a library of COBOL paragraph and full-program perturbation methods, and create variant-expanded versions of a benchmark dataset of examples for a specific task. The robustness of the LLM-based system is evaluated by measuring changes in values of individual and aggregate metrics calculated on the system's outputs. Finally, we present a series of dynamic table and chart visualization dashboards that assist in debugging the system's outputs, and monitoring and understanding root causes of the system's sensitivity to input variation. These tools can be further used to improve the system by, for instance, indicating variations that should be handled by pre-processing steps.
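Abstract (reference translation): Systems that incorporate large language models (LLMs) as a component are known to be sensitive to small input variations that do not change the meaning of the input. This paper proposes a framework for evaluating the robustness of systems that use COBOL code as input. Targeting the robustness of systems that take COBOL as input is essential but difficult. Many business-critical applications are written in COBOL, but they are typically proprietary legacy applications and are not available to LLMs for training. We develop a library of COBOL paragraph and full-program perturbation methods and create variant-expanded versions of a benchmark dataset of examples for a specific task. The robustness of the LLM-based system is evaluated by measuring changes in the values of individual and aggregate metrics computed on the system's outputs. Finally, we present a series of dynamic table and chart visualization dashboards that help debug the system's outputs and monitor and understand the root causes of the system's sensitivity to input variation. These tools can further be used to improve the system by, for example, indicating variations that should be handled in pre-processing steps.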
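
The evaluation loop described in the abstract (perturb the COBOL input, expand the benchmark with the variants, then measure how a metric changes) can be illustrated with a minimal Python sketch. This is not the authors' library: the three perturbations, the function names, and the `system` and `metric` callables are all hypothetical stand-ins. The perturbations assume fixed-format COBOL, where case is not significant outside quoted literals, a `*` in column 7 marks a comment line, and blank lines are ignored. The sketch does not handle continued literals or apostrophes inside comment lines.

```python
import re
from statistics import mean

def lowercase_outside_literals(cobol: str) -> str:
    """Lowercase the source, leaving quoted literals untouched (assumed meaning-preserving)."""
    parts = re.split(r"('[^']*'|\"[^\"]*\")", cobol)
    # With one capturing group, odd indices of the split hold the quoted literals.
    return "".join(p if i % 2 else p.lower() for i, p in enumerate(parts))

def add_comment_line(cobol: str) -> str:
    """Prepend a fixed-format comment line ('*' in column 7)."""
    return "      * perturbation: inserted comment\n" + cobol

def add_blank_lines(cobol: str) -> str:
    """Insert a blank line after every source line."""
    return cobol.replace("\n", "\n\n")

# Hypothetical perturbation library, keyed by a name used in the report.
PERTURBATIONS = {
    "lowercase": lowercase_outside_literals,
    "comment": add_comment_line,
    "blank_lines": add_blank_lines,
}

def expand_with_variants(benchmark):
    """Return (example_id, variant_name, code) rows: each original plus its variants."""
    rows = []
    for ex_id, code in benchmark.items():
        rows.append((ex_id, "original", code))
        for name, perturb in PERTURBATIONS.items():
            rows.append((ex_id, name, perturb(code)))
    return rows

def robustness_report(benchmark, system, metric):
    """Mean change in the metric, per perturbation, relative to the unperturbed input."""
    deltas = {name: [] for name in PERTURBATIONS}
    for code in benchmark.values():
        baseline = metric(system(code))
        for name, perturb in PERTURBATIONS.items():
            deltas[name].append(metric(system(perturb(code))) - baseline)
    return {name: mean(values) for name, values in deltas.items()}

# Toy usage: a fake "system" that only recognizes upper-case DISPLAY.
benchmark = {"p1": "       DISPLAY 'Hello'.\n       STOP RUN.\n"}
toy_system = lambda code: "System.out.println" if "DISPLAY" in code else ""
toy_metric = lambda output: 1.0 if output else 0.0
report = robustness_report(benchmark, toy_system, toy_metric)
print(report)
```

In this toy run only the `lowercase` variant lowers the metric, which is the kind of signal that could suggest normalizing case in a pre-processing step. A real evaluation would replace `toy_system` with the LLM-based system under test and `toy_metric` with a task-appropriate output metric.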

Related papers