Fugu-MT Paper Translation (Abstract): An Empirical Study on the Effects of System Prompts in Instruction-Tuned Models for Code Generation

Paper overview: An Empirical Study on the Effects of System Prompts in Instruction-Tuned Models for Code Generation

arxiv url: http://arxiv.org/abs/2602.15228v1
Date: Mon, 16 Feb 2026 22:11:21 GMT
Status: Translation complete
Last updated in system: 2026-02-18 16:03:17.920602
Title: An Empirical Study on the Effects of System Prompts in Instruction-Tuned Models for Code Generation
Title (reference translation): An Empirical Study on the Effects of System Prompts in Instruction-Tuned Models for Code Generation
Authors: Zaiyu Cheng, Antonio Mastropaolo
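Abstract summary: The paper systematically evaluates how system prompts affect code assistants. It finds that increasing system-prompt constraint specificity does not monotonically improve correctness. For larger code-specialized models, few-shot examples can degrade performance relative to zero-shot generation.
Reference score (attention measure computed independently by the site): 4.76360912129794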
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Instruction-tuned Language Models (ILMs) have become essential components of modern AI systems, demonstrating exceptional versatility across natural language and reasoning tasks. Among their most impactful applications is code generation, where ILMs -- commonly referred to as Code Language Models (CLMs) -- translate human intent into executable programs. While progress has been driven by advances in scaling and training methodologies, one critical aspect remains underexplored: the impact of system prompts on both general-purpose ILMs and specialized CLMs for code generation. We systematically evaluate how system prompts of varying instructional detail, along with model scale, prompting strategy, and programming language, affect code assistants. Our experimental setting spans 360 configurations across four models, five system prompts, three prompting strategies, two languages, and two temperature settings. We find that (1) increasing system-prompt constraint specificity does not monotonically improve correctness -- prompt effectiveness is configuration-dependent and can help or hinder based on alignment with task requirements and decoding context; (2) for larger code-specialized models, few-shot examples can degrade performance relative to zero-shot generation, contrary to conventional wisdom; and (3) programming language matters, with Java exhibiting significantly greater sensitivity to system prompt variations than Python, suggesting language-specific prompt engineering strategies may be necessary.
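Abstract (reference translation): Instruction-tuned language models (ILMs) have become essential components of modern AI systems and show exceptional versatility across natural language and reasoning tasks. One of their most impactful applications is code generation, where ILMs (commonly referred to as Code Language Models, CLMs) translate human intent into executable programs. Progress has been driven by advances in scaling and training methodologies, but the impact of system prompts on both general-purpose ILMs and specialized CLMs for code generation remains underexplored. The study systematically evaluates how system prompts of varying instructional detail, together with model scale, prompting strategy, and programming language, affect code assistants. The experimental setting spans 360 configurations across four models, five system prompts, three prompting strategies, two languages, and two temperature settings. The findings are: (1) increasing system-prompt constraint specificity does not monotonically improve correctness; prompt effectiveness is configuration-dependent and can help or hinder depending on alignment with task requirements and decoding context; (2) for larger code-specialized models, few-shot examples can degrade performance relative to zero-shot generation, contrary to conventional wisdom; and (3) programming language matters: Java is significantly more sensitive to system prompt variations than Python, which suggests that language-specific prompt engineering strategies may be necessary.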
