Fugu-MT paper translation (summary): Technical Report: Full-Stack Fine-Tuning for the Q Programming Language

Paper summary: Technical Report: Full-Stack Fine-Tuning for the Q Programming Language

arxiv url: http://arxiv.org/abs/2508.06813v1
Date: Sat, 09 Aug 2025 04:22:07 GMT
Status: Translation complete
Last updated in system: 2025-08-12 21:23:28.560993
Title: Technical Report: Full-Stack Fine-Tuning for the Q Programming Language
Authors: Brendan R. Hogan, Will Brown, Adel Boyarsky, Anderson Schneider, Yuriy Nevmyvaka,
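Abstract summary: We release an evaluation dataset for the Q language, benchmark major frontier models on it, and apply pretraining, supervised fine-tuning, and reinforcement learning. Our best model achieves 59% pass@1 accuracy on the Q benchmark, outperforming the best-performing frontier model.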
Reference score (attention score computed by the site): 1.2316583133621197
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Even though large language models are becoming increasingly capable, it is still unreasonable to expect them to excel at tasks that are under-represented on the Internet. Leveraging LLMs for specialized applications, particularly in niche programming languages and private domains, remains challenging and largely unsolved. In this work, we address this gap by presenting a comprehensive, open-source approach for adapting LLMs to the Q programming language, a popular tool in quantitative finance that is much less present on the Internet than Python, C, Java, and other "mainstream" languages and is therefore not a strong suit of general-purpose AI models. We introduce a new LeetCode-style evaluation dataset for Q, benchmark major frontier models on the dataset, and then perform pretraining, supervised fine-tuning, and reinforcement learning to train a suite of reasoning and non-reasoning models based on the Qwen-2.5 series, spanning five parameter sizes (1.5B, 3B, 7B, 14B, 32B). Our best model achieves a pass@1 accuracy of 59 percent on our Q benchmark, surpassing the best-performing frontier model, Claude Opus-4, by 29.5 percent. Additionally, all models, even our 1.5B model, outperform GPT-4.1 on this task. In addition to releasing models, code, and data, we provide a detailed blueprint for dataset construction, model pretraining, supervised fine-tuning, and reinforcement learning. Our methodology is broadly applicable, and we discuss how these techniques can be extended to other tasks, including those where evaluation may rely on soft or subjective signals.
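The abstract reports results as pass@1 accuracy but does not say how that number is computed. A common convention for code benchmarks, assumed here and not confirmed by the source, is the unbiased pass@k estimator: for each problem, draw n samples, count the c that pass all of that problem's test cases, estimate pass@k = 1 - C(n-c, k) / C(n, k), and average over problems. For k = 1 this reduces to c/n per problem. The function names below are hypothetical and this is only a minimal sketch of the metric, not the authors' evaluation code.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (assumed convention).

    n: number of sampled solutions, c: how many passed all tests, k: budget.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Sequence[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k; `results` holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("empty benchmark")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with one sample per problem and two of four problems solved, `benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0)])` returns 0.5. Averaging per problem, rather than pooling all samples, keeps problems with more samples from dominating the score.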

