Fugu-MT Paper Translation (Summary): IndustryCode: A Benchmark for Industry Code Generation

Paper summary: IndustryCode: A Benchmark for Industry Code Generation

arxiv url: http://arxiv.org/abs/2604.02729v1
Date: Fri, 03 Apr 2026 04:44:07 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.324506
Title: IndustryCode: A Benchmark for Industry Code Generation
Authors: Puyu Zeng, Zhaoxi Wang, Zhixu Duan, Liang Feng, Shaobo Wang, Cunxiang Wang, Jinghang Wang, Bing Zhao, Hu Wei, Linfeng Zhang
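Abstract summary: IndustryCode is the first comprehensive benchmark to span multiple industrial domains and programming languages. It comprises 579 sub-problems derived from 125 primary industrial challenges, accompanied by rigorous problem descriptions and test cases. In the evaluation, the top-performing model, Claude 4.5 Opus, achieved an overall accuracy of 68.1% on sub-problems and 42.5% on main problems.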
Reference score (independently computed attention score): 16.93701944012316
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code generation and comprehension by Large Language Models (LLMs) have emerged as core drivers of industrial intelligence and decision optimization, finding widespread application in fields such as finance, automation, and aerospace. Although recent advancements have demonstrated the remarkable potential of LLMs in general code generation, existing benchmarks are mainly confined to single domains and languages. Consequently, they fail to effectively evaluate the generalization capabilities required for real-world industrial applications or to reflect the coding proficiency demanded by complex industrial scenarios. To bridge this gap, we introduce IndustryCode, the first comprehensive benchmark designed to span multiple industrial domains and programming languages. IndustryCode comprises 579 sub-problems derived from 125 primary industrial challenges, accompanied by rigorous problem descriptions and test cases. It covers a wide range of fields, including finance, automation, aerospace, and remote sensing, and incorporates diverse programming languages such as MATLAB, Python, C++, and Stata. In our evaluation, the top-performing model, Claude 4.5 Opus, achieved an overall accuracy of 68.1% on sub-problems and 42.5% on main problems. The benchmark dataset and automated evaluation code will be made publicly available upon acceptance.


Related papers