Fugu-MT Paper Translation (Abstract): FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

Paper abstract: FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

arxiv url: http://arxiv.org/abs/2605.25246v2
Date: Tue, 26 May 2026 13:18:52 GMT
Status: Translation complete
Last updated in system: 2026-05-27 17:51:41.092949
Title: FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization
Title (reference translation): FrontierOR: Benchmarking LLMs' Capability for Efficient Algorithm Design in Large-Scale Optimization
Authors: Minwei Kong, Chonghe Jiang, Ao Qu, Wenbin Ouyang, Zhaoming Zeng, Xiaotong Guo, Zhekai Li, Junyi Li, Yi Fan, Xinshou Zheng, Xi Jing, Yikai Zhang, Zhiwei Liang, Seonghoo Kim, Runqing Yang, Zijian Zhou, Sirui Li, Han Zheng, Wangyang Ying, Ou Zheng, Chonghuan Wang, Jinglong Zhao, Hanzhang Qin, Cathy Wu, Paul Pu Liang, Jinhua Zhao, Hai Wang
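Abstract summary: Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation. Existing benchmarks are limited to small or simplified examples far below real-world scale and complexity. We introduce FrontierOR, among the first benchmarks to evaluate LLM-based efficient algorithm design for realistic large-scale optimization problems.
Reference score (independently computed attention score): 61.43300970020897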
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation, yet practical operations research and optimization problems often require a harder capability: designing scalable algorithms that exploit problem structure and outperform direct formulation-and-solve baselines. Existing benchmarks are limited to small or simplified examples far below real-world scale and complexity. We introduce FrontierOR, among the first benchmarks to systematically evaluate LLM-based efficient algorithm design for realistic large-scale optimization problems. FrontierOR includes 180 tasks derived from methodologically diverse papers published in top-tier operations research venues, each with standardized instances and a hidden, expert-verified evaluation suite. We evaluate seven LLMs spanning frontier, cost-effective, and open-source models both in one-shot and test-time evolution settings. The results reveal that frontier models still struggle to move from executable formulations to efficient optimization algorithms: the strongest one-shot model outperforms Gurobi in only 31% of cases in both solution quality and computational efficiency, and even strong coding agents with test-time evolution achieve only 50% on selected hard tasks. FrontierOR establishes a practical evaluation platform for LLM-based optimization algorithm design, which enables future LLMs and agents to be systematically tested on whether they can move beyond correct formulation toward a feasible, high-quality, and efficient algorithm.
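Abstract (reference translation): Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation, but practical operations research and optimization problems often demand a harder capability: designing scalable algorithms that exploit problem structure and outperform direct formulation-and-solve baselines. Existing benchmarks are limited to small or simplified examples far below real-world scale and complexity. We introduce FrontierOR, among the first benchmarks to systematically evaluate LLM-based efficient algorithm design for realistic large-scale optimization problems. FrontierOR contains 180 tasks drawn from methodologically diverse papers published in top-tier operations research venues, each with standardized instances and a hidden, expert-verified evaluation suite. We evaluate seven LLMs spanning frontier, cost-effective, and open-source models in both one-shot and test-time evolution settings. The results show that frontier models still struggle to move from executable formulations to efficient optimization algorithms: the strongest one-shot model outperforms Gurobi in only 31% of cases in both solution quality and computational efficiency, and even strong coding agents with test-time evolution achieve only 50% on selected hard tasks. FrontierOR establishes a practical evaluation platform for LLM-based optimization algorithm design, so that future LLMs and agents can be systematically tested on whether they can move beyond correct formulation toward feasible, high-quality, and efficient algorithms.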
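The abstract reports the share of cases in which a model outperforms Gurobi in both solution quality and computational efficiency, but it does not say how a single comparison is scored. The following is a minimal sketch of one way such a joint win rate could be computed. It assumes minimization problems, one objective value and one runtime per instance, and a simple "at least as good and no slower" rule. The names `Run`, `beats_baseline`, and `win_rate` are hypothetical and do not come from the paper, and the sample data is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One solver run on one instance (hypothetical record layout)."""
    objective: float       # objective value reached; lower is better (assumed minimization)
    seconds: float         # wall-clock time taken
    feasible: bool = True  # whether the returned solution satisfies all constraints


def beats_baseline(candidate: Run, baseline: Run, tol: float = 1e-9) -> bool:
    """Return True if the candidate wins on both quality and runtime.

    Assumed rule: the candidate must be feasible, reach an objective no worse
    than the baseline (within tol), and take no more time than the baseline.
    """
    if not candidate.feasible:
        return False
    if not baseline.feasible:
        # Assumption: any feasible candidate beats an infeasible baseline.
        return True
    return (candidate.objective <= baseline.objective + tol
            and candidate.seconds <= baseline.seconds)


def win_rate(pairs) -> float:
    """Fraction of (candidate, baseline) pairs in which the candidate wins on both axes."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    wins = sum(beats_baseline(c, b) for c, b in pairs)
    return wins / len(pairs)


# Made-up illustrative data, not results from the paper.
pairs = [
    (Run(100.0, 12.0), Run(100.0, 30.0)),               # same quality, faster: win
    (Run(98.0, 45.0), Run(100.0, 30.0)),                # better quality, slower: no win
    (Run(105.0, 5.0), Run(100.0, 30.0)),                # faster, worse quality: no win
    (Run(0.0, 1.0, feasible=False), Run(100.0, 30.0)),  # infeasible: no win
]
print(win_rate(pairs))
```

Requiring a win on both axes at once is stricter than scoring quality and speed separately, which is one plausible reason a joint rate like the 31% quoted above can be low even when a model produces runnable code.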


Related papers