Fugu-MT Paper Translation (Abstract): FrontierCS: Evolving Challenges for Evolving Intelligence

Paper summary: FrontierCS: Evolving Challenges for Evolving Intelligence

arxiv url: http://arxiv.org/abs/2512.15699v1
Date: Wed, 17 Dec 2025 18:52:45 GMT
Status: Translation complete
Last updated in system: 2025-12-18 17:06:27.113345
Title: FrontierCS: Evolving Challenges for Evolving Intelligence
Title (reference translation): FrontierCS: Challenges for Evolving Intelligence
Authors: Qiuyang Mang, Wenhao Chai, Zhifei Li, Huanzhi Mao, Shang Zhou, Alexander Du, Hanchen Li, Shu Liu, Edwin Chen, Yichuan Wang, Xieting Chu, Zerui Cheng, Yuan Xu, Tian Xia, Zirui Wang, Tianneng Shi, Jianzhu Yao, Yilong Zhao, Qizheng Zhang, Charlie Ruan, Zeyu Shen, Kaiyuan Liu, Runyuan He, Dong Xing, Zerui Li, Zirong Zeng, Yige Jiang, Lufeng Cheng, Ziyi Zhao, Youran Sun, Wesley Zheng, Meiyuwang Zhang, Ruyi Ji, Xuechang Tu, Zihan Zheng, Zexing Chen, Kangyang Zhou, Zhaozi Wang, Jingbang Chen, Aleksandra Korolova, Peter Henderson, Pramod Viswanath, Vijay Ganesh, Saining Xie, Zhuang Liu, Dawn Song, Sewon Min, Ion Stoica, Joseph E. Gonzalez, Jingbo Shang, Alvin Cheung
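Abstract summary: We introduce FrontierCS, a benchmark of 156 open-ended problems spanning diverse areas of computer science. For each problem, we provide an expert reference solution and an automatic evaluator. We find that frontier reasoning models lag far behind human experts on both the algorithmic and research tracks.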
Reference score (independently computed attention measure): 174.80075821079708
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce FrontierCS, a benchmark of 156 open-ended problems across diverse areas of computer science, designed and reviewed by experts, including CS PhDs and top-tier competitive programming participants and problem setters. Unlike existing benchmarks that focus on tasks with known optimal solutions, FrontierCS targets problems where the optimal solution is unknown, but the quality of a solution can be objectively evaluated. Models solve these tasks by implementing executable programs rather than outputting a direct answer. FrontierCS includes algorithmic problems, which are often NP-hard variants of competitive programming problems with objective partial scoring, and research problems with the same property. For each problem we provide an expert reference solution and an automatic evaluator. Combining open-ended design, measurable progress, and expert curation, FrontierCS provides a benchmark at the frontier of computer-science difficulty. Empirically, we find that frontier reasoning models still lag far behind human experts on both the algorithmic and research tracks, that increasing reasoning budgets alone does not close this gap, and that models often over-optimize for generating merely workable code instead of discovering high-quality algorithms and system designs.
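Abstract (reference translation): We introduce FrontierCS, a benchmark of 156 open-ended problems spanning diverse areas of computer science, designed and reviewed by experts including CS PhDs and top-tier competitive programming participants and problem setters. Unlike existing benchmarks that focus on tasks with known optimal solutions, FrontierCS targets problems whose optimal solution is unknown but whose solution quality can be evaluated objectively. Models solve these tasks by implementing executable programs rather than giving a direct answer. FrontierCS includes algorithmic problems, often NP-hard variants of competitive programming problems with objective partial scoring, and research problems with the same property. For each problem, we provide an expert reference solution and an automatic evaluator. By combining open-ended design, measurable progress, and expert curation, FrontierCS provides a benchmark at the frontier of computer-science difficulty. Empirically, we find that frontier reasoning models lag far behind experts on both the algorithmic and research tracks, that increasing the reasoning budget does not close this gap, and that models over-optimize for producing merely workable code rather than discovering high-quality algorithms and system designs.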

Related papers