FrontierCS: Evolving Challenges for Evolving Intelligence
- URL: http://arxiv.org/abs/2512.15699v1
- Date: Wed, 17 Dec 2025 18:52:45 GMT
- Title: FrontierCS: Evolving Challenges for Evolving Intelligence
- Authors: Qiuyang Mang, Wenhao Chai, Zhifei Li, Huanzhi Mao, Shang Zhou, Alexander Du, Hanchen Li, Shu Liu, Edwin Chen, Yichuan Wang, Xieting Chu, Zerui Cheng, Yuan Xu, Tian Xia, Zirui Wang, Tianneng Shi, Jianzhu Yao, Yilong Zhao, Qizheng Zhang, Charlie Ruan, Zeyu Shen, Kaiyuan Liu, Runyuan He, Dong Xing, Zerui Li, Zirong Zeng, Yige Jiang, Lufeng Cheng, Ziyi Zhao, Youran Sun, Wesley Zheng, Meiyuwang Zhang, Ruyi Ji, Xuechang Tu, Zihan Zheng, Zexing Chen, Kangyang Zhou, Zhaozi Wang, Jingbang Chen, Aleksandra Korolova, Peter Henderson, Pramod Viswanath, Vijay Ganesh, Saining Xie, Zhuang Liu, Dawn Song, Sewon Min, Ion Stoica, Joseph E. Gonzalez, Jingbo Shang, Alvin Cheung
- Abstract summary: We introduce FrontierCS, a benchmark of 156 open-ended problems across diverse areas of computer science. For each problem we provide an expert reference solution and an automatic evaluator. We find that frontier reasoning models still lag far behind human experts on both the algorithmic and research tracks.
- Score: 174.80075821079708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce FrontierCS, a benchmark of 156 open-ended problems across diverse areas of computer science, designed and reviewed by experts, including CS PhDs and top-tier competitive programming participants and problem setters. Unlike existing benchmarks that focus on tasks with known optimal solutions, FrontierCS targets problems where the optimal solution is unknown, but the quality of a solution can be objectively evaluated. Models solve these tasks by implementing executable programs rather than outputting a direct answer. FrontierCS includes algorithmic problems, which are often NP-hard variants of competitive programming problems with objective partial scoring, and research problems with the same property. For each problem we provide an expert reference solution and an automatic evaluator. Combining open-ended design, measurable progress, and expert curation, FrontierCS provides a benchmark at the frontier of computer-science difficulty. Empirically, we find that frontier reasoning models still lag far behind human experts on both the algorithmic and research tracks, that increasing reasoning budgets alone does not close this gap, and that models often over-optimize for generating merely workable code instead of discovering high-quality algorithms and system designs.
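The objective partial scoring described in the abstract can be sketched in a few lines: a candidate program's objective value is normalized against the expert reference solution. This is a hedged illustration only; `partial_score` and its normalization rule are assumptions, not the actual FrontierCS evaluator.

```python
# Illustrative sketch of objective partial scoring: the candidate's objective
# value is normalized against the expert reference and capped at 1.0.
# The function name and scheme are assumptions, not FrontierCS internals.

def partial_score(candidate_value: float, reference_value: float,
                  maximize: bool = True) -> float:
    """Return a score in [0, 1]; 1.0 means matching or beating the expert."""
    if maximize:
        if reference_value <= 0:
            raise ValueError("reference value must be positive")
        return min(candidate_value / reference_value, 1.0)
    # For minimization objectives (e.g. tour length), invert the ratio.
    if candidate_value <= 0:
        raise ValueError("candidate value must be positive")
    return min(reference_value / candidate_value, 1.0)

# A model's tour of length 120 vs. an expert tour of length 100 on a
# minimization objective scores 100/120 ≈ 0.833.
print(round(partial_score(120, 100, maximize=False), 3))
```

Capping at 1.0 reflects one possible convention where matching the reference earns full credit; a benchmark could instead allow scores above 1 to reward solutions that beat the expert.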
Related papers
- Decision Making under Imperfect Recall: Algorithms and Benchmarks [77.12503122836422]
We introduce the first benchmark suite for imperfect-recall decision problems.
Our benchmarks capture a variety of problem types, including ones concerning privacy in AI systems.
We evaluate the performance of different algorithms for finding first-order optimal strategies in such problems.
arXiv Detail & Related papers (2026-02-16T23:19:01Z)
- Barbarians at the Gate: How AI is Upending Systems Research [58.95406995634148]
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery.
We term this approach AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines solutions.
Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z)
- FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming [19.576944188747166]
FormulaOne is a benchmark for graph theory, logic, and algorithms.
Our problems are incredibly demanding, requiring an array of reasoning steps.
Remarkably, state-of-the-art models like OpenAI's o3 fail entirely on FormulaOne.
arXiv Detail & Related papers (2025-07-17T17:53:55Z)
- ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering [5.248435832744057]
We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests.
ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution.
Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations.
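The test-run feedback loop such score-based benchmarks enable can be sketched as a simple generate-evaluate-refine cycle. In the sketch below, `propose` and `score` are illustrative stand-ins for a model call and an automatic judge; this is an assumption-laden hill-climbing sketch, not ALE-Bench's actual API.

```python
import random

# Minimal generate-evaluate-refine loop (hill climbing). `score` plays the
# role of the automatic judge and `propose` the role of the model producing
# a refined candidate; both are illustrative stand-ins.

def refine_loop(score, propose, initial, iters=100, seed=0):
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    for _ in range(iters):
        candidate = propose(best, rng)   # generate a refinement of the best
        s = score(candidate)             # evaluate with the judge
        if s > best_score:               # keep only strict improvements
            best, best_score = candidate, s
    return best, best_score

# Toy objective: maximize -(x - 3)^2 starting from x = 0.
best, best_score = refine_loop(lambda x: -(x - 3) ** 2,
                               lambda x, rng: x + rng.uniform(-1, 1), 0.0)
print(f"best={best:.2f} score={best_score:.2f}")
```

Real contest agents replace the random proposal with model-generated code edits and the toy objective with the contest's scoring harness, but the accept-only-improvements control flow is the same.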
arXiv Detail & Related papers (2025-06-10T17:59:56Z)
- Assessing and Enhancing Graph Neural Networks for Combinatorial Optimization: Novel Approaches and Application in Maximum Independent Set Problems [0.0]
Graph Neural Networks (GNNs) show promise for researchers in solving Combinatorial optimization (CO) problems.
This study investigates the effectiveness of GNNs in solving the maximum independent set (MIS) problem.
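For readers unfamiliar with the target problem: the maximum independent set (MIS) asks for the largest set of vertices with no edge between any two of them. A classic minimum-degree greedy baseline, the kind of heuristic GNN approaches are typically compared against, looks roughly like this (an illustrative sketch, not the paper's method):

```python
# Minimum-degree greedy baseline for maximum independent set (MIS):
# repeatedly pick the vertex of smallest degree, add it to the solution,
# and delete it together with all of its neighbors. Illustrative only.

def greedy_mis(adj: dict[int, set[int]]) -> set[int]:
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    independent = set()
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))  # minimum-degree vertex
        independent.add(v)
        removed = {v} | adj[v]
        for u in removed:
            adj.pop(u, None)
        for ns in adj.values():
            ns -= removed                        # drop deleted neighbors
    return independent

# 4-cycle 0-1-2-3: the greedy finds an independent set of size 2, e.g. {0, 2}.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(sorted(greedy_mis(cycle4)))
```

The greedy is fast but can be far from optimal on adversarial graphs, which is exactly the gap learned GNN heuristics aim to close.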
arXiv Detail & Related papers (2024-11-06T09:12:31Z)
- Unsupervised Training of Diffusion Models for Feasible Solution Generation in Neural Combinatorial Optimization [7.85458999849461]
We present IC/DC, an unsupervised CO framework that directly trains a diffusion model from scratch.
We train our model in a self-supervised way to minimize the cost of the solution while adhering to the problem-specific constraints.
IC/DC achieves state-of-the-art performance relative to existing NCO methods on the Parallel Machine Scheduling Problem (PMSP) and the Asymmetric Traveling Salesman Problem (ATSP).
arXiv Detail & Related papers (2024-10-15T06:53:30Z)
- A Unifying Post-Processing Framework for Multi-Objective Learn-to-Defer Problems [6.046591474843391]
Learn-to-Defer is a paradigm that enables learning algorithms to work not in isolation but as a team with human experts.
In this paper, we obtain the Bayes optimal solution for learn-to-defer systems under various constraints.
Our algorithm shows improvements in terms of constraint violation over a set of baselines.
arXiv Detail & Related papers (2024-07-17T16:32:30Z)
- Contractual Reinforcement Learning: Pulling Arms with Invisible Hands [68.77645200579181]
We propose a theoretical framework for aligning economic interests of different stakeholders in the online learning problems through contract design.
For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent.
For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the challenges from robust design of contracts to the balance of exploration and exploitation.
arXiv Detail & Related papers (2024-07-01T16:53:00Z)
- Decision-focused Graph Neural Networks for Combinatorial Optimization [62.34623670845006]
An emerging strategy to tackle optimization problems involves the adoption of graph neural networks (GNNs) as an alternative to traditional algorithms.
Despite the growing popularity of GNNs and traditional algorithm solvers in the realm of CO, there is limited research on their integrated use and the correlation between them within an end-to-end framework.
We introduce a decision-focused framework that utilizes GNNs to address CO problems with auxiliary support.
arXiv Detail & Related papers (2024-06-05T22:52:27Z)
- SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving [64.38649623473626]
Large Language Models (LLMs) have driven substantial progress in artificial intelligence.
We propose a novel framework called SEquential subGoal Optimization (SEGO) to enhance LLMs' ability to solve mathematical problems.
arXiv Detail & Related papers (2023-10-19T17:56:40Z)