Fugu-MT Paper Translation (Abstract): HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

Paper summary: HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

arxiv url: http://arxiv.org/abs/2603.15617v1
Date: Mon, 16 Mar 2026 17:59:53 GMT
Status: Translation complete
Last updated in system: 2026-03-17 18:28:58.728572
Title: HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification
Authors: Erik Y. Wang, Sumeet Motwani, James V. Roggeveen, Eliot Hodges, Dulhan Jayalath, Charles London, Kalyan Ramakrishnan, Flaviu Cipcigan, Philip Torr, Alessandro Abate
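Abstract summary: We introduce HorizonMath, a benchmark of over 100 predominantly unsolved problems spanning 8 domains in computational and applied mathematics. The benchmark targets a class of problems where discovery is hard and requires meaningful mathematical insight, but verification is computationally efficient and simple.
Reference score (independently computed attention score): 54.06301039725887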
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Can AI make progress on important, unsolved mathematical problems? Large language models are now capable of sophisticated mathematical and scientific reasoning, but whether they can perform novel research is still widely debated and underexplored. We introduce HorizonMath, a benchmark of over 100 predominantly unsolved problems spanning 8 domains in computational and applied mathematics, paired with an open-source evaluation framework for automated verification. Our benchmark targets a class of problems where discovery is hard, requiring meaningful mathematical insight, but verification is computationally efficient and simple. Because these solutions are unknown, HorizonMath is immune to data contamination, and most state-of-the-art models score near 0%. Existing research-level benchmarks instead rely on formal proof verification or manual review, both of which are expensive to scale. Using this platform, we find two problems for which GPT 5.4 Pro proposes solutions that improve on the best-known published results, representing potential novel contributions (pending expert review). We release HorizonMath as an open challenge and a growing community resource, where correct solutions to problems in the unsolved problem classes could constitute novel results in the mathematical literature.

Related papers