Fugu-MT Paper Translation (Summary): Benchmarking PhD-Level Coding in 3D Geometric Computer Vision

Paper summary: Benchmarking PhD-Level Coding in 3D Geometric Computer Vision

arxiv url: http://arxiv.org/abs/2603.30038v1
Date: Tue, 31 Mar 2026 17:50:55 GMT
Status: Translation complete
Last updated (in system): 2026-04-01 15:25:03.962262
Title: Benchmarking PhD-Level Coding in 3D Geometric Computer Vision
Title (reference translation): Benchmarking PhD-Level Coding in 3D Geometric Computer Vision
Authors: Wenyi Li, Renkai Luo, Yue Yu, Huan-ang Gao, Mingju Gao, Li Yuan, Chaoyou Fu, Hao Zhao,
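Abstract summary: We introduce GeoCodeBench, a benchmark that evaluates coding for 3D vision. We evaluate eight representative open- and closed-source models that reflect the current ecosystem. The best model, GPT-5, attains a pass rate of only 36.6%.
Reference score (in-house attention score): 43.42170296855238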
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI-assisted coding has rapidly reshaped software practice and research workflows, yet today's models still struggle to produce correct code for complex 3D geometric vision. If models could reliably write such code, the research of our community would change substantially. To measure progress toward that goal, we introduce GeoCodeBench, a PhD-level benchmark that evaluates coding for 3D vision. Each problem is a fill-in-the-function implementation task curated from representative papers at recent venues: we first let a tool propose candidate functions from official repositories, then perform careful human screening to select core 3D geometric components. For every target, we generate diverse, edge-case unit tests, enabling fully automatic, reproducible scoring. We evaluate eight representative open- and closed-source models to reflect the current ecosystem. The best model, GPT-5, attains only 36.6% pass rate, revealing a large gap between current capabilities and dependable 3D scientific coding. GeoCodeBench organizes tasks into a two-level hierarchy: General 3D capability (geometric transformations and mechanics/optics formulation) and Research capability (novel algorithm implementation and geometric logic routing). Scores are positively correlated across these axes, but research-oriented tasks are markedly harder. Context ablations further show that "more paper text" is not always better: cutting off at the Method section statistically outperforms full-paper inputs, highlighting unresolved challenges in long-context scientific comprehension. Together, these findings position GeoCodeBench as a rigorous testbed for advancing from generic coding to trustworthy 3D geometric vision coding.
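Abstract (reference translation): AI-assisted coding has rapidly reshaped software practice and research workflows, but today's models still struggle to generate correct code for complex 3D geometric vision. If models could reliably write such code, research in our community would change substantially. To measure progress toward this goal, we introduce GeoCodeBench, a PhD-level benchmark that evaluates coding for 3D vision. Each problem is a fill-in-the-function implementation task curated from representative papers at recent venues: a tool first proposes candidate functions from the official repositories, and careful human screening then selects core 3D geometric components. For each target, we generate diverse edge-case unit tests, enabling fully automatic and reproducible scoring. We evaluate eight representative open- and closed-source models that reflect the current ecosystem. The best model, GPT-5, attains only a 36.6% pass rate, revealing a large gap between current capabilities and dependable 3D scientific coding. GeoCodeBench organizes tasks into a two-level hierarchy: general 3D capability (geometric transformations and mechanics/optics formulation) and research capability (novel algorithm implementation and geometric logic routing). Scores are positively correlated across these axes, but research-oriented tasks are markedly harder. Context ablations further show that more paper text is not always better: inputs cut off at the Method section statistically outperform full-paper inputs, highlighting unresolved challenges in long-context scientific comprehension. Together, these findings position GeoCodeBench as a rigorous testbed for advancing from generic coding to trustworthy 3D geometric vision coding.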
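The abstract describes the benchmark's evaluation scheme:

- Each problem is a fill-in-the-function task.
- An automatic run of edge-case unit tests scores each completion.
- Results are summarized as a pass rate.

A minimal Python sketch of that loop follows. Everything in it is a hypothetical illustration, not GeoCodeBench's actual tasks or harness. This includes:

- the target `transform_points`, a rigid transform x' = R x + t standing in for the "geometric transformations" category;
- the deliberately wrong submission `buggy_transform_points`;
- the four test cases;
- the assumed all-or-nothing rule that a problem counts as passed only when every one of its tests passes.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform_points(points: List[Vec3], R: Mat3, t: Vec3) -> List[Vec3]:
    """Hypothetical reference target: apply x' = R x + t to every point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def buggy_transform_points(points: List[Vec3], R: Mat3, t: Vec3) -> List[Vec3]:
    """Hypothetical wrong submission: silently uses R^T instead of R."""
    return [
        tuple(sum(R[j][i] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def rot_z(theta: float) -> List[List[float]]:
    """Rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def close(a, b, tol: float = 1e-9) -> bool:
    """Element-wise comparison of two point lists within an absolute tolerance."""
    return len(a) == len(b) and all(
        abs(x - y) <= tol for pa, pb in zip(a, b) for x, y in zip(pa, pb)
    )


def unit_tests(fn) -> List[bool]:
    """Run hypothetical edge-case tests on fn; an exception counts as a failure."""
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cases = [
        (([], eye, (0.0, 0.0, 0.0)), []),                                  # empty input
        (([(1.0, 2.0, 3.0)], eye, (0.0, 0.0, 0.0)), [(1.0, 2.0, 3.0)]),    # identity
        (([(0.0, 0.0, 0.0)], eye, (1.0, -1.0, 2.0)), [(1.0, -1.0, 2.0)]),  # pure translation
        (([(1.0, 0.0, 0.0)], rot_z(math.pi / 2), (0.0, 0.0, 0.0)),
         [(0.0, 1.0, 0.0)]),                                               # 90 degrees about z
    ]
    results = []
    for args, expected in cases:
        try:
            results.append(close(fn(*args), expected))
        except Exception:
            results.append(False)
    return results


def pass_rate(problem_results: List[List[bool]]) -> float:
    """Fraction of problems whose tests all pass (assumed all-or-nothing rule)."""
    if not problem_results:
        return 0.0
    return sum(all(r) for r in problem_results) / len(problem_results)


if __name__ == "__main__":
    correct = unit_tests(transform_points)      # [True, True, True, True]
    wrong = unit_tests(buggy_transform_points)  # [True, True, True, False]
    # Treat the two submissions as two scored problems, purely for illustration.
    print(pass_rate([correct, wrong]))          # 0.5
```

The wrong submission passes three of the four tests: the empty, identity, and pure-translation cases. None of those can distinguish R from its transpose. Only the non-symmetric rotation exposes the bug, which illustrates why edge-case tests matter when scoring geometric code.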
