Fugu-MT Paper Translation (Abstract): Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds

Paper summary: Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds

arxiv url: http://arxiv.org/abs/2602.07864v1
Date: Sun, 08 Feb 2026 08:29:38 GMT
Status: Translation complete
System update date: 2026-02-10 20:26:24.838251
Title: Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds
Title (reference translation, Japanese): 構造における思考:制約されたマニフォールドの推論による空間知性の評価
Authors: Chen Yang, Guanxin Lin, Youquan He, Peiyao Chen, Guanghe Liu, Yufan Mo, Zhouyuan Xu, Linhao Wang, Guohui Zhang, Zihang Zhang, Shenxiang Zeng, Chen Wang, Jiansheng Fan
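Abstract summary: SSI-Bench is a benchmark for spatial reasoning on constrained 3D structures. Ten researchers spent over 400 hours curating images, annotating structural components, and designing questions to minimize pixel-level cues. The best open-source model achieves 22.2% accuracy and the strongest closed-source model reaches 33.6%, while humans score 91.6%.
Reference score (attention score computed independently by the site): 6.062002698657217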
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatial intelligence is crucial for vision-language models (VLMs) in the physical world, yet many benchmarks evaluate largely unconstrained scenes where models can exploit 2D shortcuts. We introduce SSI-Bench, a VQA benchmark for spatial reasoning on constrained manifolds, built from complex real-world 3D structures whose feasible configurations are tightly governed by geometric, topological, and physical constraints. SSI-Bench contains 1,000 ranking questions spanning geometric and topological reasoning and requiring a diverse repertoire of compositional spatial operations, such as mental rotation, cross-sectional inference, occlusion reasoning, and force-path reasoning. It is created via a fully human-centered pipeline: ten researchers spent over 400 hours curating images, annotating structural components, and designing questions to minimize pixel-level cues. Evaluating 31 widely used VLMs reveals a large gap to humans: the best open-source model achieves 22.2% accuracy and the strongest closed-source model reaches 33.6%, while humans score 91.6%. Encouraging models to think yields only marginal gains, and error analysis points to failures in structural grounding and constraint-consistent 3D reasoning. Project page: https://ssi-bench.github.io.
Abstract (reference translation, Japanese): 空間知能は物理世界の視覚言語モデル(VLM)にとって重要であるが、モデルが2Dショートカットを活用できるような制約のないシーンを、多くのベンチマークで評価している。 SSI-Benchは制約付き多様体上の空間的推論のためのVQAベンチマークであり、幾何学的、位相的、物理的制約によって厳密に制御される複雑な実世界の3D構造から構築される。 SSI-Benchには、幾何学的および位相的推論にまたがる1000のランク付け質問が含まれており、メンタルローテーション、断面積推論、オクルージョン推論、フォースパス推論などの構成空間操作の多様なレパートリーを必要とする。 10人の研究者が画像のキュレーションに400時間以上を費やし、構造コンポーネントを注釈付けし、ピクセルレベルのキューを最小限にするために質問を設計しました。最高のオープンソースモデルは22.2%の精度で最強のクローズドソースモデルは33.6%、人間は91.6%である。モデルに思考を促進させると限界ゲインしか得られず、エラー解析は構造的接地や制約に一貫性のある3D推論の失敗を示している。プロジェクトページ: https://ssi-bench.github.io


Related papers list