Fugu-MT Paper Translation (Abstract): Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports

Paper overview: Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports

arxiv url: http://arxiv.org/abs/2603.09896v1
Date: Tue, 10 Mar 2026 16:50:32 GMT
Status: Translation complete
Last updated in system: 2026-03-11 15:25:24.47516
Title: Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports
Title (reference translation): Benchmarking Spatial Intelligence in Sports
Authors: Yuchen Yang, Yuqing Shao, Duxiu Huang, Linfeng Dong, Yifei Liu, Suixin Tang, Xiang Zhou, Yuanyuan Gao, Wei Wang, Yue Zhou, Xue Yang, Yanfeng Wang, Xiao Sun, Zhihang Zhong
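Abstract summary: We present CourtSI, the first large-scale spatial intelligence dataset tailored to sports scenarios. CourtSI contains over 1M QA pairs, organized under a holistic taxonomy covering spatial counting, distance measurement, localization, and relational reasoning. We also introduce CourtSI-Bench, a high-quality evaluation benchmark comprising 3,686 QA pairs with rigorous human verification.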
Reference score (independently computed attention level): 46.83689976902389
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sports have long attracted broad attention as they push the limits of human physical and cognitive capabilities. Amid growing interest in spatial intelligence for vision-language models (VLMs), sports provide a natural testbed for understanding high-intensity human motion and dynamic object interactions. To this end, we present CourtSI, the first large-scale spatial intelligence dataset tailored to sports scenarios. CourtSI contains over 1M QA pairs, organized under a holistic taxonomy that systematically covers spatial counting, distance measurement, localization, and relational reasoning, across representative net sports including badminton, tennis, and table tennis. Leveraging well-defined court geometry as metric anchors, we develop a semi-automatic data engine to reconstruct sports scenes, enabling scalable curation of CourtSI. In addition, we introduce CourtSI-Bench, a high-quality evaluation benchmark comprising 3,686 QA pairs with rigorous human verification. We evaluate 25 proprietary and open-source VLMs on CourtSI-Bench, revealing a remaining human-AI performance gap and limited generalization from existing spatial intelligence benchmarks. These findings indicate that sports scenarios expose limitations in spatial intelligence capabilities captured by existing benchmarks. Further, fine-tuning Qwen3-VL-8B on CourtSI improves accuracy on CourtSI-Bench by 23.5 percentage points. The adapted model also generalizes effectively to CourtSI-Ext, an evaluation set built on a similar but unseen sport, and demonstrates enhanced spatial-aware commentary generation. Together, these findings demonstrate that CourtSI provides a scalable pathway toward advancing spatial intelligence of VLMs in sports.
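Abstract (reference translation): Sports have long attracted broad attention because they push the limits of human physical and cognitive capabilities. With growing interest in the spatial intelligence of vision-language models (VLMs), sports offer a natural testbed for understanding high-intensity human motion and dynamic object interactions. To this end, we propose CourtSI, the first large-scale spatial intelligence dataset tailored to sports scenarios. CourtSI contains over 1M QA pairs that systematically cover spatial counting, distance measurement, localization, and relational reasoning across representative net sports including badminton, tennis, and table tennis. We develop a semi-automatic data engine that reconstructs sports scenes, enabling scalable curation of CourtSI. We also introduce CourtSI-Bench, a high-quality evaluation benchmark of 3,686 QA pairs with rigorous verification. We evaluate 25 proprietary and open-source VLMs on CourtSI-Bench, revealing a remaining human-AI performance gap and limited generalization from existing spatial intelligence benchmarks. These results suggest that sports scenarios expose limitations in the spatial intelligence captured by existing benchmarks. Furthermore, fine-tuning Qwen3-VL-8B on CourtSI improves accuracy on CourtSI-Bench by 23.5 percentage points. The adapted model also generalizes effectively to CourtSI-Ext, an evaluation set built on a similar sport, and shows improved spatially aware commentary generation. Together, these results show that CourtSI provides a scalable pathway toward advancing the spatial intelligence of VLMs in sports.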
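The abstract mentions using well-defined court geometry as metric anchors but does not describe the data engine itself. The following is only a minimal sketch of the general idea, not the authors' pipeline: it assumes a single image, points lying on the court plane, and a planar homography fitted from the four court corners. The function names and the pixel coordinates are hypothetical and chosen purely for illustration; the only real-world constants are the standard badminton doubles court dimensions (6.1 m by 13.4 m).

```python
import numpy as np

# Standard badminton doubles court: 6.1 m wide, 13.4 m long.
COURT_W, COURT_L = 6.1, 13.4
# Court corners in metres, ordered near-left, near-right, far-right, far-left.
COURT_CORNERS_M = np.array(
    [[0.0, 0.0], [COURT_W, 0.0], [COURT_W, COURT_L], [0.0, COURT_L]]
)


def fit_homography(src, dst):
    """Estimate a 3x3 H with dst ~ H @ src using the direct linear transform."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def apply_homography(H, pts):
    """Map an (N, 2) array of points through H and dehomogenise."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homo[:, :2] / homo[:, 2:3]


def ground_distance_m(H_img_to_court, px_a, px_b):
    """Metric distance between two image points assumed to lie on the court plane."""
    a, b = apply_homography(H_img_to_court, [px_a, px_b])
    return float(np.linalg.norm(a - b))


# Hypothetical pixel positions of the four court corners in a broadcast frame.
image_corners_px = np.array(
    [[420.0, 880.0], [1500.0, 880.0], [1210.0, 310.0], [710.0, 310.0]]
)
H = fit_homography(image_corners_px, COURT_CORNERS_M)

# Distance between the two near corners, which by construction is the court width.
print(ground_distance_m(H, (420.0, 880.0), (1500.0, 880.0)))
```

A planar mapping like this only covers points on the court surface. Questions about a shuttlecock in flight or a player's raised racket would need full 3D reconstruction, which this sketch does not attempt.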
