Fugu-MT Paper Translation (Abstract): TennisExpert: Towards Expert-Level Analytical Sports Video Understanding

Paper overview: TennisExpert: Towards Expert-Level Analytical Sports Video Understanding

arxiv url: http://arxiv.org/abs/2603.13397v2
Date: Tue, 17 Mar 2026 16:02:52 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:06.836755
Title: TennisExpert: Towards Expert-Level Analytical Sports Video Understanding
Title (reference translation): TennisExpert: Aiming for Expert-Level Sports Video Understanding
Authors: Zhaoyu Liu, Xi Weng, Lianyu Hu, Zhe Hou, Kan Jiang, Jin Song Dong, Yang Liu
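Abstract summary: Tennis is one of the most widely followed sports, generating extensive broadcast footage with strong potential for professional analysis, automated coaching, and real-time commentary. However, automatic tennis understanding remains underexplored because large-scale benchmarks with fine-grained annotations and expert-level commentary are lacking. To address this, the authors introduce TennisVL, a large-scale tennis benchmark comprising over 200 professional matches (471.9 hours) and 40,000+ rally-level clips. They also propose TennisExpert, a multimodal tennis understanding framework that integrates a video semantic parser with a memory-augmented model built on Qwen3-VL-8B.
Reference score (independently computed attention score): 16.625250626542208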
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tennis is one of the most widely followed sports, generating extensive broadcast footage with strong potential for professional analysis, automated coaching, and real-time commentary. However, automatic tennis understanding remains underexplored due to two key challenges: (1) the lack of large-scale benchmarks with fine-grained annotations and expert-level commentary, and (2) the difficulty of building accurate yet efficient multimodal systems suitable for real-time deployment. To address these challenges, we introduce TennisVL, a large-scale tennis benchmark comprising over 200 professional matches (471.9 hours) and 40,000+ rally-level clips. Unlike existing commentary datasets that focus on descriptive play-by-play narration, TennisVL emphasizes expert analytical commentary capturing tactical reasoning, player decisions, and match momentum. Furthermore, we propose TennisExpert, a multimodal tennis understanding framework that integrates a video semantic parser with a memory-augmented model built on Qwen3-VL-8B. The parser extracts key match elements (e.g., scores, shot sequences, ball bounces, and player locations), while hierarchical memory modules capture both short- and long-term temporal context. Experiments show that TennisExpert consistently outperforms strong proprietary baselines, including GPT-5, Gemini, and Claude, and demonstrates improved ability to capture tactical context and match dynamics. Our dataset and code are publicly available at https://github.com/LZYAndy/TennisExpert.
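Abstract (reference translation): Tennis is one of the most widely followed sports and produces extensive broadcast footage with strong potential for professional analysis, automated coaching, and real-time commentary. Automatic tennis understanding nevertheless remains underexplored because of two major challenges: (1) the lack of large-scale benchmarks with fine-grained annotations and expert-level commentary, and (2) the difficulty of building accurate yet efficient multimodal systems suitable for real-time deployment. To address these challenges, the authors introduce TennisVL, a large-scale tennis benchmark consisting of over 200 professional matches (471.9 hours) and more than 40,000 rally-level clips. Unlike existing commentary datasets that focus on descriptive play-by-play narration, TennisVL emphasizes expert analytical commentary that captures tactical reasoning, player decisions, and match momentum. The authors further propose TennisExpert, a multimodal tennis understanding framework that integrates a video semantic parser with a memory-augmented model built on Qwen3-VL-8B. The parser extracts key match elements (scores, shot sequences, ball bounces, and player locations), while hierarchical memory modules capture both short-term and long-term temporal context. Experiments show that TennisExpert consistently outperforms strong proprietary baselines, including GPT-5, Gemini, and Claude, and is better at capturing tactical context and match dynamics. The dataset and code are publicly available at https://github.com/LZYAndy/TennisExpert.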
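The abstract describes a parser that extracts match elements and hierarchical memory modules that keep short- and long-term context, but it gives no implementation details. The following is a minimal Python sketch of one way such a two-level memory could be organized, not the authors' method. Every name here (`RallyRecord`, `HierarchicalMemory`, the field names, the window size, and the choice of per-player win counts as the long-term summary) is an assumption made for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class RallyRecord:
    """Hypothetical parser output for one rally; the field names are assumptions."""
    score: str
    shots: List[str]
    winner: str


@dataclass
class HierarchicalMemory:
    """Toy two-level memory: recent rallies kept verbatim, older ones kept as counts."""
    short_term_size: int = 3
    short_term: Deque[RallyRecord] = field(default_factory=deque)
    # Long-term level: player name -> rallies won, counted only after eviction.
    long_term: Dict[str, int] = field(default_factory=dict)

    def add(self, rally: RallyRecord) -> None:
        self.short_term.append(rally)
        # When the window overflows, fold the oldest rally into the long-term summary.
        if len(self.short_term) > self.short_term_size:
            old = self.short_term.popleft()
            self.long_term[old.winner] = self.long_term.get(old.winner, 0) + 1

    def build_context(self) -> str:
        """Render both levels as text that could be prepended to a model prompt."""
        lines = ["Long-term summary (rallies won before the recent window):"]
        for player, wins in sorted(self.long_term.items()):
            lines.append(f"  {player}: {wins}")
        lines.append("Recent rallies:")
        for r in self.short_term:
            lines.append(f"  [{r.score}] {' -> '.join(r.shots)} (won by {r.winner})")
        return "\n".join(lines)


if __name__ == "__main__":
    memory = HierarchicalMemory(short_term_size=2)
    memory.add(RallyRecord("0-0", ["serve", "return", "forehand"], "Player A"))
    memory.add(RallyRecord("15-0", ["serve", "ace"], "Player B"))
    memory.add(RallyRecord("15-15", ["serve", "return", "volley"], "Player A"))
    print(memory.build_context())
```

Keeping the recent window verbatim while compressing older rallies bounds the context length, which is one plausible reading of the efficiency concern the abstract raises for real-time deployment.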
