Fugu-MT Paper Translation (Abstract): Learning Skill-Attributes for Transferable Assessment in Video

Paper overview: Learning Skill-Attributes for Transferable Assessment in Video

arxiv url: http://arxiv.org/abs/2511.13993v1
Date: Mon, 17 Nov 2025 23:53:06 GMT
Status: Translation complete
Last updated in system: 2025-11-19 16:23:52.838431
Title: Learning Skill-Attributes for Transferable Assessment in Video
Title (reference translation): Learning Skill Attributes for Transferable Assessment in Video
Authors: Kumar Ashutosh, Kristen Grauman
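Abstract summary: Skill assessment from video entails rating the quality of a person's physical performance and explaining what could be done better. Our CrossTrainer approach discovers skill-attributes such as balance, control, and hand positioning. By abstracting out the shared behaviors indicative of human skill, the proposed video representation generalizes substantially better than an array of existing techniques.
Reference score (independently computed attention measure): 56.813876909367856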
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Skill assessment from video entails rating the quality of a person's physical performance and explaining what could be done better. Today's models specialize for an individual sport, and suffer from the high cost and scarcity of expert-level supervision across the long tail of sports. Towards closing that gap, we explore transferable video representations for skill assessment. Our CrossTrainer approach discovers skill-attributes, such as balance, control, and hand positioning -- whose meaning transcends the boundaries of any given sport, then trains a multimodal language model to generate actionable feedback for a novel video, e.g., "lift hands more to generate more power" as well as its proficiency level, e.g., early expert. We validate the new model on multiple datasets for both cross-sport (transfer) and intra-sport (in-domain) settings, where it achieves gains up to 60% relative to the state of the art. By abstracting out the shared behaviors indicative of human skill, the proposed video representation generalizes substantially better than an array of existing techniques, enriching today's multimodal large language models.
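Abstract (reference translation): Skill assessment from video entails rating the quality of a person's physical performance and explaining what could be done better. Today's models specialize in an individual sport and suffer from the high cost and scarcity of expert-level supervision across the long tail of sports. To close that gap, we explore transferable video representations for skill assessment. Our CrossTrainer approach discovers skill-attributes such as balance, control, and hand positioning, whose meaning transcends the boundaries of any given sport, and then trains a multimodal language model to generate actionable feedback for a novel video. We validate the new model on multiple datasets in both cross-sport (transfer) and intra-sport (in-domain) settings. By abstracting out the shared behaviors indicative of human skill, the proposed video representation generalizes substantially better than an array of existing techniques, enriching today's multimodal large language models.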

Related papers