Fugu-MT Paper Translation (Abstract): SignSparK: Efficient Multilingual Sign Language Production via Sparse Keyframe Learning

Paper overview: SignSparK: Efficient Multilingual Sign Language Production via Sparse Keyframe Learning

arxiv url: http://arxiv.org/abs/2603.10446v2
Date: Thu, 12 Mar 2026 03:43:26 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:25.471066
Title: SignSparK: Efficient Multilingual Sign Language Production via Sparse Keyframe Learning
Title (reference translation): SignSparK: Efficient Multilingual Sign Language Production via Sparse Keyframe Learning
Authors: Jianhe Low, Alexandre Symeonidis-Herzig, Maksym Ivashechkin, Ozge Mercanoglu Sincan, Richard Bowden
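Abstract summary: Current Sign Language Production (SLP) frameworks face a stark trade-off. This work proposes a novel training paradigm that leverages sparse keyframes to capture the true underlying distribution of human signing. By predicting dense motion from these discrete anchors, it mitigates regression-to-the-mean while ensuring fluid articulation.
Reference score (independently computed attention score): 54.232148007248874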
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Generating natural and linguistically accurate sign language avatars remains a formidable challenge. Current Sign Language Production (SLP) frameworks face a stark trade-off: direct text-to-pose models suffer from regression-to-the-mean effects, while dictionary-retrieval methods produce robotic, disjointed transitions. To resolve this, we propose a novel training paradigm that leverages sparse keyframes to capture the true underlying kinematic distribution of human signing. By predicting dense motion from these discrete anchors, our approach mitigates regression-to-the-mean while ensuring fluid articulation. To realize this paradigm at scale, we first introduce FAST, an ultra-efficient sign segmentation model that automatically mines precise temporal boundaries. We then present SignSparK, a large-scale Conditional Flow Matching (CFM) framework that utilizes these extracted anchors to synthesize 3D signing sequences in SMPL-X and MANO spaces. This keyframe-driven formulation also uniquely unlocks Keyframe-to-Pose (KF2P) generation, making precise spatiotemporal editing of signing sequences possible. Furthermore, our adopted reconstruction-based CFM objective also enables high-fidelity synthesis in fewer than ten sampling steps; this allows SignSparK to scale across four distinct sign languages, establishing the largest multilingual SLP framework to date. Finally, by integrating 3D Gaussian Splatting for photorealistic rendering, we demonstrate through extensive evaluation that SignSparK establishes a new state-of-the-art across diverse SLP tasks and multilingual benchmarks.
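Abstract (reference translation): Generating natural and linguistically accurate sign language avatars remains a formidable challenge. Current Sign Language Production (SLP) frameworks face a stark trade-off: direct text-to-pose models suffer from regression-to-the-mean effects, while dictionary-retrieval methods produce robotic, disjointed transitions. To address this, we propose a novel training paradigm that leverages sparse keyframes. By predicting dense motion from these discrete anchors, the approach mitigates regression-to-the-mean while ensuring fluid articulation. To realize this paradigm at scale, we first introduce FAST, an ultra-efficient sign segmentation model that automatically mines precise temporal boundaries. We then present SignSparK, a large-scale Conditional Flow Matching (CFM) framework that uses these extracted anchors to synthesize 3D signing sequences in SMPL-X and MANO spaces. This keyframe-driven formulation uniquely unlocks Keyframe-to-Pose (KF2P) generation, enabling precise spatiotemporal editing of signing sequences. Furthermore, the adopted CFM objective enables high-fidelity synthesis in fewer than ten steps, which allows SignSparK to scale across four distinct sign languages and establishes the largest multilingual SLP framework to date. Finally, by integrating 3D Gaussian Splatting for photorealistic rendering, we show that SignSparK establishes a new state of the art across diverse SLP tasks and multilingual benchmarks.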
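Illustrative sketch (not from the paper): The abstract's claim of high-fidelity synthesis in fewer than ten sampling steps can be illustrated with a toy Euler sampler for flow matching. The sketch assumes, without confirmation from the abstract, that "reconstruction-based" means the network predicts the clean sample, and that the probability path is linear, x_t = (1 - t) * x0 + t * x1. The names `euler_sample`, `predict_clean`, `oracle`, and `target_pose` are hypothetical, and the oracle merely stands in for a trained network.

```python
import random


def euler_sample(predict_clean, x0, num_steps=8):
    """Integrate dx/dt = (x1_hat - x) / (1 - t) from t=0 to t=1 with Euler steps.

    predict_clean(x, t) is a hypothetical model call returning an estimate
    of the clean sample x1 given the current noisy state x at time t.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        x1_hat = predict_clean(x, t)
        # Velocity implied by the assumed linear path between noise and data.
        x = [xi + dt * (x1i - xi) / (1.0 - t) for xi, x1i in zip(x, x1_hat)]
    return x


# Toy stand-in for a trained network: always returns one fixed pose vector.
target_pose = [0.5, -1.0, 2.0]


def oracle(x, t):
    return target_pose


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in target_pose]
sample = euler_sample(oracle, noise, num_steps=8)
```

Under these assumptions the final Euler step lands on the predicted clean sample, which is one intuition for why a clean-sample predictor can tolerate a small step budget. In a real system a conditioned network would take the place of `oracle`, with text or keyframes as the conditioning signal.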
