Fugu-MT Paper Translation (Abstract): SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks

Paper overview: SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks

arxiv url: http://arxiv.org/abs/2606.24361v1
Date: Tue, 23 Jun 2026 09:51:03 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:48.892421
Title: SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks
Title (reference translation): SignNet-1M: A Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks
Authors: Zhewen He, Junyi Hu, Haomian Huang, Zhenhua Li, Yu-Shen Liu, Yi Fang
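Abstract summary: SignNet-1M is a large-scale augmented dataset spanning ASL, CSL, and German Sign Language (DGS). It synthesizes realistic variations along three axes: novel-view rendering (rotation and zoom), scene/identity editing, and post-rendering augmentations. Training with SignNet-1M is shown to consistently improve generalization under cross-view, cross-background, cross-identity, and post-rendering shifts.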
Reference score (independently computed attention level): 41.59280653082095
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sign language models are typically trained on datasets captured under constrained conditions, with limited viewpoint, background, and signer-identity diversity, leading to poor robustness under real-world distribution shifts. We introduce SignNet-1M, a large-scale augmented dataset spanning ASL, CSL, and German Sign Language (DGS). SignNet-1M synthesizes realistic variations along three axes: (i) novel-view rendering (rotation and zoom) via 3D Gaussian Splatting (3DGS), (ii) scene/identity editing via diffusion models for background replacement and signer substitution while preserving sign motion and linguistic content, and (iii) post-rendering augmentations that emulate capture and compression artifacts (e.g., pose/temporal perturbations and video-level corruptions) to better match in-the-wild recordings. Beyond data release, we provide a unified benchmark suite across downstream tasks (e.g., translation and recognition) and ablations that isolate each augmentation component. Experiments across backbones show that training with SignNet-1M consistently improves generalization under cross-view, cross-background, cross-identity, and post-rendering shifts, while maintaining strong in-distribution performance. The dataset, full augmentation pipeline, and benchmark are available at https://signnet.chatsign.ai/.
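Abstract (reference translation): Sign language models are typically trained on datasets captured under constrained conditions, with limited viewpoint, background, and signer-identity diversity, leading to poor robustness under real-world distribution shifts. This paper introduces SignNet-1M, a large-scale augmented dataset spanning ASL, CSL, and German Sign Language (DGS). SignNet-1M synthesizes realistic variations along three axes: (i) novel-view rendering (rotation and zoom) via 3D Gaussian Splatting (3DGS); (ii) scene/identity editing via diffusion models for background replacement and signer substitution, preserving sign motion and linguistic content; and (iii) post-rendering augmentations that emulate capture and compression artifacts (e.g., pose/temporal perturbations and video-level corruptions). Beyond the data release, the authors provide a unified benchmark suite across downstream tasks (e.g., translation and recognition) and ablations that isolate each augmentation component. Experiments across backbones show that training with SignNet-1M consistently improves generalization under cross-view, cross-background, cross-identity, and post-rendering shifts while maintaining strong in-distribution performance. The dataset, full augmentation pipeline, and benchmark are available at https://signnet.chatsign.ai/.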
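
The abstract mentions pose and temporal perturbations as part of the post-rendering augmentations but gives no details. The following is only a minimal illustrative sketch of what such perturbations could look like on 2D keypoint sequences, not the authors' pipeline. The function names (`jitter_pose`, `freeze_frames`), the parameters (`sigma`, `drop_prob`), and the keypoint representation are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

# Assumed representation (hypothetical): a clip is a list of frames,
# and each frame is a list of (x, y) keypoints in normalized coordinates.
Keypoint = Tuple[float, float]
Frame = List[Keypoint]


def jitter_pose(frames: List[Frame], sigma: float = 0.01,
                rng: Optional[random.Random] = None) -> List[Frame]:
    """Pose perturbation: add independent Gaussian noise to every keypoint."""
    if rng is None:
        rng = random.Random()
    return [
        [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in frame]
        for frame in frames
    ]


def freeze_frames(frames: List[Frame], drop_prob: float = 0.1,
                  rng: Optional[random.Random] = None) -> List[Frame]:
    """Temporal perturbation: emulate dropped frames by repeating the
    previous output frame with probability drop_prob (clip length is kept)."""
    if rng is None:
        rng = random.Random()
    out: List[Frame] = []
    for i, frame in enumerate(frames):
        if i > 0 and rng.random() < drop_prob:
            out.append(out[-1])  # repeat the last emitted frame
        else:
            out.append(frame)    # the first frame is always kept
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    clip = [[(0.1 * i, 0.2 * i), (0.5, 0.5)] for i in range(5)]
    augmented = freeze_frames(jitter_pose(clip, sigma=0.02, rng=rng),
                              drop_prob=0.2, rng=rng)
    print(len(augmented), "frames after augmentation")
```

Passing a seeded `random.Random` keeps the perturbations reproducible, which helps when an ablation needs to isolate one augmentation at a time.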

