Fugu-MT Paper Translation (Abstract): SegSLR: Promptable Video Segmentation for Isolated Sign Language Recognition

Paper abstract: SegSLR: Promptable Video Segmentation for Isolated Sign Language Recognition

arxiv url: http://arxiv.org/abs/2509.10710v1
Date: Fri, 12 Sep 2025 22:04:34 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:22.746642
Title: SegSLR: Promptable Video Segmentation for Isolated Sign Language Recognition
Authors: Sven Schreiber, Noha Sarhan, Simone Frintrop, Christian Wilms
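Abstract summary: Isolated Sign Language Recognition (ISLR) approaches primarily rely on RGB data or signer pose information. This paper proposes SegSLR, an ISLR system that combines RGB and pose information through promptable zero-shot video segmentation. An evaluation on the complex ChaLearn249 IsoGD dataset shows that SegSLR outperforms state-of-the-art methods.
Reference score (attention score computed independently by the site): 3.861523667432406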
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Isolated Sign Language Recognition (ISLR) approaches primarily rely on RGB data or signer pose information. However, combining these modalities often results in the loss of crucial details, such as hand shape and orientation, due to imprecise representations like bounding boxes. Therefore, we propose the ISLR system SegSLR, which combines RGB and pose information through promptable zero-shot video segmentation. Given the rough localization of the hands and the signer's body from pose information, we segment the respective parts through the video to maintain all relevant shape information. Subsequently, the segmentations focus the processing of the RGB data on the most relevant body parts for ISLR. This effectively combines RGB and pose information. Our evaluation on the complex ChaLearn249 IsoGD dataset shows that SegSLR outperforms state-of-the-art methods. Furthermore, ablation studies indicate that SegSLR strongly benefits from focusing on the signer's body and hands, justifying our design choices.
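The pipeline described in the abstract (rough localization from pose, segmentation through the video, then focusing RGB processing on the segmented parts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the keypoint index layout, and the choice to zero out pixels outside the masks are assumptions made for illustration only. The promptable video segmentation model itself is not shown; its output is assumed to be one boolean mask per frame.

```python
import numpy as np


def prompts_from_pose(keypoints, hand_ids, body_ids):
    """Derive rough prompts from 2D pose keypoints of one frame.

    keypoints: (K, 2) array of (x, y) pixel coordinates.
    hand_ids, body_ids: index lists (hypothetical layout, depends on the pose model).
    Returns hand point prompts and a body box (x_min, y_min, x_max, y_max).
    """
    hand_points = keypoints[hand_ids]
    body = keypoints[body_ids]
    body_box = np.concatenate([body.min(axis=0), body.max(axis=0)])
    return hand_points, body_box


def focus_on_masks(frames, masks):
    """Keep only the pixels inside the segmentation masks.

    frames: (T, H, W, 3) uint8 RGB video.
    masks:  (T, H, W) boolean masks, assumed to come from a promptable
            video segmentation model that was prompted with the pose-derived
            points and box (the model is outside this sketch).
    Returns a (T, H, W, 3) array with background pixels set to zero.
    """
    return frames * masks[..., None].astype(frames.dtype)


# Toy usage with synthetic data.
keypoints = np.array([[10, 20], [30, 40], [50, 60], [5, 80]])
hand_points, body_box = prompts_from_pose(keypoints, hand_ids=[0, 1], body_ids=[1, 2, 3])

frames = np.full((2, 4, 4, 3), 255, dtype=np.uint8)
masks = np.zeros((2, 4, 4), dtype=bool)
masks[:, 1:3, 1:3] = True
focused = focus_on_masks(frames, masks)
```

Unlike a bounding-box crop, a pixel-level mask keeps the outline of the hand, which is the shape information the abstract argues is lost with coarser representations. How the paper actually feeds the masked regions into the recognition network is not specified in the abstract.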
