Fugu-MT Paper Translation (Summary): BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

Paper summary: BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

arxiv url: http://arxiv.org/abs/2007.12131v2
Date: Wed, 13 Oct 2021 17:13:42 GMT
Status: Translation complete
Last updated in system: 2022-11-07 12:48:24.434727
Title: BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
Title (reference translation): BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
Authors: Samuel Albanie and Gül Varol and Liliane Momeni and Triantafyllos Afouras and Joon Son Chung and Neil Fox and Andrew Zisserman
Abstract summary: We show how to use mouthing cues from signers to obtain high-quality annotations from video data. The BSL-1K dataset is a collection of British Sign Language (BSL) signs.
Reference score (independently computed attention score): 106.21067543021887
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign-instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data - the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale; (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks - we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.
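Abstract (reference translation): Recent progress in fine-grained gesture and action classification, and in machine translation, points to the possibility of automatic sign language recognition becoming a reality. A key obstacle to progress toward this goal is the lack of appropriate training data, which stems from the high complexity of sign annotation and the limited supply of qualified annotators. In this work, we propose a new scalable approach to data collection for sign recognition in continuous videos. Using weakly aligned subtitles for broadcast footage together with a keyword spotting method, we automatically localise sign instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data - the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale; (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks - we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting, and provide baselines that we hope will stimulate research in this area.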

Related papers

Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies [6.403291706982091]
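Isolated sign language recognition is essential for scalable language technologies. We propose a one-shot learning approach that generalises across languages and evolving vocabularies. We obtain state-of-the-art results, including 50.8% one-shot MRR on a large dictionary containing 10,235 unique signs from different languages.
Reference translation of paper (metadata) (2025-02-27T15:07:51Z)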
Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
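This paper investigates the feasibility of continuous sign language recognition using multilingual sign language corpora. First, we build two sign language dictionaries containing the isolated signs that appear in the two datasets. Next, using a properly optimised sign recognition model, we identify sign-to-sign mappings between the two sign languages.
Reference translation of paper (metadata) (2023-08-21T15:58:47Z)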
Weakly-supervised Fingerspelling Recognition in British Sign Language Videos [85.61513254261523]
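Previous fingerspelling recognition methods have not focused on British Sign Language (BSL). In contrast to previous methods, our method uses only weak annotations from subtitles for training. We propose a Transformer architecture adapted to this task.
Reference translation of paper (metadata) (2022-11-16T15:02:36Z)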
Automatic dense annotation of large-vocabulary sign language videos [85.61513254261523]
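We propose a simple, scalable framework to greatly increase the density of automatic annotations. These annotations are publicly available to support the sign language research community.
Reference translation of paper (metadata) (2022-08-04T17:55:09Z)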
Scaling up sign spotting through sign language dictionaries [99.50956498009094]
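The focus of this work is sign spotting: given a video of an isolated sign, the task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. Our approach involves (1) watching existing footage that is sparsely labelled using mouthing cues, and (2) reading associated subtitles that provide additional translations of the signed content. We validate the effectiveness of the approach in a low-shot setting.
Reference translation of paper (metadata) (2022-05-09T10:00:03Z)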
Read and Attend: Temporal Localisation in Sign Language Videos [84.30262812057994]
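We train a Transformer model to ingest a continuous signing stream and output a sequence of written tokens. We show that it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation.
Reference translation of paper (metadata) (2021-03-30T16:39:53Z)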
Watch, read and lookup: learning to spot signs from multiple supervisors [99.50956498009094]
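Given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. We train a model using multiple types of available supervision by (1) watching existing sparsely labelled footage, (2) reading associated subtitles that provide additional weak supervision, and (3) looking up words in visual sign language dictionaries. These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning.
Reference translation of paper (metadata) (2020-10-08T14:12:56Z)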

The related papers list is generated automatically from the titles and abstracts of papers on this site.

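The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).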