Weakly-supervised Fingerspelling Recognition in British Sign Language Videos
- URL: http://arxiv.org/abs/2211.08954v1
- Date: Wed, 16 Nov 2022 15:02:36 GMT
- Title: Weakly-supervised Fingerspelling Recognition in British Sign Language Videos
- Authors: K R Prajwal, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman
- Abstract summary: Previous fingerspelling recognition methods have not focused on British Sign Language (BSL).
In contrast to previous methods, our method only uses weak annotations from subtitles for training.
We propose a Transformer architecture adapted to this task, with a multiple-hypothesis CTC loss function to learn from alternative annotation possibilities.
- Score: 85.61513254261523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this work is to detect and recognize sequences of letters signed
using fingerspelling in British Sign Language (BSL). Previous fingerspelling
recognition methods have not focused on BSL, which has a very different signing
alphabet (e.g., two-handed instead of one-handed) to American Sign Language
(ASL). They also use manual annotations for training. In contrast to previous
methods, our method only uses weak annotations from subtitles for training. We
localize potential instances of fingerspelling using a simple feature
similarity method, then automatically annotate these instances by querying
subtitle words and searching for corresponding mouthing cues from the signer.
We propose a Transformer architecture adapted to this task, with a
multiple-hypothesis CTC loss function to learn from alternative annotation
possibilities. We employ a multi-stage training approach, where we use an
initial version of our trained model to extend and enhance our training data
before re-training to achieve better performance. Through extensive
evaluations, we verify our method for automatic annotation and our model
architecture. Moreover, we provide a human expert annotated test set of 5K
video clips for evaluating BSL fingerspelling recognition methods to support
sign language research.
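The localisation step described in the abstract rests on feature similarity between frames of continuous signing and known fingerspelling examples. The snippet below is a minimal sketch of that idea; the function name, feature shapes, and threshold are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of feature-similarity localisation (illustrative assumptions,
# not the authors' released code): flag frames whose visual features are close
# to known fingerspelling exemplars under cosine similarity.
import torch
import torch.nn.functional as F

def localise_fingerspelling(frame_feats: torch.Tensor,
                            exemplar_feats: torch.Tensor,
                            threshold: float = 0.7) -> torch.Tensor:
    """frame_feats: (T, D) per-frame features of a continuous signing video.
    exemplar_feats: (M, D) features of known fingerspelling instances.
    Returns a (T,) boolean mask of frames proposed as fingerspelling."""
    f = F.normalize(frame_feats, dim=-1)       # unit-normalise so dot product = cosine similarity
    e = F.normalize(exemplar_feats, dim=-1)
    sim = f @ e.T                              # (T, M): similarity of every frame to every exemplar
    return sim.max(dim=-1).values > threshold  # keep frames close enough to any exemplar
```

Frames kept by this mask would then be candidate spans to annotate by querying subtitle words and matching mouthing cues, as the abstract describes.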
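The multiple-hypothesis CTC loss allows learning when the weak subtitle-derived annotation offers several plausible letter sequences for a clip. One common way to realise this is to score every candidate with the standard CTC loss and back-propagate through the best one; the PyTorch sketch below shows that variant and is an assumption about the formulation, not the paper's exact loss.

```python
# Hedged sketch of a multiple-hypothesis CTC loss (one plausible formulation,
# not necessarily the paper's exact loss): score each candidate annotation with
# standard CTC and keep only the best-matching hypothesis.
import torch
import torch.nn.functional as F

def multi_hypothesis_ctc_loss(log_probs: torch.Tensor,
                              hypotheses: list,
                              num_frames: int,
                              blank: int = 0) -> torch.Tensor:
    """log_probs: (T, 1, C) log-softmax outputs for one clip (C = letters + blank).
    hypotheses: list of 1-D LongTensors, each an alternative letter-sequence annotation."""
    losses = []
    for target in hypotheses:
        losses.append(F.ctc_loss(
            log_probs,
            target.unsqueeze(0),                 # (1, S) candidate target sequence
            torch.tensor([num_frames]),          # input length
            torch.tensor([target.numel()]),      # target length
            blank=blank,
            reduction="mean",
        ))
    return torch.stack(losses).min()             # train against the lowest-loss hypothesis

# Usage with dummy data: 40 frames, 26 letters + blank, two candidate spellings.
logits = torch.randn(40, 1, 27, requires_grad=True)
loss = multi_hypothesis_ctc_loss(logits.log_softmax(-1),
                                 [torch.tensor([8, 5, 12, 12, 15]),
                                  torch.tensor([8, 5, 12, 15])],
                                 num_frames=40)
loss.backward()
```

Taking the minimum over hypotheses is only one design choice; a soft-minimum or a marginalisation over candidates would be an alternative way to combine the per-hypothesis losses.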
Related papers
- Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods [15.77894358993113]
I study automatic sign language processing in the wild, using signing videos collected from the Internet.
I present three new large-scale ASL datasets in the wild: ChicagoFSWild, ChicagoFSWild+, and OpenASL.
I propose two tasks for building real-world fingerspelling-based applications: fingerspelling detection and search.
arXiv Detail & Related papers (2023-08-23T20:38:19Z)
- Automatic dense annotation of large-vocabulary sign language videos [85.61513254261523]
We propose a simple, scalable framework to vastly increase the density of automatic annotations.
We make these annotations publicly available to support the sign language research community.
arXiv Detail & Related papers (2022-08-04T17:55:09Z)
- Fingerspelling Detection in American Sign Language [32.79935314131377]
We consider the task of fingerspelling detection in raw, untrimmed sign language videos.
This is an important step towards building real-world fingerspelling recognition systems.
We propose a benchmark and a suite of evaluation metrics, some of which reflect the effect of detection on the downstream fingerspelling recognition task.
arXiv Detail & Related papers (2021-04-03T02:11:09Z)
- Read and Attend: Temporal Localisation in Sign Language Videos [84.30262812057994]
We train a Transformer model to ingest a continuous signing stream and output a sequence of written tokens.
We show that it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation.
arXiv Detail & Related papers (2021-03-30T16:39:53Z)
- Watch, read and lookup: learning to spot signs from multiple supervisors [99.50956498009094]
Given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.
We train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles which provide additional weak-supervision; and (3) looking up words in visual sign language dictionaries.
These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning.
arXiv Detail & Related papers (2020-10-08T14:12:56Z)
- BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and improves WSLR models by transferring knowledge from subtitled news signs to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.