Scaling up sign spotting through sign language dictionaries
- URL: http://arxiv.org/abs/2205.04152v1
- Date: Mon, 9 May 2022 10:00:03 GMT
- Title: Scaling up sign spotting through sign language dictionaries
- Authors: Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
- Abstract summary: The focus of this work is $\textit{sign spotting}$ - given a video of an isolated sign, our task is to identify $\textit{whether}$ and $\textit{where}$ it has been signed in a continuous, co-articulated sign language video.
We train a model using multiple types of available supervision by: (1) $\textit{watching}$ existing footage which is sparsely labelled using mouthing cues; (2) $\textit{reading}$ associated subtitles which provide additional translations of the signed content.
We validate the effectiveness of our approach on low-shot sign spotting benchmarks.
- Score: 99.50956498009094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The focus of this work is $\textit{sign spotting}$ - given a video of an
isolated sign, our task is to identify $\textit{whether}$ and $\textit{where}$
it has been signed in a continuous, co-articulated sign language video. To
achieve this sign spotting task, we train a model using multiple types of
available supervision by: (1) $\textit{watching}$ existing footage which is
sparsely labelled using mouthing cues; (2) $\textit{reading}$ associated
subtitles (readily available translations of the signed content) which provide
additional $\textit{weak-supervision}$; (3) $\textit{looking up}$ words (for
which no co-articulated labelled examples are available) in visual sign
language dictionaries to enable novel sign spotting. These three tasks are
integrated into a unified learning framework using the principles of Noise
Contrastive Estimation and Multiple Instance Learning. We validate the
effectiveness of our approach on low-shot sign spotting benchmarks. In
addition, we contribute a machine-readable British Sign Language (BSL)
dictionary dataset of isolated signs, BSLDict, to facilitate study of this
task. The dataset, models and code are available at our project page.
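As a rough illustration of the principles named in the abstract, the sketch below shows one way Noise Contrastive Estimation could be combined with Multiple Instance Learning for sign spotting: the subtitle only tells us a sign occurs somewhere in a video, so the best-matching candidate window is treated as the positive and contrasted against windows from unrelated videos. This is an assumption-laden sketch, not the authors' released code; all function names, tensor shapes and the temperature value are illustrative.

```python
# Minimal MIL-NCE sketch (illustrative, not the authors' implementation).
# Assumes precomputed embeddings for a dictionary sign, candidate windows
# from a weakly labelled video, and negative windows from other videos.
import torch
import torch.nn.functional as F

def mil_nce_loss(dict_emb, window_embs, negative_embs, temperature=0.07):
    """dict_emb:      (D,)   embedding of the isolated dictionary sign
       window_embs:   (W, D) candidate windows from a video whose subtitle
                             (weakly) indicates the sign is present
       negative_embs: (N, D) windows from videos that should not match"""
    dict_emb = F.normalize(dict_emb, dim=-1)
    window_embs = F.normalize(window_embs, dim=-1)
    negative_embs = F.normalize(negative_embs, dim=-1)

    # Multiple Instance Learning: keep only the best-matching window,
    # since we do not know where (or if) the sign appears exactly.
    pos = (window_embs @ dict_emb).max() / temperature

    # Noise Contrastive Estimation: contrast the positive score against
    # scores of negative windows.
    neg = (negative_embs @ dict_emb) / temperature
    logits = torch.cat([pos.view(1), neg]).view(1, -1)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))

# Usage with random embeddings, just to show the expected shapes.
loss = mil_nce_loss(torch.randn(256), torch.randn(10, 256), torch.randn(50, 256))
```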
Related papers
- A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision [74.972172804514]
We introduce a multi-task Transformer model, CSLR2, that is able to ingest a signing sequence and output in a joint embedding space between signed language and spoken language text.
New dataset annotations provide continuous sign-level annotations for six hours of test videos, and will be made publicly available.
Our model significantly outperforms the previous state of the art on both tasks.
arXiv Detail & Related papers (2024-05-16T17:19:06Z) - Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z) - Gloss Alignment Using Word Embeddings [40.100782464872076]
We propose a method for aligning spottings with their corresponding subtitles using large spoken language models.
We quantitatively demonstrate the effectiveness of our method on the MeineDGS and BOBSL datasets.
arXiv Detail & Related papers (2023-08-08T13:26:53Z) - Automatic dense annotation of large-vocabulary sign language videos [85.61513254261523]
We propose a simple, scalable framework to vastly increase the density of automatic annotations.
We make these annotations publicly available to support the sign language research community.
arXiv Detail & Related papers (2022-08-04T17:55:09Z) - Sign Language Video Retrieval with Free-Form Textual Queries [19.29003565494735]
We introduce the task of sign language retrieval with free-form textual queries.
The objective is to find the signing video in the collection that best matches the written query.
We propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data.
arXiv Detail & Related papers (2022-01-07T15:22:18Z) - Read and Attend: Temporal Localisation in Sign Language Videos [84.30262812057994]
We train a Transformer model to ingest a continuous signing stream and output a sequence of written tokens.
We show that it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation.
arXiv Detail & Related papers (2021-03-30T16:39:53Z) - Watch, read and lookup: learning to spot signs from multiple supervisors [99.50956498009094]
Given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.
We train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles which provide additional weak-supervision; and (3) looking up words in visual sign language dictionaries.
These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning.
arXiv Detail & Related papers (2020-10-08T14:12:56Z) - BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.