Related papers: How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language

How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language

URL: http://arxiv.org/abs/2008.08143v2
Date: Thu, 1 Apr 2021 16:54:42 GMT
Title: How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language
Authors: Amanda Duarte, Shruti Palaskar, Lucas Ventura, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres and Xavier Giro-i-Nieto
Abstract summary: How2Sign is a multimodal and multiview continuous American Sign Language (ASL) dataset. It consists of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth. A three-hour subset was recorded in the Panoptic studio enabling detailed 3D pose estimation.
Score: 37.578776156503906
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One of the factors that have hindered progress in the areas of sign language recognition, translation, and production is the absence of large annotated datasets. Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth. A three-hour subset was further recorded in the Panoptic studio enabling detailed 3D pose estimation. To evaluate the potential of How2Sign for real-world impact, we conduct a study with ASL signers and show that synthesized videos using our dataset can indeed be understood. The study further gives insights on challenges that computer vision should address in order to make progress in this field. Dataset website: http://how2sign.github.io/

Related papers

Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment [84.39962912136525]
We develop a model for sign language understanding that performs sign language translation (SLT) and sign-subtitle alignment (SSA)<n>Our approach is built upon three components: (i) a lightweight visual backbone that captures manual and non-manual cues from human keypoints and lip-region images; (ii) a Sliding Perceiver mapping network that aggregates consecutive visual features into word-level embeddings; and (iii) a multi-task scalable training strategy that jointly optimises SLT and SSA.
arXiv Detail & Related papers (2025-12-08T21:05:46Z)
Logos as a Well-Tempered Pre-train for Sign Language Recognition [75.42794328290088]
This paper presents Logos, a novel Russian Sign Language (RSL) dataset.<n>It is shown that a model, pre-trained on the Logos dataset can be used as a universal encoder for other language SLR tasks.<n>We show that explicitly labeling visually similar signs improves trained model quality as a visual encoder for downstream tasks.
arXiv Detail & Related papers (2025-05-15T16:31:49Z)
SCOPE: Sign Language Contextual Processing with Embedding from LLMs [49.5629738637893]
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information. Current methods in vision-based sign language recognition ( SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information. We introduce SCOPE, a novel context-aware vision-based SLR and SLT framework.
arXiv Detail & Related papers (2024-09-02T08:56:12Z)
iSign: A Benchmark for Indian Sign Language Processing [5.967764101493575]
iSign is a benchmark for Indian Sign Language (ISL) processing. We release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs. We provide insights into the proposed benchmarks with a few linguistic insights into the workings of ISL.
arXiv Detail & Related papers (2024-07-07T15:07:35Z)
SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning [51.800031281177105]
SignVTCL is a continuous sign language recognition framework enhanced by visual-textual contrastive learning. It integrates multi-modal data (video, keypoints, and optical flow) simultaneously to train a unified visual backbone. It achieves state-of-the-art results compared with previous methods.
arXiv Detail & Related papers (2024-01-22T11:04:55Z)
Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition. We first build two sign language dictionaries containing isolated signs that appear in two datasets. Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms. The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures received by 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z)
LSA-T: The first continuous Argentinian Sign Language dataset for Sign Language Translation [52.87578398308052]
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning. This paper presents the first continuous Argentinian Sign Language (LSA) dataset. It contains 14,880 sentence level videos of LSA extracted from the CN Sordos YouTube channel with labels and keypoints annotations for each signer.
arXiv Detail & Related papers (2022-11-14T14:46:44Z)
SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language [0.0]
We release the first version of our ASL dataset, which contains 30k sentences, 416k words, a vocabulary of 18k words, in a total of 104 hours. This is the largest continuous sign language dataset published to date in terms of video duration.
arXiv Detail & Related papers (2022-10-13T07:08:00Z)
ASL-Homework-RGBD Dataset: An annotated dataset of 45 fluent and non-fluent signers performing American Sign Language homeworks [32.3809065803553]
This dataset contains videos of fluent and non-fluent signers using American Sign Language (ASL) A total of 45 fluent and non-fluent participants were asked to perform signing homework assignments. The data is annotated to identify several aspects of signing including grammatical features and non-manual markers.
arXiv Detail & Related papers (2022-07-08T17:18:49Z)
Watch, read and lookup: learning to spot signs from multiple supervisors [99.50956498009094]
Given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. We train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles which provide additional weak-supervision; and (3) looking up words in visual sign language dictionaries. These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning.
arXiv Detail & Related papers (2020-10-08T14:12:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.