Searching for fingerspelled content in American Sign Language
- URL: http://arxiv.org/abs/2203.13291v1
- Date: Thu, 24 Mar 2022 18:36:22 GMT
- Title: Searching for fingerspelled content in American Sign Language
- Authors: Bowen Shi and Diane Brentari and Greg Shakhnarovich and Karen Livescu
- Abstract summary: Natural language processing for sign language video is crucial for making artificial intelligence technologies accessible to deaf individuals.
In this paper, we address the problem of searching for fingerspelled keywords or key phrases in raw sign language videos.
We propose an end-to-end model for this task, FSS-Net, that jointly detects fingerspelling and matches it to a text sequence.
- Score: 32.89182994277633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language processing for sign language video - including tasks like
recognition, translation, and search - is crucial for making artificial
intelligence technologies accessible to deaf individuals, and is gaining
research interest in recent years. In this paper, we address the problem of
searching for fingerspelled keywords or key phrases in raw sign language
videos. This is an important task since significant content in sign language is
often conveyed via fingerspelling, and to our knowledge the task has not been
studied before. We propose an end-to-end model for this task, FSS-Net, that
jointly detects fingerspelling and matches it to a text sequence. Our
experiments, done on a large public dataset of ASL fingerspelling in the wild,
show the importance of fingerspelling detection as a component of a search and
retrieval model. Our model significantly outperforms baseline methods adapted
from prior work on related tasks.
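The abstract describes search as two coupled steps: detect candidate fingerspelling segments, then match them against a text query. FSS-Net's actual architecture is not given here, so the following is only a generic sketch of the matching step under one common design assumption: the text query and each detected video segment are embedded into a shared vector space, and segments are ranked by cosine similarity. All names and the toy embeddings are placeholders, not the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(query_emb, segment_embs):
    """Rank detected fingerspelling segments by similarity to a text query.

    query_emb    -- embedding of the written query (placeholder: any vector)
    segment_embs -- embeddings of detected video segments, same dimension
    Returns (segment_index, score) pairs, best match first.
    """
    scored = [(i, cosine(query_emb, e)) for i, e in enumerate(segment_embs)]
    return sorted(scored, key=lambda pair: -pair[1])

# Toy usage: three "detected segments" in a 2-d embedding space.
query = [1.0, 0.0]
segments = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]
ranking = search(query, segments)
```

In this toy run the third segment (identical direction to the query) ranks first and the orthogonal first segment ranks last; a real system would learn the embeddings jointly with the detector, which is the coupling the abstract emphasizes.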
Related papers
- Scaling up Multimodal Pre-training for Sign Language Understanding [96.17753464544604]
Sign language serves as the primary means of communication for the deaf-mute community.
To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied.
These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representation of sign language videos.
arXiv Detail & Related papers (2024-08-16T06:04:25Z)
- Fingerspelling within Sign Language Translation [0.9790236766474201]
Fingerspelling poses challenges for sign language processing due to its high-frequency motion and use for open-vocabulary terms.
We evaluate how well sign language translation models understand fingerspelling in the context of entire sentences.
arXiv Detail & Related papers (2024-08-13T17:57:14Z)
- Multimodal Modeling For Spoken Language Identification [57.94119986116947]
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
We propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification.
arXiv Detail & Related papers (2023-09-19T12:21:39Z)
- Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods [15.77894358993113]
I study automatic sign language processing in the wild, using signing videos collected from the Internet.
I present three new large-scale ASL datasets in the wild: ChicagoFSWild, ChicagoFSWild+, and OpenASL.
I propose two tasks for building real-world fingerspelling-based applications: fingerspelling detection and search.
arXiv Detail & Related papers (2023-08-23T20:38:19Z)
- Weakly-supervised Fingerspelling Recognition in British Sign Language Videos [85.61513254261523]
Previous fingerspelling recognition methods have not focused on British Sign Language (BSL).
In contrast to previous methods, our method only uses weak annotations from subtitles for training.
We propose a Transformer architecture adapted to this task, with a multiple-hypothesis CTC loss function to learn from alternative annotation possibilities.
arXiv Detail & Related papers (2022-11-16T15:02:36Z)
- Sign Language Video Retrieval with Free-Form Textual Queries [19.29003565494735]
We introduce the task of sign language retrieval with free-form textual queries.
The objective is to find the signing video in the collection that best matches the written query.
We propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data.
arXiv Detail & Related papers (2022-01-07T15:22:18Z)
- A Fine-Grained Visual Attention Approach for Fingerspelling Recognition in the Wild [17.8181080354116]
Automatic recognition of fingerspelling can help resolve communication barriers when interacting with deaf people.
Main challenges prevalent in fingerspelling recognition are the ambiguity in the gestures and strong articulation of the hands.
We propose a fine-grained visual attention mechanism using the Transformer model for the sequence-to-sequence prediction task on an in-the-wild dataset.
arXiv Detail & Related papers (2021-05-17T06:15:35Z)
- Fingerspelling Detection in American Sign Language [32.79935314131377]
We consider the task of fingerspelling detection in raw, untrimmed sign language videos.
This is an important step towards building real-world fingerspelling recognition systems.
We propose a benchmark and a suite of evaluation metrics, some of which reflect the effect of detection on the downstream fingerspelling recognition task.
arXiv Detail & Related papers (2021-04-03T02:11:09Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular, as it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score on English Sub-task A, comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.