Towards an AI-Driven Video-Based American Sign Language Dictionary: Exploring Design and Usage Experience with Learners
- URL: http://arxiv.org/abs/2504.05857v1
- Date: Tue, 08 Apr 2025 09:35:46 GMT
- Title: Towards an AI-Driven Video-Based American Sign Language Dictionary: Exploring Design and Usage Experience with Learners
- Authors: Saad Hassan, Matyas Bohacek, Chaelin Kim, Denise Crochet,
- Abstract summary: Video-based dictionaries allow users to submit a video and receive a list of the closest matching signs. We present findings from an observational study with twelve novice ASL learners who used this dictionary during video-comprehension and question-answering tasks. Our results address human-AI interaction challenges not covered in previous WoZ research, including recording and resubmitting signs, unpredictable outputs, system latency, and privacy concerns.
- Score: 3.4030882631756025
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Searching for unfamiliar American Sign Language (ASL) signs is challenging for learners because, unlike spoken languages, they cannot type a text-based query to look up an unfamiliar sign. Advances in isolated sign recognition have enabled the creation of video-based dictionaries, allowing users to submit a video and receive a list of the closest matching signs. Previous HCI research using Wizard-of-Oz prototypes has explored interface designs for ASL dictionaries. Building on these studies, we incorporate their design recommendations and leverage state-of-the-art sign-recognition technology to develop an automated video-based dictionary. We also present findings from an observational study with twelve novice ASL learners who used this dictionary during video-comprehension and question-answering tasks. Our results address human-AI interaction challenges not covered in previous WoZ research, including recording and resubmitting signs, unpredictable outputs, system latency, and privacy concerns. These insights offer guidance for designing and deploying video-based ASL dictionary systems.
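To make the retrieval step concrete, below is a minimal sketch of the lookup loop such a dictionary performs, assuming a hypothetical embed_video encoder in place of the paper's actual sign-recognition model: the submitted video is embedded, and dictionary entries are ranked by cosine similarity, mirroring the submit-and-review interaction studied with learners.

```python
import numpy as np

def embed_video(frames: np.ndarray) -> np.ndarray:
    """Placeholder encoder: a real system would run an
    isolated-sign-recognition model here."""
    return frames.mean(axis=(0, 1, 2))  # pool a (T, H, W, C) clip to a C-dim vector

def top_k_signs(query_frames, dictionary_embs, glosses, k=5):
    """Rank dictionary entries by cosine similarity to the submitted video."""
    q = embed_video(query_frames)
    q = q / np.linalg.norm(q)
    d = dictionary_embs / np.linalg.norm(dictionary_embs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]
    return [(glosses[i], float(scores[i])) for i in top]
```

In a deployed system the ranked glosses would be shown with example videos so the learner can confirm a match or re-record, which is where the resubmission and latency issues noted in the abstract surface.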
Related papers
- Generating Signed Language Instructions in Large-Scale Dialogue Systems [25.585339304165466]
We introduce a goal-oriented conversational AI system enhanced with American Sign Language (ASL) instructions.
Our system receives input from users and seamlessly generates ASL instructions by leveraging retrieval methods and cognitively based gloss translations.
arXiv Detail & Related papers (2024-10-17T20:56:29Z)
- Scaling up Multimodal Pre-training for Sign Language Understanding [96.17753464544604]
Sign language serves as the primary means of communication for the deaf-mute community.
To facilitate communication between deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied.
These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representation of sign language videos.
arXiv Detail & Related papers (2024-08-16T06:04:25Z)
- SLVideo: A Sign Language Video Moment Retrieval Framework [6.782143030167946]
SLVideo is a video moment retrieval system for Sign Language videos.
It extracts embedding representations for the hand and face signs from video frames to capture the signs in their entirety.
A collection of eight hours of annotated Portuguese Sign Language videos is used as the dataset.
arXiv Detail & Related papers (2024-07-22T14:29:36Z)
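As an illustration of the moment-retrieval idea, the toy function below slides a fixed window over per-frame embeddings (stand-ins for SLVideo's hand and face features) and returns the best-matching span; the windowing scheme and feature extractor are assumptions, not SLVideo's actual design.

```python
import numpy as np

def retrieve_moment(frame_embs: np.ndarray, query_emb: np.ndarray, window: int = 16):
    """Return the (start, end) frame span whose mean embedding best matches the query."""
    q = query_emb / np.linalg.norm(query_emb)
    best_score, best_span = -np.inf, (0, window)
    for start in range(len(frame_embs) - window + 1):
        w = frame_embs[start:start + window].mean(axis=0)
        score = float(w @ q) / float(np.linalg.norm(w))
        if score > best_score:
            best_score, best_span = score, (start, start + window)
    return best_span, best_score

# Example: find the best 16-frame span in a 200-frame clip of 64-d features.
span, score = retrieve_moment(np.random.randn(200, 64), np.random.randn(64))
```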
- New Capability to Look Up an ASL Sign from a Video Example [4.992008196032313]
We describe a new system, publicly shared on the Web, to enable lookup of a video of an ASL sign.
The user submits a video for analysis and is presented with the five most likely sign matches.
This video lookup is also integrated into our newest version of SignStream software to facilitate linguistic annotation of ASL video data.
arXiv Detail & Related papers (2024-07-18T15:14:35Z)
- DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization [33.18321022815901]
We introduce DiffSLVA, a novel methodology for text-guided sign language video anonymization.
We develop a specialized module dedicated to capturing facial expressions, which are critical for conveying linguistic information in signed languages.
This innovative methodology makes possible, for the first time, sign language video anonymization that could be used for real-world applications.
arXiv Detail & Related papers (2023-11-27T18:26:19Z)
- ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition [6.296362537531586]
Sign languages are used as a primary language by approximately 70 million D/deaf people worldwide.
To help tackle this problem, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition dataset.
We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their webcam to retrieve matching signs from a dictionary.
arXiv Detail & Related papers (2023-04-12T15:52:53Z)
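A dictionary-retrieval benchmark of the kind ASL Citizen proposes is typically scored with recall@k: the fraction of query videos whose correct gloss appears among the top-k retrieved signs. A minimal sketch with illustrative data follows.

```python
def recall_at_k(ranked_glosses_per_query, gold_glosses, k=5):
    """Fraction of queries whose correct gloss appears in the top-k results."""
    hits = sum(
        gold in ranked[:k]
        for ranked, gold in zip(ranked_glosses_per_query, gold_glosses)
    )
    return hits / len(gold_glosses)

# Example: 2 of 3 queries rank the correct gloss within their top-2 results.
print(recall_at_k([["HELLO", "THANKS"], ["BOOK", "READ"], ["CAT", "DOG"]],
                  ["THANKS", "READ", "FISH"], k=2))  # -> 0.666...
```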
- Weakly-supervised Fingerspelling Recognition in British Sign Language Videos [85.61513254261523]
Previous fingerspelling recognition methods have not focused on British Sign Language (BSL).
In contrast to previous methods, our method only uses weak annotations from subtitles for training.
We propose a Transformer architecture adapted to this task, with a multiple-hypothesis CTC loss function to learn from alternative annotation possibilities.
arXiv Detail & Related papers (2022-11-16T15:02:36Z)
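The CTC objective underlying such a recognizer is standard; the sketch below shows the usual single-hypothesis setup in PyTorch, with a comment indicating one plausible (assumed) way a multiple-hypothesis variant could extend it.

```python
import torch
import torch.nn as nn

T, N, C = 50, 2, 27                                   # frames, batch, 26 letters + blank
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)  # stand-in for model output
targets = torch.randint(1, C, (N, 10))                # letter indices; 0 is the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)

# One plausible (assumed) multiple-hypothesis extension: compute the loss for
# each candidate annotation of a clip and keep the minimum, e.g.
# loss = min(ctc(log_probs, t, input_lengths, l) for t, l in candidates)
```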
- Sign Language Video Retrieval with Free-Form Textual Queries [19.29003565494735]
We introduce the task of sign language retrieval with free-form textual queries.
The objective is to find the signing video in the collection that best matches the written query.
We propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data.
arXiv Detail & Related papers (2022-01-07T15:22:18Z)
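The alternation SPOT-ALIGN describes can be caricatured as a self-training loop: spot confident pseudo-labels with the current model, enlarge the training pool, then re-estimate the model. The toy below (nearest-centroid "model", random data) is an assumption-laden illustration of that loop, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
centroids = rng.normal(size=(5, 16))           # one embedding per sign class
pool_x, pool_y = [rng.normal(size=16)], [0]    # tiny labelled pool
unlabelled = rng.normal(size=(100, 16))        # clips without annotations

for _ in range(3):
    # "Spotting": pseudo-label clips whose similarity to a class is confident.
    sims = unlabelled @ centroids.T
    conf, labels = sims.max(axis=1), sims.argmax(axis=1)
    keep = conf > np.quantile(conf, 0.9)
    pool_x += list(unlabelled[keep])
    pool_y += list(labels[keep])
    unlabelled = unlabelled[~keep]
    # "Alignment": re-estimate each class embedding from the enlarged pool.
    X, y = np.stack(pool_x), np.array(pool_y)
    for c in range(len(centroids)):
        if (y == c).any():
            centroids[c] = X[y == c].mean(axis=0)
```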
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
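A minimal skeleton-based classifier in this spirit: a GRU over flattened per-frame whole-body keypoints (the 133-point COCO-WholeBody layout of Jin et al., 2020) feeding a linear head. Layer sizes and class count are illustrative assumptions; the paper's actual architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class KeypointGRU(nn.Module):
    def __init__(self, n_keypoints=133, n_classes=100, hidden=256):
        super().__init__()
        self.gru = nn.GRU(input_size=n_keypoints * 2, hidden_size=hidden,
                          batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, kp):                 # kp: (batch, frames, keypoints, 2)
        b, t = kp.shape[:2]
        _, h = self.gru(kp.reshape(b, t, -1))
        return self.head(h[-1])            # logits, one per sign class

logits = KeypointGRU()(torch.randn(2, 30, 133, 2))  # 2 clips, 30 frames each
```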
- Watch, read and lookup: learning to spot signs from multiple supervisors [99.50956498009094]
Given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.
We train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles which provide additional weak-supervision; and (3) looking up words in visual sign language dictionaries.
These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning.
arXiv Detail & Related papers (2020-10-08T14:12:56Z)
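The pairing of Multiple Instance Learning with a contrastive objective can be sketched as a MIL-NCE-style loss: each continuous video is a bag of window embeddings, the best-matching window serves as the positive for its isolated-sign query, and the other videos in the batch act as negatives. Shapes and temperature below are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def mil_nce_loss(query, bags, temperature=0.07):
    """query: (B, D) isolated-sign embeddings; bags: (B, W, D) window embeddings."""
    query = F.normalize(query, dim=-1)
    bags = F.normalize(bags, dim=-1)
    # Similarity of every query to every window of every video: (B, B, W).
    sims = torch.einsum("qd,bwd->qbw", query, bags)
    scores = sims.max(dim=-1).values / temperature   # MIL: best window per bag
    labels = torch.arange(query.size(0))             # matching video on the diagonal
    return F.cross_entropy(scores, labels)

loss = mil_nce_loss(torch.randn(4, 128), torch.randn(4, 10, 128))
```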
- BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and improves WSLR models by transferring knowledge from subtitled sign language news videos to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)