Towards Zero-shot Sign Language Recognition
- URL: http://arxiv.org/abs/2201.05914v1
- Date: Sat, 15 Jan 2022 19:26:36 GMT
- Title: Towards Zero-shot Sign Language Recognition
- Authors: Yunus Can Bilge, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis
- Abstract summary: This paper tackles the problem of zero-shot sign language recognition.
The goal is to leverage models learned over the seen sign classes to recognize instances of unseen sign classes.
- Score: 11.952300437658703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the problem of zero-shot sign language recognition
(ZSSLR), where the goal is to leverage models learned over the seen sign
classes to recognize instances of unseen sign classes. In this context,
readily available textual sign descriptions and attributes collected from sign
language dictionaries are utilized as semantic class representations for
knowledge transfer. For this novel problem setup, we introduce three benchmark
datasets with their accompanying textual and attribute descriptions to analyze
the problem in detail. Our proposed approach builds spatiotemporal models of
body and hand regions. By leveraging the descriptive text and attribute
embeddings along with these visual representations within a zero-shot learning
framework, we show that textual and attribute-based class definitions can
provide effective knowledge for the recognition of previously unseen sign
classes. We additionally introduce techniques to analyze the influence of
binary attributes in correct and incorrect zero-shot predictions. We anticipate
that the introduced approaches and the accompanying datasets will provide a
basis for further exploration of zero-shot learning in sign language
recognition.
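To make the setup above concrete, the following sketch scores a pooled spatiotemporal video embedding against class embeddings built from textual descriptions and binary attributes through a learned compatibility layer. All names, dimensions, and the specific compatibility form are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumed, not the paper's exact model): zero-shot classification
# via compatibility between a spatiotemporal video embedding and class embeddings
# derived from textual descriptions and binary attributes.
import torch
import torch.nn as nn

class ZeroShotSignClassifier(nn.Module):
    def __init__(self, video_dim=512, text_dim=300, attr_dim=64):
        super().__init__()
        # Projection into the semantic (text + attribute) space.
        self.W = nn.Linear(video_dim, text_dim + attr_dim, bias=False)

    def forward(self, video_feat, class_text_emb, class_attr):
        # video_feat: (B, video_dim) pooled body/hand spatiotemporal features
        # class_text_emb: (C, text_dim) embeddings of dictionary descriptions
        # class_attr: (C, attr_dim) binary attribute vectors per class
        class_emb = torch.cat([class_text_emb, class_attr.float()], dim=-1)
        projected = self.W(video_feat)       # (B, text_dim + attr_dim)
        return projected @ class_emb.t()     # (B, C) compatibility scores

# Usage: at test time the class matrix holds the *unseen* classes' descriptions
# and attributes; the projection learned on seen classes is reused unchanged.
model = ZeroShotSignClassifier()
scores = model(torch.randn(2, 512), torch.randn(10, 300), torch.randint(0, 2, (10, 64)))
pred = scores.argmax(dim=-1)  # predicted unseen-class indices
```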
Related papers
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework that mines rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks and achieves new state-of-the-art performance by a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
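For the self-supervised spatial-temporal entry above, here is a minimal sketch assuming a standard InfoNCE contrastive loss between a joint-based view and a first-order motion view of the same clip; the paper's actual encoders and objective may differ.

```python
# Hedged sketch: InfoNCE-style contrastive loss between a joint view and a
# motion (frame-difference) view of the same clips. Generic recipe, not the
# paper's exact objective.
import torch
import torch.nn.functional as F

def info_nce(z_joint, z_motion, temperature=0.07):
    # z_joint, z_motion: (B, D) embeddings of two views of the same clips
    z_joint = F.normalize(z_joint, dim=-1)
    z_motion = F.normalize(z_motion, dim=-1)
    logits = z_joint @ z_motion.t() / temperature   # (B, B) similarities
    targets = torch.arange(z_joint.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, targets)

# First-order motion can be approximated as temporal differences of joints:
joints = torch.randn(4, 32, 21, 3)                  # (B, T, num_joints, xyz)
motion = joints[:, 1:] - joints[:, :-1]             # (B, T-1, num_joints, xyz)
loss = info_nce(torch.randn(4, 128), torch.randn(4, 128))
```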
- Text2Model: Text-based Model Induction for Zero-shot Image Classification [38.704831945753284]
We address the challenge of building task-agnostic classifiers using only text descriptions.
We generate zero-shot classifiers using a hypernetwork that receives class descriptions and outputs a multi-class model.
We evaluate this approach in a series of zero-shot classification tasks, for image, point-cloud, and action recognition, using a range of text descriptions.
arXiv Detail & Related papers (2022-10-27T05:19:55Z)
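For the Text2Model entry above, a minimal, assumed sketch of a hypernetwork that maps per-class text embeddings to the weights of a linear classifier; the architecture in the paper is more involved.

```python
# Assumed sketch of a text-to-classifier hypernetwork: class description
# embeddings are mapped to weight vectors of a linear image classifier.
import torch
import torch.nn as nn

class TextHypernetwork(nn.Module):
    def __init__(self, text_dim=384, feat_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, class_text_emb, image_feat):
        # class_text_emb: (C, text_dim); image_feat: (B, feat_dim)
        weights = self.net(class_text_emb)   # (C, feat_dim) generated classifier
        return image_feat @ weights.t()      # (B, C) class logits

hyper = TextHypernetwork()
logits = hyper(torch.randn(5, 384), torch.randn(8, 512))
```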
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e., zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
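For the Attribute Prototype Network entry above, a rough sketch under the assumption that each attribute has a learnable prototype scored against local CNN features, with the maximum response giving both an attribute score and a coarse localization; this simplifies the published method.

```python
# Hedged sketch: attribute prototypes scored against local CNN features.
# The max response per attribute predicts its presence and its approximate
# location; a simplification of the Attribute Prototype Network idea.
import torch
import torch.nn as nn

class AttributePrototypes(nn.Module):
    def __init__(self, feat_dim=512, num_attrs=85):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_attrs, feat_dim))

    def forward(self, feat_map):
        # feat_map: (B, feat_dim, H, W) local features from a CNN backbone
        local = feat_map.flatten(2).transpose(1, 2)   # (B, H*W, feat_dim)
        sim = local @ self.prototypes.t()             # (B, H*W, num_attrs)
        attr_scores, attr_locs = sim.max(dim=1)       # max over locations
        return attr_scores, attr_locs                 # presence score + argmax position

apn = AttributePrototypes()
scores, locs = apn(torch.randn(2, 512, 7, 7))
```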
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
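For the VGSE entry above, an assumed illustration of the clustering step: local region features from seen-class images are grouped with k-means, and each class's cluster-assignment histogram can serve as a visually grounded class embedding. All variable names and sizes are illustrative.

```python
# Hedged sketch: cluster local image-region features from seen classes and use
# each class's histogram over visual clusters as a visually grounded embedding.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
region_feats = rng.normal(size=(1000, 128))      # local region descriptors (assumed)
region_labels = rng.integers(0, 10, size=1000)   # seen-class label of each region

kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(region_feats)
assignments = kmeans.labels_

# Per-class histogram over visual clusters -> embedding per seen class.
class_embeddings = np.zeros((10, 32))
for cls in range(10):
    counts = np.bincount(assignments[region_labels == cls], minlength=32)
    class_embeddings[cls] = counts / max(counts.sum(), 1)
```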
- Attribute Prototype Network for Zero-Shot Learning [113.50220968583353]
We propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features.
Our model points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of our image representation.
arXiv Detail & Related papers (2020-08-19T06:46:35Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and strengthens WSLR models by transferring knowledge from subtitled news sign language to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.