A Chinese Continuous Sign Language Dataset Based on Complex Environments
- URL: http://arxiv.org/abs/2409.11960v1
- Date: Wed, 18 Sep 2024 13:11:15 GMT
- Title: A Chinese Continuous Sign Language Dataset Based on Complex Environments
- Authors: Qidan Zhu, Jing Li, Fei Yuan, Jiaojiao Fan, Quan Gan
- Abstract summary: We have constructed a large-scale dataset for Chinese continuous sign language (CSL) based on complex environments.
This dataset encompasses 5,988 continuous CSL video clips collected from daily life scenes.
We propose a time-frequency network (TFNet) model for continuous sign language recognition.
- Score: 17.195286118443256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current bottleneck in continuous sign language recognition (CSLR) research is that most publicly available datasets are limited to laboratory environments or television program recordings, resulting in a single background environment with uniform lighting that deviates significantly from the diversity and complexity of real-life scenarios. To address this challenge, we have constructed a new, large-scale dataset for Chinese continuous sign language (CSL) based on complex environments, termed the Complex Environment - Chinese Sign Language dataset (CE-CSL). This dataset encompasses 5,988 continuous CSL video clips collected from daily life scenes, featuring more than 70 different complex backgrounds to ensure representativeness and generalization capability. To tackle the impact of complex backgrounds on CSLR performance, we propose a time-frequency network (TFNet) model for continuous sign language recognition. This model extracts frame-level features and then uses temporal and spectral information to derive sequence features separately before fusing them, aiming to achieve efficient and accurate CSLR. Experimental results demonstrate that our approach achieves significant performance improvements on CE-CSL, validating its effectiveness under complex background conditions. Our method also yields highly competitive results when applied to three publicly available CSL datasets.
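The abstract describes TFNet only at a high level. A minimal sketch of that pipeline, assuming a PyTorch implementation, might look as follows; the stand-in CNN backbone, the BiLSTM temporal branch, the rFFT-based frequency branch, and all dimensions are illustrative assumptions, not the authors' published architecture.
```python
# Minimal sketch of the TFNet idea from the abstract: frame-level
# features feed two parallel branches (temporal and frequency), whose
# sequence features are fused for recognition. All module choices and
# sizes are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TFNetSketch(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, vocab_size=1000):
        super().__init__()
        # Stand-in frame-level feature extractor (a real model would
        # use a deeper 2D CNN backbone).
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Temporal branch: sequence modeling over frame features.
        self.temporal = nn.LSTM(feat_dim, hidden, batch_first=True,
                                bidirectional=True)
        # Frequency branch: project the magnitude spectrum taken along
        # the time axis.
        self.freq_proj = nn.Linear(feat_dim, 2 * hidden)
        # Fusion by concatenation, then a CTC-ready classifier.
        self.classifier = nn.Linear(4 * hidden, vocab_size)

    def forward(self, video):  # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.frame_encoder(video.flatten(0, 1)).view(b, t, -1)
        temporal_seq, _ = self.temporal(feats)         # (B, T, 2*hidden)
        spectrum = torch.fft.rfft(feats, dim=1).abs()  # (B, T//2+1, feat_dim)
        freq_seq = self.freq_proj(spectrum)            # (B, T//2+1, 2*hidden)
        # Resample the spectral sequence back to length T for fusion.
        freq_seq = F.interpolate(freq_seq.transpose(1, 2),
                                 size=t).transpose(1, 2)
        fused = torch.cat([temporal_seq, freq_seq], dim=-1)
        return self.classifier(fused).log_softmax(-1)  # (B, T, vocab_size)
```
For example, `TFNetSketch()(torch.randn(2, 16, 3, 112, 112))` returns per-frame log-probabilities of shape (2, 16, 1000), suitable for a CTC-style CSLR objective.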
Related papers
- SCOPE: Sign Language Contextual Processing with Embedding from LLMs [49.5629738637893]
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey both visual and contextual information.
Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information.
We introduce SCOPE, a novel context-aware vision-based SLR and SLT framework.
arXiv Detail & Related papers (2024-09-02T08:56:12Z) - Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model [0.5825410941577593]
We propose a spatial-temporal attention-based BSL recognition model that uses hand joint skeletons extracted from image sequences.
Our model captures discriminative structural displacements and short-range dependencies based on unified joint features projected onto a high-dimensional feature space.
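As a generic illustration of this family of models (not the paper's architecture), hand-joint coordinates can be projected into a high-dimensional space and modeled with temporal self-attention; the joint count, dimensions, and single attention layer below are assumptions.
```python
# Generic sketch of skeleton-based recognition with temporal attention.
# The joint count (21 hand joints), feature sizes, and single-layer
# attention are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class SkeletonAttentionSketch(nn.Module):
    def __init__(self, num_joints=21, d_model=128, num_classes=60):
        super().__init__()
        # Project per-frame (x, y, z) joint coordinates to a
        # high-dimensional feature space.
        self.joint_proj = nn.Linear(num_joints * 3, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, joints):  # joints: (B, T, num_joints, 3)
        x = self.joint_proj(joints.flatten(2))  # (B, T, d_model)
        x, _ = self.attn(x, x, x)               # temporal self-attention
        return self.head(x.mean(dim=1))         # clip-level class logits
```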
arXiv Detail & Related papers (2024-08-26T08:55:16Z) - COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning [37.843051974342124]
We introduce COIG-CQIA, a new Chinese instruction tuning dataset derived from various real-world resources and undergoing rigorous human verification.
We conduct extensive experiments on COIG-CQIA, comparing models trained on it with strong baseline models and datasets.
The experimental results show that models trained on COIG-CQIA achieve highly competitive performance in diverse benchmarks.
arXiv Detail & Related papers (2024-03-26T19:24:18Z) - Embracing Language Inclusivity and Diversity in CLIP through Continual
Language Learning [58.92843729869586]
Vision-language pre-trained models (VL-PTMs) have advanced multimodal research in recent years, but their mastery in a few languages like English restricts their applicability in broader communities.
We propose to extend VL-PTMs' language capacity by continual language learning (CLL), where a model needs to update its linguistic knowledge incrementally without suffering from catastrophic forgetting (CF).
We construct a CLL benchmark covering 36 languages based on MSCOCO and XM3600 datasets and then evaluate multilingual image-text retrieval performance.
arXiv Detail & Related papers (2024-01-30T17:14:05Z) - SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by
Visual-Textual Contrastive Learning [51.800031281177105]
SignVTCL is a continuous sign language recognition framework enhanced by visual-textual contrastive learning.
It integrates multi-modal data (video, keypoints, and optical flow) simultaneously to train a unified visual backbone.
It achieves state-of-the-art results compared with previous methods.
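The visual-textual contrastive objective can be illustrated with a standard symmetric InfoNCE loss over paired video and gloss-text embeddings; this generic formulation is an assumption and may differ from SignVTCL's exact loss.
```python
# Generic symmetric InfoNCE-style visual-textual contrastive loss,
# as a stand-in for the kind of objective SignVTCL describes; the
# paper's exact formulation may differ.
import torch
import torch.nn.functional as F

def visual_textual_contrastive_loss(visual_emb, text_emb, temperature=0.07):
    """visual_emb, text_emb: (B, D) embeddings of paired video/text."""
    v = F.normalize(visual_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature  # (B, B) cosine-similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Matched pairs sit on the diagonal; pull them together in both
    # the video->text and text->video directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```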
arXiv Detail & Related papers (2024-01-22T11:04:55Z) - Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.
We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus.
Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT yields new state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z) - Global and Local Semantic Completion Learning for Vision-Language
Pre-training [34.740507502215536]
Cross-modal alignment plays a crucial role in vision-language pre-training models.
We propose a novel Global and Local Semantic Completion Learning (GLSCL) task to facilitate global-local alignment and local-local alignment simultaneously.
arXiv Detail & Related papers (2023-06-12T13:20:29Z) - CroCoSum: A Benchmark Dataset for Cross-Lingual Code-Switched Summarization [25.182666420286132]
Given the rarity of naturally occurring cross-lingual summarization (CLS) resources, the majority of datasets are forced to rely on translation.
This restricts our ability to observe naturally occurring CLS pairs that capture organic diction, including instances of code-switching.
We introduce CroCoSum, a dataset of cross-lingual code-switched summarization of technology news.
arXiv Detail & Related papers (2023-03-07T17:52:51Z) - Signing Outside the Studio: Benchmarking Background Robustness for
Continuous Sign Language Recognition [79.23777980180755]
We propose a pipeline to automatically generate a benchmark dataset utilizing existing Continuous Sign Language Recognition benchmarks.
Our newly constructed benchmark dataset consists of diverse scenes to simulate a real-world environment.
In this regard, we also propose a simple yet effective training scheme including (1) background randomization and (2) feature disentanglement for CSLR models.
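Background randomization, the first component of this scheme, can be sketched as compositing a segmented signer onto randomly chosen scenes; the availability of per-frame foreground masks is an assumption of this sketch, not a detail taken from the paper.
```python
# Illustrative background-randomization augmentation: composite a
# segmented signer onto a randomly chosen background image. Per-frame
# foreground masks are assumed to be available for this sketch.
import random
import numpy as np

def randomize_background(frames, masks, backgrounds):
    """frames: (T, H, W, 3) uint8 video; masks: (T, H, W) in {0, 1};
    backgrounds: list of (H, W, 3) uint8 scene images."""
    bg = random.choice(backgrounds).astype(np.uint8)
    m = masks[..., None].astype(np.uint8)  # (T, H, W, 1) for broadcasting
    # Keep the signer where mask == 1, paste the new scene elsewhere.
    return frames * m + bg[None] * (1 - m)
```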
arXiv Detail & Related papers (2022-11-01T13:27:44Z) - Spatial-Temporal Multi-Cue Network for Continuous Sign Language
Recognition [141.24314054768922]
We propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem.
To validate the effectiveness, we perform experiments on three large-scale CSLR benchmarks.
arXiv Detail & Related papers (2020-02-08T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.