Fully Convolutional Networks for Continuous Sign Language Recognition
- URL: http://arxiv.org/abs/2007.12402v1
- Date: Fri, 24 Jul 2020 08:16:37 GMT
- Title: Fully Convolutional Networks for Continuous Sign Language Recognition
- Authors: Ka Leong Cheng, Zhaoyang Yang, Qifeng Chen, Yu-Wing Tai
- Abstract summary: Continuous sign language recognition is a challenging task that requires learning on both spatial and temporal dimensions.
We propose a fully convolutional network (FCN) for online SLR to concurrently learn spatial and temporal features from weakly annotated video sequences.
- Score: 83.85895472824221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous sign language recognition (SLR) is a challenging task that
requires learning on both spatial and temporal dimensions of signing frame
sequences. Most recent work accomplishes this with hybrid CNN-RNN networks.
However, training these networks is generally non-trivial, and most of them
fail to learn unseen sequence patterns, which causes unsatisfactory performance
in online recognition. In this paper, we propose a fully
convolutional network (FCN) for online SLR to concurrently learn spatial and
temporal features from weakly annotated video sequences with only
sentence-level annotations given. A gloss feature enhancement (GFE) module is
introduced in the proposed network to enforce better sequence alignment
learning. The proposed network is end-to-end trainable without any
pre-training. We conduct experiments on two large-scale SLR datasets.
Experiments show that our method for continuous SLR is effective and performs
well in online recognition.
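The abstract does not spell out the architecture, but the recipe it describes (a 2D CNN for per-frame spatial features, 1D temporal convolutions in place of an RNN, and sentence-level weak supervision) can be sketched as follows. A CTC objective is assumed here as the standard way to train from sentence-level annotations only; module names and layer sizes are illustrative, and the GFE module is omitted.

```python
# Minimal sketch, not the authors' code: a fully convolutional SLR pipeline
# trained from sentence-level annotations with a CTC objective (assumed here;
# the paper's GFE alignment module is omitted).
import torch
import torch.nn as nn

class FullyConvSLR(nn.Module):
    def __init__(self, vocab_size: int, feat_dim: int = 512):
        super().__init__()
        # Spatial stream: a small per-frame 2D CNN (illustrative backbone).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Temporal stream: 1D convolutions instead of an RNN keep the model
        # fully convolutional, so it also works on sliding windows online.
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, 5, padding=2), nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, 5, padding=2), nn.ReLU(),
        )
        self.classifier = nn.Conv1d(feat_dim, vocab_size + 1, 1)  # +1: CTC blank

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t = frames.shape[:2]                                 # (B, T, 3, H, W)
        f = self.spatial(frames.flatten(0, 1)).view(b, t, -1)   # (B, T, C)
        return self.classifier(self.temporal(f.transpose(1, 2)))  # (B, V+1, T)

model = FullyConvSLR(vocab_size=1000)
logits = model(torch.randn(2, 16, 3, 112, 112))
# Weak supervision: CTC aligns per-frame gloss logits to sentence-level labels.
log_probs = logits.permute(2, 0, 1).log_softmax(-1)             # (T, B, V+1)
loss = nn.CTCLoss(blank=0)(log_probs, torch.randint(1, 1001, (2, 5)),
                           input_lengths=torch.full((2,), 16),
                           target_lengths=torch.full((2,), 5))
```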
Related papers
- Random Representations Outperform Online Continually Learned Representations [68.42776779425978]
We show that existing online continually trained deep networks produce inferior representations compared to simple pre-defined random transforms.
Our method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all online continual learning benchmarks.
Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios.
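The summary does not give RanDumb's exact transform or classifier; the sketch below uses one plausible instantiation (random Fourier features with a streaming nearest-class-mean classifier), in which nothing about the representation is ever learned.

```python
# Sketch (an assumption, not RanDumb's exact recipe): a fixed random feature
# map plus a streaming nearest-class-mean classifier; no representation learning.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, n_classes = 784, 2000, 10
W = rng.normal(size=(d_in, d_feat))          # fixed random projection, never trained
b = rng.uniform(0, 2 * np.pi, size=d_feat)

def embed(x):                                # random Fourier features of the input
    return np.cos(x @ W + b)

sums = np.zeros((n_classes, d_feat))         # running per-class feature sums
counts = np.zeros(n_classes)

def update(x, y):                            # one online pass, one example at a time
    sums[y] += embed(x)
    counts[y] += 1

def predict(x):                              # nearest class mean in feature space
    means = sums / np.maximum(counts, 1)[:, None]
    return int(np.argmin(((embed(x) - means) ** 2).sum(axis=1)))

update(rng.normal(size=d_in), 3)
print(predict(rng.normal(size=d_in)))
```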
arXiv Detail & Related papers (2024-02-13T22:07:29Z) - Continual Learning: Forget-free Winning Subnetworks for Video Representations [75.40220771931132]
A Winning Subnetwork (WSN), selected in terms of task performance, is considered for various continual learning tasks.
It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios.
The use of a Fourier Subneural Operator (FSO) within WSN is considered for Video Incremental Learning (VIL).
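As a rough illustration of the winning-subnetwork idea (the selection and training details are assumptions, and the FSO is omitted): per-task binary masks carve task-specific subnetworks out of shared, frozen weights, so earlier tasks are never overwritten.

```python
# Hedged sketch of per-task subnetwork masks over shared frozen weights
# (mask learning via a straight-through estimator is one common choice).
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, n_tasks: int, sparsity: float = 0.5):
        super().__init__()
        # Shared weights kept frozen here; WSN also trains newly selected weights.
        self.register_buffer("weight", torch.randn(d_out, d_in) * 0.1)
        # One learnable score tensor per task; the top scores define the mask.
        self.scores = nn.Parameter(torch.randn(n_tasks, d_out, d_in))
        self.sparsity = sparsity

    def forward(self, x: torch.Tensor, task: int) -> torch.Tensor:
        s = self.scores[task]
        k = int(s.numel() * self.sparsity)
        thresh = s.flatten().kthvalue(s.numel() - k + 1).values
        hard = (s >= thresh).float()
        mask = hard + s - s.detach()   # straight-through: hard forward, soft backward
        return x @ (self.weight * mask).t()

layer = MaskedLinear(16, 8, n_tasks=3)
y = layer(torch.randn(4, 16), task=0)  # task 0 uses its own subnetwork only
```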
arXiv Detail & Related papers (2023-12-19T09:11:49Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
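The result is theoretical; purely as an illustration, one explicit objective with the stated flavor (aligning representations of points that share a label across tasks, separating the rest) might look like the sketch below. The exact form is an assumption.

```python
# Illustration only: an explicit pseudo-contrastive objective of the kind the
# paper shows multi-task pretraining induces implicitly (this form is assumed).
import torch
import torch.nn.functional as F

def pseudo_contrastive(z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # z: (N, d) representations pooled across tasks; labels: (N,)
    z = F.normalize(z, dim=1)
    sim = z @ z.t()                                   # pairwise cosine similarity
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0)                             # drop self-pairs
    neg = 1.0 - pos - torch.eye(len(z))
    # Align same-label pairs, separate different-label pairs.
    return (neg * sim).sum() / neg.sum().clamp(min=1) \
         - (pos * sim).sum() / pos.sum().clamp(min=1)

loss = pseudo_contrastive(torch.randn(8, 16), torch.randint(0, 3, (8,)))
```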
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Temporal superimposed crossover module for effective continuous sign language [10.920363368754721]
This paper proposes a zero-parameter, zero-computation temporal superposition crossover module (TSCM) and combines it with 2D convolution to form a "TSCM+2D convolution" hybrid convolution.
Experiments on two large-scale continuous sign language datasets demonstrate the effectiveness of the proposed method and achieve highly competitive results.
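The summary does not describe TSCM's internals; since the module is stated to be zero-parameter and is paired with 2D convolution, the following sketch shows a generic zero-parameter temporal superposition step combined with per-frame 2D convolution. Treat the mixing pattern itself as an assumption.

```python
# Hedged sketch: a zero-parameter temporal mixing step paired with 2D convolution.
# The actual TSCM may differ; this only illustrates the hybrid-convolution pattern.
import torch
import torch.nn as nn

def temporal_superimpose(x: torch.Tensor) -> torch.Tensor:
    # x: (B, T, C, H, W). Superimpose a quarter of the channels with the previous
    # frame and another quarter with the next frame, at no parameter cost.
    out = x.clone()
    c = x.shape[2] // 4
    out[:, 1:, :c] = 0.5 * (x[:, 1:, :c] + x[:, :-1, :c])
    out[:, :-1, c:2 * c] = 0.5 * (x[:, :-1, c:2 * c] + x[:, 1:, c:2 * c])
    return out

conv2d = nn.Conv2d(64, 64, 3, padding=1)

x = torch.randn(2, 8, 64, 28, 28)           # (B, T, C, H, W)
y = temporal_superimpose(x)                 # zero-parameter temporal step
y = conv2d(y.flatten(0, 1)).view_as(x)      # per-frame 2D convolution
```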
arXiv Detail & Related papers (2022-11-07T09:33:42Z) - Neural Implicit Dictionary via Mixture-of-Expert Training [111.08941206369508]
We present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID).
Our NID assembles a group of coordinate-based subnetworks which are tuned to span the desired function space.
Our experiments show that NID can reconstruct 2D images or 3D scenes up to two orders of magnitude faster with up to 98% less input data.
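A minimal sketch of the dictionary idea as stated (details such as gating and sparsity are assumptions): a set of small coordinate-based subnetworks spans a function space, and a signal is represented by a learned combination of them.

```python
# Sketch with assumed details: a dictionary of small coordinate-based
# subnetworks; learned coefficients combine them to represent one signal.
import torch
import torch.nn as nn

class NeuralImplicitDictionary(nn.Module):
    def __init__(self, n_experts=16, coord_dim=2, out_dim=3, hidden=64):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(coord_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(n_experts)
        )
        # Per-signal mixing coefficients (here: one signal, learned directly).
        self.coeffs = nn.Parameter(torch.zeros(n_experts))

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (N, 2) pixel coordinates in [0, 1]^2 -> (N, 3) RGB values
        basis = torch.stack([e(coords) for e in self.experts], dim=0)  # (E, N, 3)
        w = torch.softmax(self.coeffs, dim=0)                          # convex mixture
        return (w[:, None, None] * basis).sum(dim=0)

nid = NeuralImplicitDictionary()
rgb = nid(torch.rand(1024, 2))   # fit with MSE against the image's pixel values
```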
arXiv Detail & Related papers (2022-07-08T05:07:19Z) - Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks [10.014879130837912]
We propose a symmetric multi-scale architecture called Circular Dilated Convolutional Neural Network (CDIL-CNN).
Our model gives classification logits at all positions, and a simple ensemble over them yields a better decision.
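A sketch of how such a model could look, assuming circular padding, exponentially growing dilations, and an average over per-position logits as the simple ensemble:

```python
# Sketch of the CDIL idea (details assumed): circular padding + exponentially
# dilated 1D convolutions give every position a symmetric full-sequence view;
# per-position logits are then ensembled into one decision.
import torch
import torch.nn as nn

class CDILBlock(nn.Module):
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              dilation=dilation, padding=dilation,
                              padding_mode="circular")
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv(x)) + x   # residual keeps deep stacks trainable

class CDILCNN(nn.Module):
    def __init__(self, in_dim, channels, n_classes, n_layers=6):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, channels, 1)
        self.blocks = nn.Sequential(*[CDILBlock(channels, 2 ** i)
                                      for i in range(n_layers)])
        self.head = nn.Conv1d(channels, n_classes, 1)

    def forward(self, x):                   # x: (B, in_dim, T)
        logits = self.head(self.blocks(self.inp(x)))   # (B, n_classes, T)
        return logits.mean(dim=-1)          # simple ensemble over positions

model = CDILCNN(in_dim=8, channels=32, n_classes=5)
print(model(torch.randn(2, 8, 1000)).shape)   # torch.Size([2, 5])
```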
arXiv Detail & Related papers (2022-01-06T16:58:59Z) - Self-Supervised Learning for Binary Networks by Joint Classifier Training [11.612308609123566]
We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
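The summary names a feature similarity loss and a dynamic balancing scheme without giving their forms; one plausible reading (an assumption) is a cosine term pulling the binary network's features toward those of a jointly trained floating-point network:

```python
# Sketch (assumed form): a feature similarity loss that pulls a binary
# network's features toward a jointly trained floating-point network's.
import torch
import torch.nn.functional as F

def feature_similarity_loss(f_bin: torch.Tensor, f_fp: torch.Tensor) -> torch.Tensor:
    # Negative cosine similarity between the two networks' feature vectors.
    return -F.cosine_similarity(f_bin, f_fp.detach(), dim=1).mean()

def total_loss(ssl_bin, ssl_fp, f_bin, f_fp, lam):
    # lam would follow the paper's dynamic balancing scheme; a constant here.
    return ssl_bin + ssl_fp + lam * feature_similarity_loss(f_bin, f_fp)

loss = total_loss(torch.tensor(1.0), torch.tensor(0.8),
                  torch.randn(4, 128), torch.randn(4, 128), lam=0.5)
```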
arXiv Detail & Related papers (2021-10-17T15:38:39Z) - Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance with significant margins.
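As a generic illustration of the ensemble step only (GEM's actual fusion is more elaborate), late fusion with learned modality weights could look like this:

```python
# Sketch: late-fusion ensembling across modalities; learned softmax weights
# stand in for GEM's more sophisticated global ensemble.
import torch
import torch.nn as nn

class GlobalEnsemble(nn.Module):
    def __init__(self, n_modalities: int):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_modalities))   # learned modality weights

    def forward(self, logits_per_modality):                # list of (B, n_classes)
        stacked = torch.stack(logits_per_modality, dim=0)  # (M, B, n_classes)
        w = torch.softmax(self.w, dim=0)
        return (w[:, None, None] * stacked).sum(dim=0)

ens = GlobalEnsemble(n_modalities=3)                   # e.g. RGB, flow, skeleton
fused = ens([torch.randn(2, 100) for _ in range(3)])   # (2, 100) class logits
```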
arXiv Detail & Related papers (2021-10-12T16:57:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.