Fully Convolutional Networks for Continuous Sign Language Recognition
- URL: http://arxiv.org/abs/2007.12402v1
- Date: Fri, 24 Jul 2020 08:16:37 GMT
- Title: Fully Convolutional Networks for Continuous Sign Language Recognition
- Authors: Ka Leong Cheng, Zhaoyang Yang, Qifeng Chen, Yu-Wing Tai
- Abstract summary: Continuous sign language recognition is a challenging task that requires learning on both spatial and temporal dimensions.
We propose a fully convolutional network (FCN) for online SLR to concurrently learn spatial and temporal features from weakly annotated video sequences.
- Score: 83.85895472824221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous sign language recognition (SLR) is a challenging task that
requires learning on both spatial and temporal dimensions of signing frame
sequences. Most recent work accomplishes this by using CNN and RNN hybrid
networks. However, training these networks is generally non-trivial, and most
of them fail to learn unseen sequence patterns, causing unsatisfactory
performance in online recognition. In this paper, we propose a fully
convolutional network (FCN) for online SLR to concurrently learn spatial and
temporal features from weakly annotated video sequences with only
sentence-level annotations given. A gloss feature enhancement (GFE) module is
introduced in the proposed network to enforce better sequence alignment
learning. The proposed network is end-to-end trainable without any
pre-training. We conduct experiments on two large-scale SLR datasets.
Experiments show that our method for continuous SLR is effective and performs
well in online recognition.
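As an illustration of the recipe the abstract describes, here is a minimal sketch with hypothetical names and layer sizes: a per-frame spatial CNN, 1D temporal convolutions so recognition can run online over a stream, and CTC training against sentence-level gloss labels. The CTC objective and the omission of the GFE module are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of an FCN for continuous SLR (illustrative, not the
# paper's exact architecture): per-frame spatial CNN -> 1D temporal
# convolutions -> per-timestep gloss logits, trained with CTC against
# sentence-level gloss sequences (an assumed weak-supervision objective).
import torch
import torch.nn as nn

class FCNSLR(nn.Module):
    def __init__(self, num_glosses, feat_dim=512):
        super().__init__()
        # Toy per-frame spatial feature extractor.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Temporal module is purely convolutional, so inference can
        # slide over an unbounded frame stream (online recognition).
        self.temporal = nn.Sequential(
            nn.Conv1d(64, feat_dim, 5, padding=2), nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, 5, padding=2), nn.ReLU(),
        )
        self.classifier = nn.Conv1d(feat_dim, num_glosses + 1, 1)  # +1: CTC blank

    def forward(self, video):                     # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        f = self.spatial(video.flatten(0, 1))     # (B*T, 64, 1, 1)
        f = f.view(b, t, -1).transpose(1, 2)      # (B, 64, T)
        return self.classifier(self.temporal(f))  # (B, num_glosses+1, T)

model = FCNSLR(num_glosses=1200)
logits = model(torch.randn(2, 16, 3, 112, 112))      # (2, 1201, 16)
log_probs = logits.permute(2, 0, 1).log_softmax(-1)  # (T, B, C) for CTC
targets = torch.randint(1, 1201, (2, 5))             # sentence-level glosses
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    torch.full((2,), 16, dtype=torch.long),          # input lengths
    torch.full((2,), 5, dtype=torch.long),           # target lengths
)
```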
Related papers
- RanDumb: A Simple Approach that Questions the Efficacy of Continual Representation Learning [68.42776779425978]
We show that existing online continually trained deep networks produce inferior representations compared to a simple pre-defined random transform.
We then train a simple linear classifier on top without storing any exemplars, processing one sample at a time in an online continual learning setting.
Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios.
arXiv Detail & Related papers (2024-02-13T22:07:29Z)
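As a hedged illustration of the RanDumb-style setup described above, the sketch below embeds each sample with a fixed random transform (random Fourier features here, an assumption) and fits only a linear read-out online, one sample at a time, with no stored exemplars; the paper's exact transform and estimator may differ.

```python
# Fixed random embedding + online linear read-out, no exemplar storage.
import torch

d_in, d_feat, n_classes = 784, 2000, 10
W = torch.randn(d_in, d_feat)            # frozen random projection
b = torch.rand(d_feat) * 2 * torch.pi

def embed(x):                            # random Fourier features (assumption)
    return torch.cos(x @ W + b)

# Online linear classifier: class-mean accumulators (a nearest-class-mean
# read-out stands in for the paper's exact estimator).
sums = torch.zeros(n_classes, d_feat)
counts = torch.zeros(n_classes)

def update(x, y):                        # process one sample at a time
    sums[y] += embed(x)
    counts[y] += 1

def predict(x):
    means = sums / counts.clamp(min=1).unsqueeze(1)
    return (embed(x) @ means.T).argmax()

update(torch.randn(d_in), 3)
print(predict(torch.randn(d_in)))
```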
- Continual Learning: Forget-free Winning Subnetworks for Video Representations [75.40220771931132]
A high-performing Winning Subnetwork (WSN), selected by task performance, is considered for various continual learning tasks.
It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios.
The use of a Fourier Subneural Operator (FSO) within WSN is considered for Video Incremental Learning (VIL).
arXiv Detail & Related papers (2023-12-19T09:11:49Z)
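A hedged sketch of the winning-subnetwork idea: each task learns a binary mask over shared dense weights, and only masked weights are active, so earlier tasks are not overwritten (forget-free). Top-k selection by importance score is an assumption standing in for the paper's procedure, and the FSO component for video is omitted.

```python
# Per-task binary masks over one shared dense weight tensor.
import torch

weights = torch.randn(256, 256)                  # shared dense layer
task_masks = {}

def select_subnetwork(task_id, scores, sparsity=0.1):
    # Keep the top fraction of weights by (learned) importance score.
    k = int(scores.numel() * sparsity)
    thresh = scores.flatten().topk(k).values.min()
    task_masks[task_id] = (scores >= thresh).float()

def forward(task_id, x):
    # Only the task's winning subnetwork participates in the forward pass.
    return x @ (weights * task_masks[task_id]).T

select_subnetwork(0, torch.rand(256, 256))
y = forward(0, torch.randn(4, 256))
```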
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
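The pseudo-contrastive loss is not spelled out in the summary above; as a hedged illustration of the stated idea only, the sketch below pulls together representations of points that share a label across tasks. This is illustrative of the spirit, not the quantity derived in the paper's analysis.

```python
# Illustrative pseudo-contrastive objective: same-label pairs are positives.
import torch
import torch.nn.functional as F

def pseudo_contrastive(z, labels, temp=0.1):
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temp                                # pairwise similarities
    same = (labels[:, None] == labels[None, :]).float()
    same.fill_diagonal_(0)                              # ignore self-pairs
    log_p = F.log_softmax(sim, dim=1)
    # Average log-likelihood of positives per anchor, negated.
    return -(same * log_p).sum(1).div(same.sum(1).clamp(min=1)).mean()

loss = pseudo_contrastive(torch.randn(8, 32), torch.randint(0, 3, (8,)))
```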
- Temporal superimposed crossover module for effective continuous sign language [10.920363368754721]
This paper proposes a zero-parameter, zero-computation temporal superposition crossover module (TSCM) and combines it with 2D convolution to form a "TSCM+2D convolution" hybrid convolution.
Experiments on two large-scale continuous sign language datasets demonstrate the effectiveness of the proposed method and achieve highly competitive results.
arXiv Detail & Related papers (2022-11-07T09:33:42Z)
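TSCM's internals are not given in the summary above; as a hedged stand-in, the sketch below shows a zero-parameter temporal mixing operator combined with per-frame 2D convolution, in the spirit of temporal-shift-style modules. The shifting scheme is an assumption, not the paper's exact operator.

```python
# Zero-parameter temporal mixing + 2D convolution hybrid (illustrative).
import torch
import torch.nn as nn

def temporal_shift(x, fold=8):          # x: (B, T, C, H, W); no parameters
    out = torch.zeros_like(x)
    c = x.size(2) // fold
    out[:, 1:, :c] = x[:, :-1, :c]              # one channel group: shift forward
    out[:, :-1, c:2 * c] = x[:, 1:, c:2 * c]    # one group: shift backward
    out[:, :, 2 * c:] = x[:, :, 2 * c:]         # remaining channels unchanged
    return out

conv2d = nn.Conv2d(16, 16, 3, padding=1)        # spatial modeling per frame
x = torch.randn(2, 10, 16, 28, 28)
y = conv2d(temporal_shift(x).flatten(0, 1)).view_as(x)
```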
- Neural Implicit Dictionary via Mixture-of-Expert Training [111.08941206369508]
We present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID).
Our NID assembles a group of coordinate-based subnetworks which are tuned to span the desired function space.
Our experiments show that NID can reconstruct 2D images or 3D scenes up to 2 orders of magnitude faster, with up to 98% less input data.
arXiv Detail & Related papers (2022-07-08T05:07:19Z)
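A hedged sketch of the dictionary idea described above: a bank of small coordinate-based subnetworks whose outputs are combined with per-instance coefficients, so each signal is a functional combination of shared bases. Sizes and the coefficient scheme are illustrative assumptions.

```python
# Dictionary of coordinate-based subnetworks, combined per instance.
import torch
import torch.nn as nn

class NeuralImplicitDictionary(nn.Module):
    def __init__(self, n_basis=16, hidden=64, in_dim=2, out_dim=3):
        super().__init__()
        # Each basis is a tiny coordinate MLP mapping coords -> signal values.
        self.basis = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(n_basis)
        )

    def forward(self, coords, coeffs):   # coords: (N, in_dim), coeffs: (n_basis,)
        outs = torch.stack([f(coords) for f in self.basis])  # (n_basis, N, out_dim)
        return torch.einsum('k,kno->no', coeffs, outs)       # weighted combination

nid = NeuralImplicitDictionary()
rgb = nid(torch.rand(100, 2), torch.randn(16))  # query 100 pixel coordinates
```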
- Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks [10.014879130837912]
We propose a symmetric multi-scale architecture called Circular Dilated Convolutional Neural Network (CDIL-CNN).
Our model gives classification logits at all positions, and we can apply simple ensemble learning to achieve a better decision.
arXiv Detail & Related papers (2022-01-06T16:58:59Z)
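A hedged sketch of that recipe: stacked dilated 1D convolutions with circular padding, a classifier producing logits at every position, and a simple ensemble (here, the mean over positions) for the final decision. Depth and widths are illustrative assumptions.

```python
# Circular dilated 1D CNN with per-position logits and mean ensembling.
import torch
import torch.nn as nn

class CDILCNN(nn.Module):
    def __init__(self, in_ch=8, hidden=32, n_classes=5, n_layers=4):
        super().__init__()
        layers = []
        for i in range(n_layers):
            # Exponentially growing dilation; circular padding keeps length.
            layers += [nn.Conv1d(in_ch if i == 0 else hidden, hidden, 3,
                                 padding=2 ** i, dilation=2 ** i,
                                 padding_mode='circular'),
                       nn.ReLU()]
        self.body = nn.Sequential(*layers)
        self.head = nn.Conv1d(hidden, n_classes, 1)  # logits at all positions

    def forward(self, x):                  # x: (B, C, L)
        per_pos = self.head(self.body(x))  # (B, n_classes, L)
        return per_pos.mean(dim=2)         # simple ensemble over positions

model = CDILCNN()
logits = model(torch.randn(4, 8, 1024))    # (4, 5)
```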
- Self-Supervised Learning for Binary Networks by Joint Classifier Training [11.612308609123566]
We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
arXiv Detail & Related papers (2021-10-17T15:38:39Z)
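A hedged sketch of the feature similarity idea named above: during joint training, pull the binary network's features toward the floating-point classifier's features with a cosine-similarity loss. The loss form and the sign binarization with a straight-through estimator are common-practice assumptions, not necessarily the paper's exact choices.

```python
# Feature similarity loss + straight-through sign binarization (illustrative).
import torch
import torch.nn.functional as F

def feature_similarity_loss(feat_binary, feat_fp):
    # Higher cosine similarity between paired features -> lower loss;
    # the floating-point branch is treated as the (detached) target.
    return 1 - F.cosine_similarity(feat_binary, feat_fp.detach(), dim=1).mean()

def binarize(w):
    # Sign binarization; gradients pass through unchanged (STE).
    return w.sign().detach() + w - w.detach()

loss = feature_similarity_loss(torch.randn(16, 128), torch.randn(16, 128))
```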
- Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance with significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z)
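A hedged sketch of a multi-modal late-fusion ensemble in the spirit of the framework above: per-modality models (e.g., RGB and skeleton streams) each produce class logits, and a global ensemble combines them with learned weights. The modalities and the weighting scheme are illustrative assumptions, not SAM-SLR-v2's exact fusion.

```python
# Learned-weight late fusion over per-modality classification logits.
import torch
import torch.nn as nn

class GlobalEnsemble(nn.Module):
    def __init__(self, n_modalities):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(n_modalities))

    def forward(self, per_modality_logits):          # list of (B, n_classes)
        w = torch.softmax(self.weights, dim=0)
        stacked = torch.stack(per_modality_logits)   # (M, B, n_classes)
        return (w[:, None, None] * stacked).sum(0)   # weighted late fusion

ensemble = GlobalEnsemble(n_modalities=3)
logits = ensemble([torch.randn(4, 226) for _ in range(3)])
```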
This list is automatically generated from the titles and abstracts of the papers in this site.