Sign language segmentation with temporal convolutional networks
- URL: http://arxiv.org/abs/2011.12986v2
- Date: Fri, 12 Feb 2021 17:16:41 GMT
- Title: Sign language segmentation with temporal convolutional networks
- Authors: Katrin Renz, Nicolaj C. Stache, Samuel Albanie, Gül Varol
- Abstract summary: Our approach employs 3D convolutional neural network representations with iterative temporal segment refinement to resolve ambiguities between sign boundary cues.
We demonstrate the effectiveness of our approach on the BSLCORPUS, PHOENIX14 and BSL-1K datasets.
- Score: 25.661006537351547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of this work is to determine the location of temporal
boundaries between signs in continuous sign language videos. Our approach
employs 3D convolutional neural network representations with iterative temporal
segment refinement to resolve ambiguities between sign boundary cues. We
demonstrate the effectiveness of our approach on the BSLCORPUS, PHOENIX14 and
BSL-1K datasets, showing considerable improvement over the prior state of the
art and the ability to generalise to new signers, languages and domains.
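The abstract describes scoring temporal sign boundaries by applying temporal convolutions over per-frame video representations. As a minimal sketch of that idea (not the paper's actual architecture: the kernel, feature dimensions, and 0.5 threshold here are illustrative assumptions, and the real model uses learned 3D CNN features with iterative refinement):

```python
import numpy as np

def temporal_conv(features, kernel, bias=0.0):
    """1D temporal convolution over per-frame features (same padding).

    features: (T, D) array of per-frame embeddings (e.g. from a 3D CNN).
    kernel:   (K, D) array of filter weights over a K-frame window.
    Returns a (T,) array of raw boundary scores.
    """
    T, D = features.shape
    K = kernel.shape[0]
    pad = K // 2
    padded = np.pad(features, ((pad, pad), (0, 0)))
    return np.array(
        [np.sum(padded[t:t + K] * kernel) + bias for t in range(T)]
    )

def boundary_probs(scores):
    """Squash raw scores to per-frame boundary probabilities."""
    return 1.0 / (1.0 + np.exp(-scores))

# Toy example: 8 frames of 4-dim features, a 3-frame temporal kernel.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4))
kern = rng.standard_normal((3, 4))
probs = boundary_probs(temporal_conv(feats, kern))
boundaries = np.where(probs > 0.5)[0]  # frames predicted as sign boundaries
```

In the paper this scoring is repeated iteratively, refining segment proposals so that ambiguous boundary cues are resolved by wider temporal context.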
Related papers
- MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production [93.32354378820648]
We propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users.
A sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step.
Experiments on How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.
arXiv Detail & Related papers (2024-07-04T13:53:50Z)
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
- Linguistically Motivated Sign Language Segmentation [51.06873383204105]
We consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases.
Our method is motivated by linguistic cues observed in sign language corpora.
We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing.
arXiv Detail & Related papers (2023-10-21T10:09:34Z)
- Continuous Sign Language Recognition with Correlation Network [6.428695655854854]
We propose correlation network (CorrNet) to explicitly capture and leverage body trajectories across frames to identify signs.
CorrNet achieves new state-of-the-art accuracy on four large-scale datasets.
arXiv Detail & Related papers (2023-03-06T15:02:12Z) - BEST: BERT Pre-Training for Sign Language Recognition with Coupling
Tokenization [135.73436686653315]
We leverage the success of BERT pre-training and model domain-specific statistics to strengthen the sign language recognition (SLR) model.
Considering the dominance of hand and body in sign language expression, we organize them as pose triplet units and feed them into the Transformer backbone.
Pre-training is performed via reconstructing the masked triplet unit from the corrupted input sequence.
It adaptively extracts the discrete pseudo label from the pose triplet unit, which represents the semantic gesture/body state.
arXiv Detail & Related papers (2023-02-10T06:23:44Z) - Sign Segmentation with Changepoint-Modulated Pseudo-Labelling [12.685780222519902]
The objective of this work is to find temporal boundaries between signs in continuous sign language.
Motivated by the paucity of annotation available for this task, we propose a simple yet effective algorithm to improve segmentation performance.
arXiv Detail & Related papers (2021-04-28T15:05:19Z)
- Pose-based Sign Language Recognition using GCN and BERT [0.0]
Word-level sign language recognition (WSLR) is the first important step towards understanding and interpreting sign language.
Recognizing signs from videos is a challenging task, as the meaning of a word depends on a combination of subtle body motions, hand configurations, and other movements.
Recent pose-based architectures for WSLR either model both the spatial and temporal dependencies among the poses in different frames simultaneously, or model only the temporal information without fully utilizing the spatial information.
We tackle the problem of WSLR using a novel pose-based approach, which captures spatial and temporal information separately and performs late fusion.
arXiv Detail & Related papers (2020-12-01T19:10:50Z)
- TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation [101.6042317204022]
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.
Existing SLT models usually represent sign visual features in a frame-wise manner.
We develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet.
arXiv Detail & Related papers (2020-10-12T05:58:09Z)
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet exploit the spatial and temporal long-range context interdependencies of such features, together with their spatial-temporal information correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.