Lip-reading with Densely Connected Temporal Convolutional Networks
- URL: http://arxiv.org/abs/2009.14233v3
- Date: Thu, 29 Sep 2022 14:50:00 GMT
- Title: Lip-reading with Densely Connected Temporal Convolutional Networks
- Authors: Pingchuan Ma, Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic
- Abstract summary: We present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words.
Our method has achieved 88.36% accuracy on the Lip Reading in the Wild dataset and 43.65% on the LRW-1000 dataset.
- Score: 61.66144695679362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present the Densely Connected Temporal Convolutional Network
(DC-TCN) for lip-reading of isolated words. Although Temporal Convolutional
Networks (TCN) have recently demonstrated great potential in many vision tasks,
their receptive fields are not dense enough to model the complex temporal
dynamics in lip-reading scenarios. To address this problem, we introduce dense
connections into the network to capture more robust temporal features.
Moreover, our approach utilises the Squeeze-and-Excitation block, a
light-weight attention mechanism, to further enhance the model's classification
power. Without bells and whistles, our DC-TCN method has achieved 88.36%
accuracy on the Lip Reading in the Wild (LRW) dataset and 43.65% on the
LRW-1000 dataset, surpassing all baseline methods and setting a new
state of the art on both datasets.
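The summary above names two ingredients: dense connections inside the temporal convolutional stack and a Squeeze-and-Excitation (SE) attention block. The following is a minimal PyTorch sketch of those two ideas only; the channel sizes, growth rate, and dilation schedule are illustrative assumptions, not the authors' DC-TCN configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: re-weights channels with a learned gate."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, channels, time)
        gate = self.fc(x.mean(dim=-1))         # squeeze over the time axis
        return x * gate.unsqueeze(-1)          # excite each channel

class DenseTemporalBlock(nn.Module):
    """Dilated temporal conv whose output is concatenated with its input,
    so later blocks see every earlier feature map (dense connectivity)."""
    def __init__(self, in_ch, growth, kernel=3, dilation=1):
        super().__init__()
        pad = (kernel - 1) // 2 * dilation
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, growth, kernel, padding=pad, dilation=dilation),
            nn.BatchNorm1d(growth), nn.ReLU(inplace=True),
        )
        self.se = SEBlock(growth)

    def forward(self, x):
        return torch.cat([x, self.se(self.conv(x))], dim=1)

# Illustrative stack: four blocks with increasing dilation over 512-dim frame features.
blocks, ch, growth = [], 512, 128
for i, d in enumerate([1, 2, 4, 8]):
    blocks.append(DenseTemporalBlock(ch + i * growth, growth, dilation=d))
net = nn.Sequential(*blocks)
y = net(torch.randn(2, 512, 29))   # (batch, features, frames) -> (2, 512 + 4*128, 29)
```

Because every block appends its output to the running feature map, the classifier on top sees temporal features at all dilations at once, which is the "dense receptive field" effect the abstract argues a plain TCN lacks.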
Related papers
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
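As a rough illustration of the "TC" stream's input representation, the sketch below uses PyWavelets to turn a 1D behavioral signal into a 2D scale-by-time tensor with the Continuous Wavelet Transform. The wavelet, scale range, and synthetic signal are placeholders, not the TCCT-Net configuration.

```python
import numpy as np
import pywt

# Placeholder 1D behavioural signal (e.g. one feature track sampled over time).
fs = 30.0                                   # assumed frame rate
t = np.arange(0, 10, 1 / fs)
signal = np.sin(2 * np.pi * 0.5 * t) + 0.1 * np.random.randn(t.size)

# Continuous Wavelet Transform: one row per scale, giving a 2D (scales x time)
# tensor that a 2D convolutional stream can consume directly.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, wavelet="morl", sampling_period=1 / fs)
tf_tensor = np.abs(coeffs)                  # shape: (64, len(t))
print(tf_tensor.shape)
```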
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
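A minimal PyTorch sketch of the underlying parameterization idea: the hidden-to-hidden weight matrix is factored into two thin matrices so its rank and parameter count are controlled explicitly. This illustrates low-rank recurrent connectivity in a plain RNN cell, not the CfC model from the paper; sizes and rank are placeholders.

```python
import torch
import torch.nn as nn

class LowRankRNNCell(nn.Module):
    """Vanilla RNN cell whose recurrent matrix is W_hh = U @ V,
    with U: (hidden, rank) and V: (rank, hidden), rank << hidden."""
    def __init__(self, input_size, hidden_size, rank):
        super().__init__()
        self.W_in = nn.Linear(input_size, hidden_size)
        self.U = nn.Parameter(torch.randn(hidden_size, rank) * 0.1)
        self.V = nn.Parameter(torch.randn(rank, hidden_size) * 0.1)

    def forward(self, x, h):
        return torch.tanh(self.W_in(x) + h @ (self.U @ self.V).T)

cell = LowRankRNNCell(input_size=16, hidden_size=128, rank=8)
h = torch.zeros(4, 128)
for x_t in torch.randn(20, 4, 16):          # 20 time steps, batch of 4
    h = cell(x_t, h)
```

With hidden size 128, a full-rank recurrent matrix has 16,384 parameters, while the rank-8 factorization above has 2,048, which is the parameter-count trade-off the summary refers to.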
arXiv Detail & Related papers (2023-10-05T21:44:18Z)
- NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
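The learned hash encoding is hard to reproduce in a few lines, so the sketch below substitutes a fixed Fourier-feature encoding; the layer widths and Softplus output are also illustrative assumptions. It only shows the core idea stated above: attenuation as a continuous function of 3D coordinates parameterized by a fully connected network.

```python
import torch
import torch.nn as nn

def fourier_encode(xyz, num_freqs=6):
    """Map 3D coordinates to sin/cos features at increasing frequencies
    (a stand-in for the learned hash encoding used in NAF)."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=torch.float32) * torch.pi
    angles = xyz.unsqueeze(-1) * freqs               # (N, 3, num_freqs)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1).flatten(1)

class AttenuationField(nn.Module):
    """Fully connected network mapping an encoded 3D point to an attenuation value."""
    def __init__(self, num_freqs=6, width=128):
        super().__init__()
        in_dim = 3 * 2 * num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, width), nn.ReLU(inplace=True),
            nn.Linear(width, width), nn.ReLU(inplace=True),
            nn.Linear(width, 1), nn.Softplus(),       # attenuation is non-negative
        )

    def forward(self, xyz):
        return self.mlp(fourier_encode(xyz))

field = AttenuationField()
mu = field(torch.rand(1024, 3))                       # query 1024 points in [0, 1]^3
```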
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
- Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation [26.94528951545861]
A weighted multi-dilation depthwise-separable convolution is proposed to replace standard depthwise-separable convolutions in temporal convolutional networks (TCNs).
It is shown that this weighted multi-dilation temporal convolutional network (WD-TCN) consistently outperforms the TCN across various model configurations.
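A hedged PyTorch sketch of the operation named above: several depthwise convolutions at different dilations, mixed by learned weights, followed by a pointwise 1x1 convolution (the separable part). In the paper the branch weights are utterance-dependent; the sketch simplifies them to a single learned softmax-normalized vector, and the kernel size and dilation set are assumptions.

```python
import torch
import torch.nn as nn

class WeightedMultiDilationDSConv(nn.Module):
    """Depthwise convs at several dilations, mixed by learned softmax weights,
    then a 1x1 pointwise conv (depthwise-separable structure)."""
    def __init__(self, channels, kernel=3, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel, padding=(kernel - 1) // 2 * d,
                      dilation=d, groups=channels)     # depthwise branch
            for d in dilations
        ])
        self.mix = nn.Parameter(torch.zeros(len(dilations)))  # learned branch weights
        self.pointwise = nn.Conv1d(channels, channels, 1)

    def forward(self, x):                               # x: (batch, channels, time)
        w = torch.softmax(self.mix, dim=0)
        y = sum(w[i] * branch(x) for i, branch in enumerate(self.branches))
        return self.pointwise(y)

layer = WeightedMultiDilationDSConv(channels=64)
out = layer(torch.randn(8, 64, 200))                    # time length preserved
```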
arXiv Detail & Related papers (2022-05-17T15:56:31Z)
- Hybrid Backpropagation Parallel Reservoir Networks [8.944918753413827]
We propose a novel hybrid network, which combines the effectiveness of learning random temporal features of reservoirs with the readout power of a deep neural network with batch normalization.
We demonstrate that our new network outperforms LSTMs and GRUs, including multi-layer "deep" versions of these networks.
We also show that the inclusion of a novel meta-ring structure, which we call HBP-ESN M-Ring, achieves similar performance to one large reservoir while decreasing the memory required by an order of magnitude.
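A minimal NumPy sketch of the reservoir idea referenced above: a fixed random recurrent "reservoir" produces temporal features, and only a readout fit on the collected states is learned. HBP-ESN's parallel reservoirs, meta-ring structure, and backprop-trained deep readout with batch normalization are omitted; all sizes and targets below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_out, T = 3, 200, 2, 500

# Fixed random input and recurrent weights (never trained), scaled for stability.
W_in = rng.normal(scale=0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # spectral radius below 1

# Run the reservoir over a random input sequence and collect its states.
u = rng.normal(size=(T, n_in))
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in @ u[t] + W @ x)
    states[t] = x

# Only the readout is fit here (ridge regression); HBP-ESN instead trains a
# deep readout with backpropagation on top of parallel reservoirs.
targets = rng.normal(size=(T, n_out))                # placeholder targets
ridge = 1e-2
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                        states.T @ targets)
pred = states @ W_out
```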
arXiv Detail & Related papers (2020-10-27T21:03:35Z)
- Depth Enables Long-Term Memory for Recurrent Neural Networks [0.0]
We introduce a measure of the network's ability to support information flow across time, referred to as the Start-End separation rank.
We prove that deep recurrent networks support Start-End separation ranks which are higher than those supported by their shallow counterparts.
arXiv Detail & Related papers (2020-03-23T10:29:14Z)
- Toward fast and accurate human pose estimation via soft-gated skip connections [97.06882200076096]
This paper is on highly accurate and highly efficient human pose estimation.
We re-analyze this design choice in the context of improving both the accuracy and the efficiency over the state-of-the-art.
Our model achieves state-of-the-art results on the MPII and LSP datasets.
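A hedged reading of the "soft-gated skip connections" in the title: a residual block whose identity path is scaled by a learnable per-channel gate, sketched below in PyTorch. The exact gating granularity and block layout used in the paper may differ; channel counts are placeholders.

```python
import torch
import torch.nn as nn

class SoftGatedSkip(nn.Module):
    """Residual block where the skip branch is scaled by a learnable gate alpha,
    letting the network modulate how much of the identity path passes through."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.alpha = nn.Parameter(torch.ones(1, channels, 1, 1))  # per-channel gate

    def forward(self, x):
        return self.alpha * x + self.body(x)

block = SoftGatedSkip(64)
out = block(torch.randn(2, 64, 32, 32))
```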
arXiv Detail & Related papers (2020-02-25T18:51:51Z)
- Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks [23.88788382262305]
The temporal convolutional recurrent network (TCRN) is an end-to-end model that directly maps noisy waveforms to clean waveforms.
We show that our model improves performance compared with existing convolutional recurrent networks.
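The summary only states that the model maps waveform to waveform end-to-end; the sketch below is a generic illustration of that pattern (strided conv encoder, recurrence over the downsampled frames, transposed-conv decoder back to waveform length), not the TCRN architecture itself. All layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class TinyTCRN(nn.Module):
    """Illustrative waveform-to-waveform model: strided temporal conv encoder,
    a recurrent layer over downsampled frames, and a transposed-conv decoder."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encode = nn.Conv1d(1, hidden, kernel_size=16, stride=8, padding=4)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.decode = nn.ConvTranspose1d(hidden, 1, kernel_size=16, stride=8, padding=4)

    def forward(self, wav):                     # wav: (batch, 1, samples)
        z = torch.relu(self.encode(wav))        # (batch, hidden, frames)
        z, _ = self.rnn(z.transpose(1, 2))      # recurrence over frames
        return self.decode(z.transpose(1, 2))   # back to (batch, 1, samples)

model = TinyTCRN()
clean_est = model(torch.randn(2, 1, 8000))      # ~0.5 s of 16 kHz audio
```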
arXiv Detail & Related papers (2020-02-02T04:26:50Z)
- Lipreading using Temporal Convolutional Networks [57.41253104365274]
The current model for recognition of isolated words in-the-wild consists of a residual network and Bidirectional Gated Recurrent Unit (BGRU) layers.
We address the limitations of this model and propose changes which further improve its performance.
Our proposed model results in an absolute improvement of 1.2% and 3.2% on the LRW and LRW1000 datasets, respectively.
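For contrast with the DC-TCN sketch earlier, the snippet below illustrates the baseline-style back-end this entry describes: BiGRU layers over per-frame visual features (e.g. from a ResNet), temporal average pooling, and a word classifier. Feature dimension, hidden size, and pooling choice are assumptions; 29 frames and 500 word classes follow the LRW setup.

```python
import torch
import torch.nn as nn

class BiGRUBackend(nn.Module):
    """Baseline-style back-end: per-frame visual features are fed to
    bidirectional GRU layers, averaged over time, and classified into
    one of the isolated-word classes."""
    def __init__(self, feat_dim=512, hidden=256, num_words=500):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_words)

    def forward(self, feats):                    # feats: (batch, frames, feat_dim)
        out, _ = self.gru(feats)
        return self.classifier(out.mean(dim=1))  # temporal average pooling

backend = BiGRUBackend()
logits = backend(torch.randn(4, 29, 512))        # 29 frames per LRW clip
```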
arXiv Detail & Related papers (2020-01-23T17:49:35Z)