Related papers: Advances and Challenges in Deep Lip Reading

URL: http://arxiv.org/abs/2110.07879v1
Date: Fri, 15 Oct 2021 06:18:26 GMT
Title: Advances and Challenges in Deep Lip Reading
Authors: Marzieh Oghbaie, Arian Sabaghi, Kooshan Hashemifard, and Mohammad Akbari
Abstract summary: This paper provides a comprehensive survey of the state-of-the-art deep learning based Visual Speech Recognition research. We focus on data challenges, task-specific complications, and the corresponding solutions. We also discuss the main modules of a VSR pipeline and the influential datasets.
Score: 2.930266486910376
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Driven by deep learning techniques and large-scale datasets, recent years have witnessed a paradigm shift in automatic lip reading. While the main thrust of Visual Speech Recognition (VSR) was improving accuracy of Audio Speech Recognition systems, other potential applications, such as biometric identification, and the promised gains of VSR systems, have motivated extensive efforts on developing the lip reading technology. This paper provides a comprehensive survey of the state-of-the-art deep learning based VSR research with a focus on data challenges, task-specific complications, and the corresponding solutions. Advancements in these directions will expedite the transformation of silent speech interface from theory to practice. We also discuss the main modules of a VSR pipeline and the influential datasets. Finally, we introduce some typical VSR application concerns and impediments to real-world scenarios as well as future research directions.

Related papers

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities [62.05713042908654]
This paper provides a review of advances in Large Language Models (LLMs) alignment through the lens of inverse reinforcement learning (IRL). We highlight the necessity of constructing neural reward models from human data and discuss the formal and practical implications of this paradigm shift.
arXiv Detail & Related papers (2025-07-17T14:22:24Z)
Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation [0.0]
This systematic literature review evaluates advances in AI-driven selective noise cancellation for hearing aids. We synthesize findings across deep learning architectures, hardware deployment strategies, clinical validation studies, and user-centric design. Key findings include significant gains over traditional methods, with recent models achieving up to 18.3 dB SI-SDR improvement on noisy-reverberant benchmarks.
arXiv Detail & Related papers (2025-06-25T15:05:16Z)
A Survey of Deep Learning Video Super-Resolution [1.074960192271861]
Video super-resolution (VSR) is a prominent research topic in low-level computer vision. Deep learning technologies have played a significant role in VSR research.
arXiv Detail & Related papers (2025-06-03T05:42:19Z)
Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey [2.716339075963185]
Recent advancements in deep learning (DL) have posed a significant challenge for automatic speech recognition (ASR). ASR relies on extensive training datasets, including confidential ones, and demands substantial computational and storage resources. Advanced DL techniques like deep transfer learning (DTL), federated learning (FL), and reinforcement learning (RL) address these issues.
arXiv Detail & Related papers (2024-03-02T16:25:42Z)
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types. We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
AV-RIR: Audio-Visual Room Impulse Response Estimation [49.469389715876915]
Accurate estimation of Room Impulse Response (RIR) is important for speech processing and AR/VR applications. We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment.
arXiv Detail & Related papers (2023-11-30T22:58:30Z)
Radio Frequency Fingerprinting via Deep Learning: Challenges and Opportunities [4.800138615859937]
Radio Frequency Fingerprinting (RFF) techniques promise to authenticate wireless devices at the physical layer based on inherent hardware imperfections introduced during manufacturing. Recent advances in Machine Learning, particularly in Deep Learning (DL), have improved the ability of RFF systems to extract and learn complex features that make up the device-specific fingerprint. This paper systematically identifies and analyzes the essential considerations and challenges encountered in the creation of DL-based RFF systems.
arXiv Detail & Related papers (2023-10-25T06:45:49Z)
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping [4.271091833712731]
We propose a simple approach, named Lip2Vec, that is based on learning a prior model. The proposed model compares favorably with fully-supervised learning methods on the LRS3 dataset, achieving 26 WER. We believe that reprogramming the VSR as an ASR task narrows the performance gap between the two and paves the way for more flexible formulations of lip reading.
arXiv Detail & Related papers (2023-08-11T12:59:02Z)
Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey [0.0]
Speaker-independent VSR is a complex task that involves identifying spoken words or phrases from video recordings of a speaker's facial movements. This survey provides an in-depth analysis of the evolution of speaker-independent VSR systems from 1990 to 2023.
arXiv Detail & Related papers (2023-06-14T07:33:43Z)
NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection [50.33525966541906]
Existing multimodal detection methods capture audio-visual inconsistencies to expose Deepfake videos. We propose a novel Deepfake detection method to mine the correlation between Non-critical Phonemes and Visemes, termed NPVForensics. Our model can be easily adapted to the downstream Deepfake datasets with fine-tuning.
arXiv Detail & Related papers (2023-06-12T06:06:05Z)
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition [66.94463981654216]
We propose prompt tuning methods of Deep Neural Networks (DNNs) for speaker-adaptive Visual Speech Recognition (VSR). We finetune prompts on adaptation data of target speakers instead of modifying the pre-trained model parameters. The effectiveness of the proposed method is evaluated on both word- and sentence-level VSR databases.
arXiv Detail & Related papers (2023-02-16T06:01:31Z)
Visualizing Automatic Speech Recognition -- Means for a Better Understanding? [0.1868368163807795]
We show how attribution methods, which we import from image recognition and suitably adapt to handle audio data, can help to clarify the working of ASR. Taking DeepSpeech, an end-to-end model for ASR, as a case study, we show how these techniques help to visualize which features of the input are the most influential in determining the output.
arXiv Detail & Related papers (2022-02-01T13:35:08Z)
Deep Recurrent Encoder: A scalable end-to-end network to model brain signals [122.1055193683784]
We propose an end-to-end deep learning architecture trained to predict the brain responses of multiple subjects at once. We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task.
arXiv Detail & Related papers (2021-03-03T11:39:17Z)
Video Super Resolution Based on Deep Learning: A Comprehensive Survey [87.30395002197344]
We comprehensively investigate 33 state-of-the-art video super-resolution (VSR) methods based on deep learning. We propose a taxonomy and classify the methods into six sub-categories according to the ways of utilizing inter-frame information. We summarize and compare the performance of representative VSR methods on some benchmark datasets.
arXiv Detail & Related papers (2020-07-25T13:39:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.