Deep Learning for Automatic Tracking of Tongue Surface in Real-time
Ultrasound Videos, Landmarks instead of Contours
- URL: http://arxiv.org/abs/2003.08808v1
- Date: Mon, 16 Mar 2020 00:38:13 GMT
- Title: Deep Learning for Automatic Tracking of Tongue Surface in Real-time
Ultrasound Videos, Landmarks instead of Contours
- Authors: M. Hamed Mozaffari, Won-Sook Lee
- Abstract summary: This paper presents a novel approach to automatic and real-time tongue contour tracking using deep neural networks.
In the proposed method, instead of the two-step procedure, landmarks of the tongue surface are tracked.
Our experiments demonstrated the outstanding generalization, performance, and accuracy of the proposed technique.
- Score: 0.6853165736531939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One usage of medical ultrasound imaging is to visualize and characterize
human tongue shape and motion during real-time speech to study healthy or
impaired speech production. Due to the low-contrast characteristic and noisy
nature of ultrasound images, it can be difficult for non-expert users to
recognize tongue gestures in applications such as visual training of a second
language. Moreover, quantitative analysis of tongue motion needs the tongue
dorsum contour to be extracted, tracked, and visualized. Manual tongue contour
extraction is a cumbersome, subjective, and error-prone task. Furthermore, it
is not a feasible solution for real-time applications. The growth of deep
learning has been vigorously exploited in various computer vision tasks,
including ultrasound tongue contour tracking. In the current methods, the
process of tongue contour extraction comprises two steps of image segmentation
and post-processing. This paper presents a novel approach to automatic and
real-time tongue contour tracking using deep neural networks. In the proposed
method, instead of the two-step procedure, landmarks of the tongue surface are
tracked. This novel idea enables researchers in this field to benefit from
previously available annotated databases to achieve high-accuracy results. Our
experiments demonstrated the outstanding generalization, performance, and
accuracy of the proposed technique.
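
The abstract's key idea, regressing tongue-surface landmarks directly from each ultrasound frame instead of segmenting a contour and post-processing it, can be illustrated with a minimal sketch. The PyTorch snippet below is an assumption-laden illustration, not the authors' implementation: the layer sizes, the number of landmarks (N_LANDMARKS), the input resolution, and the MSE training loss are placeholders chosen only to show the single-step landmark-regression formulation.

```python
# Minimal, hypothetical sketch (not the authors' released code): a small CNN that
# regresses a fixed number of tongue-surface landmark coordinates directly from a
# single ultrasound frame, replacing the usual segmentation + post-processing
# pipeline. Layer sizes, N_LANDMARKS, input resolution, and the MSE loss are
# illustrative assumptions.
import torch
import torch.nn as nn

N_LANDMARKS = 15  # assumed number of tracked tongue-surface points


class TongueLandmarkNet(nn.Module):
    def __init__(self, n_landmarks: int = N_LANDMARKS):
        super().__init__()
        self.n_landmarks = n_landmarks
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Regress (x, y) per landmark, normalized to [0, 1] by the sigmoid.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 2 * n_landmarks), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) grayscale ultrasound frames
        out = self.head(self.features(x))
        return out.view(-1, self.n_landmarks, 2)


if __name__ == "__main__":
    model = TongueLandmarkNet()
    frame = torch.rand(2, 1, 128, 128)                 # dummy ultrasound frames
    landmarks = model(frame)                           # (2, N_LANDMARKS, 2)
    target = torch.rand_like(landmarks)                # placeholder annotations
    loss = nn.functional.mse_loss(landmarks, target)   # single-step training signal
    print(landmarks.shape, loss.item())
```

Predicting all landmark coordinates in one forward pass, trained against existing landmark-annotated databases, is what would make such a formulation suitable for real-time tracking.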
Related papers
- Weakly Supervised Object Detection for Automatic Tooth-marked Tongue Recognition [19.34036038278796]
Tongue diagnosis in Traditional Chinese Medicine (TCM) is a crucial diagnostic method that can reflect an individual's health status.
Traditional methods for identifying tooth-marked tongues are subjective and inconsistent because they rely on practitioner experience.
We propose a novel fully automated Weakly Supervised method using a Vision transformer and Multiple instance learning (WSVM) for tongue extraction and tooth-marked tongue recognition.
arXiv Detail & Related papers (2024-08-29T11:31:28Z)
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- Self-supervised Transformer for Deepfake Detection [112.81127845409002]
Deepfake techniques in real-world scenarios require stronger generalization abilities of face forgery detectors.
Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection.
In this paper, we propose a self-supervised, transformer-based audio-visual contrastive learning method.
arXiv Detail & Related papers (2022-03-02T17:44:40Z)
- Improving Ultrasound Tongue Image Reconstruction from Lip Images Using Self-supervised Learning and Attention Mechanism [1.52292571922932]
Given an observable image sequence of lips, can we picture the corresponding tongue motion?
We formulate this problem as a self-supervised learning problem and employ a two-stream convolutional network and a long short-term memory network with an attention mechanism for the learning task.
The results show that our model is able to generate images close to the real ultrasound tongue images, resulting in a matching between the two imaging modalities.
arXiv Detail & Related papers (2021-06-20T10:51:23Z)
- Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while it has much less complexity.
arXiv Detail & Related papers (2021-06-08T13:40:30Z)
- Convolutional Neural Network-Based Age Estimation Using B-Mode Ultrasound Tongue Image [10.100437437151621]
We explore the feasibility of age estimation using the ultrasound tongue image of the speakers.
Motivated by the success of deep learning, this paper leverages deep learning on this task.
The developed method can be used as a tool to evaluate the performance of speech therapy sessions.
arXiv Detail & Related papers (2021-01-27T08:00:47Z)
- Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection [118.37239586697139]
LipForensics is a detection approach capable of both generalising manipulations and withstanding various distortions.
It consists of first pretraining a spatio-temporal network to perform visual speech recognition (lipreading).
A temporal network is subsequently finetuned on fixed mouth embeddings of real and forged data in order to detect fake videos based on mouth movements without over-fitting to low-level, manipulation-specific artefacts.
arXiv Detail & Related papers (2020-12-14T15:53:56Z)
- Ultra2Speech -- A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images [5.606679908174784]
This work addresses the articulatory-to-acoustic mapping problem based on ultrasound (US) tongue images.
We use a novel deep learning architecture, which we call Ultrasound2Formant (U2F) Net, to map US tongue images from the US probe placed beneath a subject's chin to formants.
arXiv Detail & Related papers (2020-06-29T20:42:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.