Hearing Loss Detection from Facial Expressions in One-on-one Conversations
- URL: http://arxiv.org/abs/2401.08972v1
- Date: Wed, 17 Jan 2024 04:52:32 GMT
- Title: Hearing Loss Detection from Facial Expressions in One-on-one Conversations
- Authors: Yufeng Yin, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Stavros Petridis, Yu-Hsiang Wu, Christi Miller
- Abstract summary: Individuals with impaired hearing experience difficulty in conversations, especially in noisy environments.
This difficulty often manifests as a change in behavior and may be captured via facial expressions, such as the expression of discomfort or fatigue.
We build on this idea and introduce the problem of detecting hearing loss from an individual's facial expressions during a conversation.
- Score: 20.12296163231457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Individuals with impaired hearing experience difficulty in conversations,
especially in noisy environments. This difficulty often manifests as a change
in behavior and may be captured via facial expressions, such as the expression
of discomfort or fatigue. In this work, we build on this idea and introduce the
problem of detecting hearing loss from an individual's facial expressions
during a conversation. Building machine learning models that can represent
hearing-related facial expression changes is a challenge. In addition, models
need to disentangle spurious age-related correlations from hearing-driven
expressions. To this end, we propose a self-supervised pre-training strategy
tailored for the modeling of expression variations. We also use adversarial
representation learning to mitigate the age bias. We evaluate our approach on a
large-scale egocentric dataset with real-world conversational scenarios
involving subjects with hearing loss and show that our method for hearing loss
detection achieves superior performance over baselines.
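The abstract describes its two components in prose only: self-supervised pre-training for expression variations and adversarial learning against age bias. As a hedged illustration of the second component, the sketch below shows one common way to implement adversarial representation learning with a gradient-reversal layer in PyTorch; the module names, feature dimensions, and age-bin setup are assumptions rather than the authors' implementation, and the pre-training stage is omitted.

```python
# Illustrative sketch (not the authors' code): adversarial age-debiasing of
# expression features via a gradient-reversal layer, in PyTorch.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class HearingLossModel(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, n_age_bins=5, lambd=0.1):
        super().__init__()
        self.lambd = lambd
        # Encoder over facial-expression features (dimensions are made up).
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        # Main head: binary hearing-loss prediction.
        self.hearing_head = nn.Linear(hidden, 1)
        # Adversary head: tries to predict age from the shared representation.
        self.age_head = nn.Linear(hidden, n_age_bins)

    def forward(self, x):
        z = self.encoder(x)
        hearing_logit = self.hearing_head(z)
        # Gradient reversal: the encoder is pushed to remove age information.
        age_logits = self.age_head(GradReverse.apply(z, self.lambd))
        return hearing_logit, age_logits

# Toy training step on random data, just to show the combined objective.
model = HearingLossModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 256)                       # batch of expression features
y_hear = torch.randint(0, 2, (8, 1)).float()  # hearing-loss labels
y_age = torch.randint(0, 5, (8,))             # age-bin labels
hearing_logit, age_logits = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(hearing_logit, y_hear) \
       + nn.functional.cross_entropy(age_logits, y_age)
opt.zero_grad()
loss.backward()
opt.step()
```

The design choice is the standard one for this kind of debiasing: the adversary head learns to predict age, while the reversed gradient drives the shared encoder toward representations that carry as little age information as possible.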
Related papers
- Self-supervised learning for pathological speech detection [0.0]
Speech production is susceptible to influence and disruption by various neurodegenerative pathological speech disorders.
These disorders lead to pathological speech characterized by abnormal speech patterns and imprecise articulation.
Unlike neurotypical speakers, patients with speech pathologies or impairments are unable to use virtual assistants such as Alexa and Siri.
arXiv Detail & Related papers (2024-05-16T07:12:47Z) - A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation [51.53917938874146]
We propose a possible solution for alleviating the hallucination in KGD by exploiting the dialogue-knowledge interaction.
Experimental results of our example implementation show that this method can reduce hallucination without disrupting other dialogue performance.
arXiv Detail & Related papers (2024-04-04T14:45:26Z) - Emotional Listener Portrait: Realistic Listener Motion Simulation in
Conversation [50.35367785674921]
Listener head generation centers on generating non-verbal behaviors of a listener in reference to the information delivered by a speaker.
A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation.
We propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords.
Our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate controllable responses with a predetermined attitude.
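For illustration only, the sketch below mimics the codeword idea summarized above: a learned codebook of discrete motion-codewords and a sampling step over a predicted distribution, so that repeated sampling yields diverse listener motions. The names, sizes, and blendshape-style output are assumptions, not details of ELP.

```python
# Illustrative sketch only (not the ELP implementation): facial motion as a
# composition of discrete motion-codewords sampled from a learned distribution.
import torch
import torch.nn as nn

class CodewordMotionSampler(nn.Module):
    def __init__(self, n_codewords=64, code_dim=32, speaker_dim=128):
        super().__init__()
        # Learned codebook of motion codewords (sizes are made up).
        self.codebook = nn.Embedding(n_codewords, code_dim)
        # Maps speaker-side features to a distribution over codewords.
        self.to_logits = nn.Linear(speaker_dim, n_codewords)
        # Decodes the selected codeword into facial-motion parameters.
        self.decoder = nn.Linear(code_dim, 52)  # e.g. blendshape-like outputs

    def forward(self, speaker_feat, temperature=1.0):
        logits = self.to_logits(speaker_feat) / temperature
        # Sampling (rather than argmax) yields diverse listener responses.
        dist = torch.distributions.Categorical(logits=logits)
        idx = dist.sample()                      # one codeword per time step
        motion = self.decoder(self.codebook(idx))
        return motion, idx

# Usage: 10 time steps of hypothetical speaker features -> listener motion.
sampler = CodewordMotionSampler()
motion, idx = sampler(torch.randn(10, 128))      # motion: (10, 52)
```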
arXiv Detail & Related papers (2023-09-29T18:18:32Z) - Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z) - Visual-Aware Text-to-Speech [101.89332968344102]
We present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and visual feedback of the listener in face-to-face communication.
We devise a baseline model to fuse phoneme linguistic information and listener visual signals for speech synthesis.
arXiv Detail & Related papers (2023-06-21T05:11:39Z) - deep learning of segment-level feature representation for speech emotion
recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and jointly explores intra- and inter-speaker dependencies.
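As a rough sketch of the component named above (not the paper's code), the snippet below applies a bi-directional GRU with attention pooling to a sequence of segment-level embeddings; the 128-dimensional input matches the standard VGGish embedding size, while the other sizes, the class count, and the pooling choice are assumptions, and the inter-speaker modeling is omitted.

```python
# Rough illustration only: attention pooling over a bi-directional GRU
# applied to segment-level audio embeddings.
import torch
import torch.nn as nn

class AttentiveBiGRU(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_classes=4):
        super().__init__()
        # 128 matches the standard VGGish embedding size; other sizes are assumed.
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # scalar score per segment
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, segments):                    # (batch, n_segments, feat_dim)
        h, _ = self.gru(segments)                   # (batch, n_segments, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)      # attention weights over segments
        pooled = (w * h).sum(dim=1)                 # context-weighted utterance vector
        return self.classifier(pooled)              # emotion logits

# Usage: a batch of 2 utterances, each with 6 VGGish-style segment embeddings.
model = AttentiveBiGRU()
logits = model(torch.randn(2, 6, 128))              # shape (2, 4)
```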
arXiv Detail & Related papers (2023-02-05T16:15:46Z) - Sources of Noise in Dialogue and How to Deal with Them [63.02707014103651]
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs.
Despite their prevalence, an accurate survey of dialogue noise is currently lacking.
This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems.
arXiv Detail & Related papers (2022-12-06T04:36:32Z) - I Only Have Eyes for You: The Impact of Masks On Convolutional-Based
Facial Expression Recognition [78.07239208222599]
We evaluate how the recently proposed FaceChannel adapts towards recognizing facial expressions from persons with masks.
We also perform specific feature-level visualization to demonstrate how the inherent capabilities of the FaceChannel to learn and combine facial features change when in a constrained social interaction scenario.
arXiv Detail & Related papers (2021-04-16T20:03:30Z) - Can you hear me $\textit{now}$? Sensitive comparisons of human and
machine perception [3.8580784887142774]
We explore how this asymmetry can cause comparisons to misestimate the overlap in human and machine perception.
In five experiments, we adapt task designs from the human psychophysics literature to show that even when subjects cannot freely transcribe such speech commands, they often can demonstrate other forms of understanding.
We recommend the adoption of such "sensitive tests" when comparing human and machine perception.
arXiv Detail & Related papers (2020-03-27T16:24:08Z) - On the human evaluation of audio adversarial examples [1.7006003864727404]
Adversarial examples are inputs intentionally perturbed to produce a wrong prediction without being noticed.
High fooling rates of proposed adversarial perturbation strategies are only valuable if the perturbations are not detectable.
We demonstrate that the metrics employed by convention are not a reliable measure of the perceptual similarity of adversarial examples in the audio domain.
arXiv Detail & Related papers (2020-01-23T10:56:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.