Can you hear me $\textit{now}$? Sensitive comparisons of human and machine perception
- URL: http://arxiv.org/abs/2003.12362v2
- Date: Wed, 3 Aug 2022 01:55:10 GMT
- Title: Can you hear me $\textit{now}$? Sensitive comparisons of human and machine perception
- Authors: Michael A. Lepori and Chaz Firestone
- Abstract summary: We explore how this asymmetry can cause comparisons to misestimate the overlap in human and machine perception.
In five experiments, we adapt task designs from the human psychophysics literature to show that even when subjects cannot freely transcribe such speech commands, they often can demonstrate other forms of understanding.
We recommend the adoption of such "sensitive tests" when comparing human and machine perception.
- Score: 3.8580784887142774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of machine-learning systems that process sensory input has brought
with it a rise in comparisons between human and machine perception. But such
comparisons face a challenge: Whereas machine perception of some stimulus can
often be probed through direct and explicit measures, much of human perceptual
knowledge is latent, incomplete, or unavailable for explicit report. Here, we
explore how this asymmetry can cause such comparisons to misestimate the
overlap in human and machine perception. As a case study, we consider human
perception of \textit{adversarial speech} -- synthetic audio commands that are
recognized as valid messages by automated speech-recognition systems but that
human listeners reportedly hear as meaningless noise. In five experiments, we
adapt task designs from the human psychophysics literature to show that even
when subjects cannot freely transcribe such speech commands (the previous
benchmark for human understanding), they often can demonstrate other forms of
understanding, including discriminating adversarial speech from closely matched
non-speech (Experiments 1--2), finishing common phrases begun in adversarial
speech (Experiments 3--4), and solving simple math problems posed in
adversarial speech (Experiment 5) -- even for stimuli previously described as
unintelligible to human listeners. We recommend the adoption of such "sensitive
tests" when comparing human and machine perception, and we discuss the broader
consequences of such approaches for assessing the overlap between systems.
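The "sensitive tests" described here adapt forced-choice designs from psychophysics, where performance is typically summarized with the sensitivity index d' rather than transcription accuracy. As a rough illustration only (not the paper's actual analysis code; the counts below are made up), d' can be computed from hit and false-alarm rates:

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' for a yes/no discrimination task.

    The log-linear correction (add 0.5 per cell, 1 per denominator)
    keeps the z-transform finite when a rate would be exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts for listeners classifying adversarial audio as
# "speech" vs. closely matched non-speech (cf. Experiments 1--2).
print(d_prime(hits=70, misses=30, false_alarms=20, correct_rejections=80))
```

A d' reliably above zero would indicate above-chance discrimination even from subjects who cannot transcribe a single word.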
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation [0.6964027823688135]
Modern conversational systems lack the emotional depth and disfluent characteristics of human interactions.
To address this shortcoming, we have designed an innovative speech synthesis pipeline.
Within this framework, a cutting-edge language model introduces both human-like emotion and disfluencies in a zero-shot setting.
arXiv Detail & Related papers (2024-03-31T00:38:02Z)
- Hearing Loss Detection from Facial Expressions in One-on-one Conversations [20.12296163231457]
Individuals with impaired hearing experience difficulty in conversations, especially in noisy environments.
This difficulty often manifests as a change in behavior and may be captured via facial expressions, such as the expression of discomfort or fatigue.
We build on this idea and introduce the problem of detecting hearing loss from an individual's facial expressions during a conversation.
arXiv Detail & Related papers (2024-01-17T04:52:32Z)
- Robots-Dont-Cry: Understanding Falsely Anthropomorphic Utterances in Dialog Systems [64.10696852552103]
Highly anthropomorphic responses might make users uncomfortable or implicitly deceive them into thinking they are interacting with a human.
We collect human ratings on the feasibility of approximately 900 two-turn dialogs sampled from 9 diverse data sources.
arXiv Detail & Related papers (2022-10-22T12:10:44Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate for such a model.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
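Wav2Vec 2.0 is available through the HuggingFace transformers library, so the layerwise representations that such brain-similarity studies analyze are straightforward to extract. A minimal sketch, assuming the `facebook/wav2vec2-base` checkpoint (the paper's exact model variant and analysis pipeline may differ):

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

waveform = np.random.randn(16000).astype(np.float32)  # stand-in for 1 s of 16 kHz audio
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One activation tensor per transformer layer: (batch, frames, features).
# Brain-similarity analyses typically regress these onto neural recordings.
for layer, states in enumerate(outputs.hidden_states):
    print(layer, tuple(states.shape))
```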
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare deep neural network-based visual lip-reading models and identify their cognitive aspects.
We observe a strong correlation between critical learning period theories in cognitive psychology and the behavior of our models.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Exemplars-guided Empathetic Response Generation Controlled by the Elements of Human Communication [88.52901763928045]
We propose an approach that relies on exemplars to cue the generative model on fine stylistic properties that signal empathy to the interlocutor.
We empirically show that these approaches yield significant improvements in empathetic response quality in terms of both automated and human-evaluated metrics.
arXiv Detail & Related papers (2021-06-22T14:02:33Z)
- Dompteur: Taming Audio Adversarial Examples [28.54699912239861]
Adversarial examples allow attackers to arbitrarily manipulate machine learning systems.
In this paper we propose a different perspective: We accept the presence of adversarial examples against ASR systems, but we require them to be perceivable by human listeners.
By applying the principles of psychoacoustics, we can remove semantically irrelevant information from the ASR input and train a model that resembles human perception more closely.
arXiv Detail & Related papers (2021-02-10T13:53:32Z)
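Dompteur's actual defense applies psychoacoustic hearing thresholds; as a loose illustration of the underlying principle only, one can discard energy outside the band where speech intelligibility mostly lives before the audio reaches the ASR model (the cutoffs below are illustrative, not the paper's):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_speech(x, sr, low_hz=300.0, high_hz=5000.0, order=6):
    """Zero-phase band-pass keeping roughly the intelligible speech band."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, x)

sr = 16000
audio = np.random.randn(sr)  # stand-in for an ASR input waveform
filtered = bandpass_speech(audio, sr)
# Training and running the ASR on `filtered` audio forces adversarial
# perturbations into frequency regions human listeners can actually hear.
```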
- Predicting Emotions Perceived from Sounds [2.9398911304923447]
Sonification is the science of communicating data and events to users through sound.
This paper conducts an experiment in which several mainstream and conventional machine learning algorithms are developed.
The results show that perceived emotions can be predicted with high accuracy.
arXiv Detail & Related papers (2020-12-04T15:01:59Z)
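The entry above does not name its specific algorithms, so the following is only a generic sketch of the setup it describes: conventional classifiers mapping per-clip acoustic features to perceived-emotion labels (the features and labels here are synthetic placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder design matrix: one row per sound clip, columns standing in
# for acoustic descriptors (e.g., MFCC statistics) extracted offline.
X = rng.normal(size=(200, 20))
y = rng.integers(0, 4, size=200)  # four hypothetical perceived-emotion labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # chance is ~0.25 on random data
```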
- "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperforms existing work by 4.99% in terms of viseme error rate.
We show a strong correlation between our model's understanding of multi-view speech and human perception.
arXiv Detail & Related papers (2020-06-12T06:51:55Z)
- On the human evaluation of audio adversarial examples [1.7006003864727404]
Adversarial examples are inputs intentionally perturbed to produce a wrong prediction without the perturbation being noticed.
High fooling rates of proposed adversarial perturbation strategies are only valuable if the perturbations are not detectable.
We demonstrate that the metrics employed by convention are not a reliable measure of the perceptual similarity of adversarial examples in the audio domain.
arXiv Detail & Related papers (2020-01-23T10:56:50Z)
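Conventional distortion metrics in this literature include the signal-to-noise ratio of the adversarial perturbation; it is trivial to compute, which is partly why it became a default. A minimal sketch of that measurement (the waveforms below are stand-ins):

```python
import numpy as np

def perturbation_snr_db(clean, adversarial):
    """SNR of the adversarial perturbation in dB; higher means less distortion."""
    noise = adversarial - clean
    return 10.0 * np.log10(np.sum(clean**2) / np.sum(noise**2))

sr = 16000
clean = np.random.randn(sr)               # stand-in clean waveform
adv = clean + 0.01 * np.random.randn(sr)  # stand-in adversarial waveform
print(round(perturbation_snr_db(clean, adv), 1))  # ~40 dB at this noise level
```

The paper's point is that a high SNR need not imply the perturbation is imperceptible to human listeners.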
This list is automatically generated from the titles and abstracts of the papers on this site.