Related papers: Successes and critical failures of neural networks in capturing human-like speech recognition

URL: http://arxiv.org/abs/2204.03740v4
Date: Wed, 19 Apr 2023 12:12:17 GMT
Title: Successes and critical failures of neural networks in capturing human-like speech recognition
Authors: Federico Adolfi, Jeffrey S. Bowers, David Poeppel
Abstract summary: Speech recognition is inherently robust in humans to a number of transformations at various spectrotemporal granularities. We evaluate state-of-the-art neural networks as stimulus-computable, optimized observers.
Score: 1.1602089225841632
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for such exploration - is inherently robust in humans to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition.

Related papers

Neuromorphic Computing with Multi-Frequency Oscillations: A Bio-Inspired Approach to Artificial Intelligence [7.742102806887099]
Despite remarkable capabilities, artificial neural networks exhibit limited flexible, generalizable intelligence. This limitation stems from their fundamental divergence from biological cognition that overlooks both neural regions' functional specialization and the temporal dynamics critical for coordinating these specialized systems. We propose a tripartite brain-inspired architecture comprising functionally specialized perceptual, auxiliary, and executive systems.
arXiv Detail & Related papers (2025-08-04T08:40:33Z)
Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks. We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network. Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
Exploring mechanisms of Neural Robustness: probing the bridge between geometry and spectrum [0.0]
We study the link between representation smoothness and spectrum by using weight, Jacobian and spectral regularization. Our research aims to understand the interplay between geometry, spectral properties, robustness, and expressivity in neural representations.
arXiv Detail & Related papers (2024-02-05T12:06:00Z)
Brain-Inspired Machine Intelligence: A Survey of Neurobiologically-Plausible Credit Assignment [65.268245109828]
We examine algorithms for conducting credit assignment in artificial neural networks that are inspired or motivated by neurobiology. We organize the ever-growing set of brain-inspired learning schemes into six general families and consider these in the context of backpropagation of errors. The results of this review are meant to encourage future developments in neuro-mimetic systems and their constituent learning processes.
arXiv Detail & Related papers (2023-12-01T05:20:57Z)
A Neuro-mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization [55.11642177631929]
Large neural generative models are capable of synthesizing semantically rich passages of text or producing complex images. We discuss the COGnitive Neural GENerative system, such an architecture that casts the Common Model of Cognition.
arXiv Detail & Related papers (2023-10-14T23:28:48Z)
Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration. We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions. The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z)
Predictive Coding and Stochastic Resonance: Towards a Unified Theory of Auditory (Phantom) Perception [6.416574036611064]
To gain a mechanistic understanding of brain function, hypothesis-driven experiments should be accompanied by biologically plausible computational models. With a special focus on tinnitus, we review recent work at the intersection of artificial intelligence, psychology, and neuroscience. We conclude that two fundamental processing principles, both ubiquitous in the brain, best fit a vast number of experimental results.
arXiv Detail & Related papers (2022-04-07T10:47:58Z)
The world seems different in a social context: a neural network analysis of human experimental data [57.729312306803955]
We show that it is possible to replicate human behavioral data in both individual and social task settings by modifying the precision of prior and sensory signals. An analysis of the neural activation traces of the trained networks provides evidence that information is coded in fundamentally different ways in the network in the individual and in the social conditions.
arXiv Detail & Related papers (2022-03-03T17:19:12Z)
Deep Interpretable Models of Theory of Mind For Human-Agent Teaming [0.7734726150561086]
We develop an interpretable modular neural framework for modeling the intentions of other observed entities. We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft.
arXiv Detail & Related papers (2021-04-07T06:18:58Z)
Understanding Information Processing in Human Brain by Interpreting Machine Learning Models [1.14219428942199]
The thesis explores the role machine learning methods play in creating intuitive computational models of neural processing. This perspective makes the case for the larger role that an exploratory, data-driven approach to computational neuroscience could play.
arXiv Detail & Related papers (2020-10-17T04:37:26Z)
Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI). This article deals with aspects of modeling commonsense reasoning, focusing on domains such as interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z)
Bio-Inspired Modality Fusion for Active Speaker Detection [1.0644456464343592]
This paper presents a methodology for fusing correlated auditory and visual information for active speaker detection. This ability can have a wide range of applications, from teleconferencing systems to social robotics.
arXiv Detail & Related papers (2020-02-28T20:56:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.