Artificial Rigidities vs. Biological Noise: A Comparative Analysis of Multisensory Integration in AV-HuBERT and Human Observers
- URL: http://arxiv.org/abs/2601.15869v1
- Date: Thu, 22 Jan 2026 11:18:16 GMT
- Title: Artificial Rigidities vs. Biological Noise: A Comparative Analysis of Multisensory Integration in AV-HuBERT and Human Observers
- Authors: Francisco Portillo López
- Abstract summary: This study evaluates AV-HuBERT's perceptual bio-fidelity by benchmarking it against human observers. Results reveal a striking quantitative isomorphism: AI and humans exhibited nearly identical auditory dominance rates.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study evaluates AV-HuBERT's perceptual bio-fidelity by benchmarking its response to incongruent audiovisual stimuli (McGurk effect) against human observers (N=44). Results reveal a striking quantitative isomorphism: AI and humans exhibited nearly identical auditory dominance rates (32.0% vs. 31.8%), suggesting the model captures biological thresholds for auditory resistance. However, AV-HuBERT showed a deterministic bias toward phonetic fusion (68.0%), significantly exceeding human rates (47.7%). While humans displayed perceptual stochasticity and diverse error profiles, the model remained strictly categorical. Findings suggest that current self-supervised architectures mimic multisensory outcomes but lack the neural variability inherent to human speech perception.
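The abstract reports a large gap between the model's fusion rate (68.0%) and the human rate (47.7%). As a rough sanity check, that gap can be examined with a standard two-proportion z-test. The trial counts below are illustrative assumptions only: the abstract reports rates and N=44 observers, not totals.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Two-proportion z-test statistic for a difference in rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Assumed counts: 100 model trials, 10 trials per human observer (hypothetical).
z = two_proportion_z(0.680, 100, 0.477, 44 * 10)
print(f"z = {z:.2f}")  # ≈ 3.67 under these assumed counts
```

Under these assumptions the difference would be significant at conventional thresholds (|z| > 1.96), consistent with the paper's claim that the model's fusion bias significantly exceeds human rates.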
Related papers
- Can You Tell It's AI? Human Perception of Synthetic Voices in Vishing Scenarios [3.2976205772213123]
Large Language Models and commercial speech synthesis systems now enable highly realistic AI-generated voice scams (vishing). Yet it remains unclear whether individuals can reliably distinguish AI-generated speech from human-recorded voices in realistic scam contexts. We conducted a controlled online study in which 22 participants evaluated 16 vishing-style audio clips and classified each as human or AI.
arXiv Detail & Related papers (2026-02-23T17:17:53Z) - Interpretability and Individuality in Knee MRI: Patient-Specific Radiomic Fingerprint with Reconstructed Healthy Personas [40.168029561784216]
A radiomic fingerprint is a patient-specific feature set derived from MRI. A healthy persona synthesises a pathology-free baseline for each patient. Comparing features extracted from pathological images against their personas highlights deviations from normal anatomy.
arXiv Detail & Related papers (2026-01-13T14:48:01Z) - EARS-UDE: Evaluating Auditory Response in Sensory Overload with Universal Differential Equations [4.285464959472458]
Auditory sensory overload affects 50-70% of individuals with Autism Spectrum Disorder (ASD). We present a Scientific Machine Learning approach using Universal Differential Equations (UDEs) to model sensory adaptation dynamics in autism.
arXiv Detail & Related papers (2025-10-16T10:16:43Z) - Sense of Self and Time in Borderline Personality. A Comparative Robustness Study with Generative AI [0.0]
This study examines the capacity of large language models (LLMs) to support qualitative analysis of first-person experience in Borderline Personality Disorder (BPD). Three LLMs were compared on their ability to mimic the interpretative style of the original investigators. Results showed variable overlap with the human analysis, from 0 percent in GPT to 42 percent in Claude and 58 percent in Gemini, and a low Jaccard coefficient (0.21-0.28). Gemini's output most closely resembled the human analysis, with validity scores significantly higher than GPT and Claude (p < 0.0001), and was judged as human by blinded experts.
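The Jaccard coefficient reported above measures set overlap between the themes coded by the human analysts and those produced by each model. A minimal sketch, with entirely hypothetical theme labels (the abstract reports only the 0.21-0.28 range):

```python
def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union| of two sets of coded themes."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical theme codes for illustration only.
human = {"emptiness", "time_distortion", "identity_shift", "abandonment"}
model = {"emptiness", "time_distortion", "impulsivity", "anger", "fear"}
print(jaccard(human, model))  # 2 shared / 7 total ≈ 0.29
```

A value near the paper's upper bound (0.28) thus corresponds to roughly two shared themes out of seven distinct ones, which conveys how modest the reported human-model agreement is.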
arXiv Detail & Related papers (2025-08-26T13:13:47Z) - LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs [48.534851709853534]
We propose LMME3DHF as a metric for evaluating 3DHF, capable of quality and authenticity score prediction, distortion-aware visual question answering, and distortion-aware saliency prediction. Experimental results show that LMME3DHF achieves state-of-the-art performance, surpassing existing methods in accurately predicting quality scores for AI-generated 3D human faces.
arXiv Detail & Related papers (2025-04-29T07:00:06Z) - Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology [64.46054930696052]
Adversarial attacks pose significant challenges for vision models in critical fields like healthcare. Existing self-supervised adversarial training methods overlook the hierarchical structure of histopathology images. We propose Hierarchical Self-Supervised Adversarial Training (HSAT), which exploits these properties to craft adversarial examples.
arXiv Detail & Related papers (2025-03-13T17:59:47Z) - Evaluating Spoken Language as a Biomarker for Automated Screening of Cognitive Impairment [37.40606157690235]
Alterations in speech and language can be early predictors of Alzheimer's disease and related dementias. We evaluated machine learning techniques for ADRD screening and severity prediction from spoken language. Risk stratification and linguistic feature importance analysis enhanced the interpretability and clinical utility of predictions.
arXiv Detail & Related papers (2025-01-30T20:17:17Z) - Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge [51.93909886542317]
We show how *relying on a single aggregate correlation score* can obscure fundamental differences between human labels and those from automatic evaluation. We propose stratifying data by human label uncertainty to provide a more robust analysis of automatic evaluation performance.
arXiv Detail & Related papers (2024-10-03T03:08:29Z) - Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies between Model Predictions and Human Responses in VQA [26.968874222330978]
This study focuses on the Visual Question Answering (VQA) task.
We evaluate how well vision-language models correlate with the distribution of human responses.
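One common way to compare a model's answer distribution against the spread of human responses is total variation distance over the answer vocabulary. The example below is a hypothetical sketch (the labels, counts, and probabilities are invented, not taken from the paper):

```python
from collections import Counter

def total_variation(human_answers, model_probs):
    """Total variation distance between the empirical human answer
    distribution and the model's predicted distribution."""
    counts = Counter(human_answers)
    n = len(human_answers)
    labels = set(counts) | set(model_probs)
    # Counter returns 0 for missing labels, so disjoint support is handled.
    return 0.5 * sum(abs(counts[l] / n - model_probs.get(l, 0.0)) for l in labels)

# Hypothetical: 10 annotators answering one VQA item, model softmax over labels.
humans = ["dog"] * 6 + ["puppy"] * 3 + ["animal"]
model = {"dog": 0.9, "puppy": 0.05, "animal": 0.05}
print(total_variation(humans, model))  # ≈ 0.30
```

A distance of 0 would mean the model exactly reproduces the human response distribution; here the model is overconfident in the majority answer relative to the annotators' spread, the kind of discrepancy such studies measure.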
arXiv Detail & Related papers (2024-09-17T13:44:25Z) - Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG [40.006249083417266]
We trained ResNet50-backbone models on a dual task of classification and EEG prediction. The networks' EEG prediction accuracy correlated significantly with their robustness gains and was often highest around 100 ms post stimulus onset. We teased apart the data from individual EEG channels and observed the strongest contribution from electrodes in the parieto-occipital regions.
arXiv Detail & Related papers (2024-09-05T16:04:57Z) - Visual Stereotypes of Autism Spectrum in Janus-Pro-7B, DALL-E, Stable Diffusion, SDXL, FLUX, and Midjourney [0.0]
This study examined whether six text-to-image models perpetuate non-rational beliefs regarding autism by comparing images generated in 2024-2025 with controls. Autistic individuals were depicted with striking homogeneity in skin color (white), gender (male), and age (young), often engaged in solitary activities, interacting with objects rather than people, and exhibiting stereotypical emotional expressions such as sadness, anger, or emotional flatness. We found significant differences between the models, although with a moderate effect size, and no differences between baseline and follow-up summary values, with the ratio of stereotypical themes to the number of images similar across all models.
arXiv Detail & Related papers (2024-07-23T08:48:09Z) - HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance [80.97360194728705]
AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies.
HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
arXiv Detail & Related papers (2024-07-09T15:14:41Z) - Perceptual-Score: A Psychophysical Measure for Assessing the Biological Plausibility of Visual Recognition Models [9.902669518047714]
This article proposes a new metric, Perceptual-Score, which is grounded in visual psychophysics.
We perform the procedure on twelve models that vary in degree of biological inspiration and complexity.
Each model's Perceptual-Score is compared against the state-of-the-art neural activity-based metric, Brain-Score.
arXiv Detail & Related papers (2022-10-16T20:34:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.