Exploring How Audio Effects Alter Emotion with Foundation Models
- URL: http://arxiv.org/abs/2509.15151v2
- Date: Sat, 20 Sep 2025 08:36:11 GMT
- Title: Exploring How Audio Effects Alter Emotion with Foundation Models
- Authors: Stelios Katsis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou
- Abstract summary: Audio effects (FX) play a pivotal role in shaping emotional responses during music listening. This work investigates how foundation models can be leveraged to analyze these effects. Our findings aim to advance understanding of the perceptual impact of audio production practices, with implications for music cognition, performance, and affective computing.
- Score: 8.932607465669195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Audio effects (FX) such as reverberation, distortion, modulation, and dynamic range processing play a pivotal role in shaping emotional responses during music listening. While prior studies have examined links between low-level audio features and affective perception, the systematic impact of audio FX on emotion remains underexplored. This work investigates how foundation models - large-scale neural architectures pretrained on multimodal data - can be leveraged to analyze these effects. Such models encode rich associations between musical structure, timbre, and affective meaning, offering a powerful framework for probing the emotional consequences of sound design techniques. By applying various probing methods to embeddings from deep learning models, we examine the complex, nonlinear relationships between audio FX and estimated emotion, uncovering patterns tied to specific effects and evaluating the robustness of foundation audio models. Our findings aim to advance understanding of the perceptual impact of audio production practices, with implications for music cognition, performance, and affective computing.
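A rough sketch of this probing setup, not the authors' implementation: process clips with a parameterized effect, embed them with a pretrained audio model, and fit a linear probe to emotion ratings. `embed` below is a hypothetical stand-in for any foundation audio model's feature extractor; `pedalboard` supplies the FX chain.

```python
# Minimal probing sketch (assumptions: hypothetical embed(), per-clip valence ratings).
import numpy as np
from pedalboard import Pedalboard, Reverb
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def probe_effect(clips, valence, sr=44100, room_size=0.8):
    """Cross-validated R^2 of a linear read-out from FX-processed embeddings to emotion."""
    board = Pedalboard([Reverb(room_size=room_size)])
    X = np.stack([embed(board(clip, sr), sr) for clip in clips])  # embed() is a placeholder
    return cross_val_score(Ridge(alpha=1.0), X, valence, scoring="r2", cv=5).mean()
```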
Related papers
- BREATH: A Bio-Radar Embodied Agent for Tonal and Human-Aware Diffusion Music Generation [3.2646887494398205]
We present a multimodal system for personalized music generation that integrates physiological sensing, LLM-based reasoning, and controllable audio synthesis. A millimeter-wave radar sensor non-invasively captures heart rate and respiration rate. These physiological signals are interpreted by a reasoning agent to infer symbolic musical descriptors, such as tempo, mood intensity, and traditional Chinese pentatonic modes.
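For illustration only, a toy rule-based version of the physiology-to-descriptor step might look like the sketch below; the actual system uses an LLM-based reasoning agent, and every threshold and mode name here is invented.

```python
# Toy mapping from physiological signals to symbolic musical descriptors
# (hypothetical thresholds; the paper's agent does this with LLM reasoning).
def infer_descriptors(heart_rate_bpm: float, respiration_rate_bpm: float) -> dict:
    tempo = int(min(max(heart_rate_bpm, 60), 140))               # clamp HR into a musical tempo range
    intensity = "high" if respiration_rate_bpm > 20 else "calm"  # crude arousal proxy
    mode = "zhi" if intensity == "high" else "gong"              # pentatonic mode choice (illustrative)
    return {"tempo": tempo, "mood_intensity": intensity, "pentatonic_mode": mode}
```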
arXiv Detail & Related papers (2025-09-09T12:26:20Z)
- Evaluating Fake Music Detection Performance Under Audio Augmentations [0.0]
We construct a dataset consisting of both real and synthetic music generated using several systems. We then apply a range of audio transformations and analyze how they affect classification accuracy. We test the performance of a recent state-of-the-art musical deepfake detection model in the presence of audio augmentations.
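A minimal sketch of this robustness protocol, assuming a `detector(audio, sr) -> label` interface for the deepfake classifier (an invented API, not the paper's):

```python
import numpy as np
from pedalboard import Pedalboard, PitchShift, LowpassFilter

# Each augmentation is an FX chain; the empty chain is the clean baseline.
AUGMENTATIONS = {
    "clean": Pedalboard([]),
    "pitch_shift": Pedalboard([PitchShift(semitones=1.0)]),
    "lowpass": Pedalboard([LowpassFilter(cutoff_frequency_hz=4000)]),
}

def accuracy_under_augmentations(detector, clips, labels, sr=44100):
    """Classification accuracy of the (hypothetical) detector per augmentation."""
    labels = np.asarray(labels)
    results = {}
    for name, board in AUGMENTATIONS.items():
        preds = np.array([detector(board(clip, sr), sr) for clip in clips])
        results[name] = float(np.mean(preds == labels))
    return results
```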
arXiv Detail & Related papers (2025-07-07T16:15:02Z)
- Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation [36.11407262818182]
State-of-the-art (SOTA) systems differ significantly in many dimensions, such as training datasets, modeling paradigms, and architectural choices. This study focuses exclusively on the modeling paradigm. We compare the two arguably most common modeling paradigms: auto-regressive decoding and conditional flow-matching.
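For reference, the conditional flow-matching side of that comparison reduces to a simple regression objective; the sketch below uses a generic `model` and placeholder shapes, not the paper's architecture. Auto-regressive decoding, the other paradigm, instead factorizes the output into next-token predictions trained with cross-entropy.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1, cond):
    """Conditional flow matching: regress a velocity field onto the straight path noise -> data.
    x1: clean latents (B, T, D); cond: conditioning (e.g., text embeddings)."""
    x0 = torch.randn_like(x1)              # noise endpoint
    t = torch.rand(x1.shape[0], 1, 1)      # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1             # point on the linear interpolation path
    target_v = x1 - x0                     # constant velocity of that path
    pred_v = model(xt, t.squeeze(), cond)  # model predicts the velocity at (xt, t)
    return F.mse_loss(pred_v, target_v)
```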
arXiv Detail & Related papers (2025-06-10T08:37:45Z)
- Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs [47.325269852330884]
We develop a strategy to enhance emotion recognition by producing semantically aligned, evidence-grounded explanations. We introduce a unified framework combining reasoning-augmented data supervision, a dual-encoder architecture, and task-alternating training. Experiments on IEMOCAP and MELD show that our approach not only improves emotion prediction accuracy but also enhances the coherence and evidential grounding of the generated responses.
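One plausible reading of "task-alternating training" is a schedule that interleaves batches from the recognition and explanation objectives so both update the shared backbone; the `model.loss(batch, task=...)` API below is an assumption for illustration.

```python
# Hypothetical task-alternating loop (the paper's exact schedule may differ).
def train_epoch(model, optimizer, cls_loader, expl_loader):
    for cls_batch, expl_batch in zip(cls_loader, expl_loader):
        for task, batch in (("classify", cls_batch), ("explain", expl_batch)):
            optimizer.zero_grad()
            loss = model.loss(batch, task=task)  # assumed task-switching interface
            loss.backward()
            optimizer.step()
```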
arXiv Detail & Related papers (2025-06-07T14:52:58Z)
- Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects [4.202522944371801]
We investigate how sensitive audio embeddings extracted from widely-used foundation models are to audio effects. By applying parameterized audio effects, we analyze the correlation between the deformation trajectories and the effect strength in the embedding space. We find that there exists a direction along which the embeddings move monotonically as the effect strength increases, but that the subspace containing the displacements is generally high-dimensional.
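A sketch of one such trajectory test: sweep an effect's strength, embed each processed version (with a hypothetical `embed`, as above), and check whether the displacements grow monotonically along a single direction.

```python
import numpy as np
from pedalboard import Pedalboard, Distortion

def monotone_along_direction(clip, sr, drives_db=(0, 5, 10, 15, 20)):
    """True if embeddings move monotonically along one direction as drive increases."""
    embs = np.stack([
        embed(Pedalboard([Distortion(drive_db=d)])(clip, sr), sr)  # embed() is a placeholder
        for d in drives_db
    ])
    disp = embs - embs[0]                            # displacement from the clean embedding
    direction = disp[-1] / np.linalg.norm(disp[-1])  # dominant displacement direction
    proj = disp @ direction                          # movement along that direction
    return bool(np.all(np.diff(proj) > 0))
```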
arXiv Detail & Related papers (2025-01-27T09:49:08Z)
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos [78.49864987061689]
Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound.
Existing video-driven deep learning approaches capture only the weak correspondence between visual content and impact sounds.
We propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip.
arXiv Detail & Related papers (2023-03-29T17:59:53Z)
- End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
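As a sketch of how such an adversarial loss is commonly set up (least-squares GAN form here; the paper's exact formulation may differ):

```python
import torch

def adversarial_losses(disc, real, fake):
    """LSGAN-style losses: disc scores real binaural audio vs. synthesized audio."""
    d_loss = ((disc(real) - 1) ** 2).mean() + (disc(fake.detach()) ** 2).mean()
    g_loss = ((disc(fake) - 1) ** 2).mean()  # generator is rewarded when fakes score as real
    return d_loss, g_loss
```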
arXiv Detail & Related papers (2022-07-08T05:18:36Z)
- Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and Latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
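A "reverse domain discriminator" is commonly realized with a gradient reversal layer, as in DANN; the sketch below makes that assumption. The discriminator learns to tell EEG embeddings from music embeddings, while the reversed gradient pushes the encoders to align the two distributions.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; scaled, sign-flipped gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def domain_alignment_loss(discriminator, eeg_emb, music_emb, lambd=1.0):
    feats = torch.cat([GradReverse.apply(eeg_emb, lambd),
                       GradReverse.apply(music_emb, lambd)])
    labels = torch.cat([torch.zeros(len(eeg_emb), dtype=torch.long),
                        torch.ones(len(music_emb), dtype=torch.long)])
    return F.cross_entropy(discriminator(feats), labels)
```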
arXiv Detail & Related papers (2022-02-20T07:32:12Z)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video [94.42811508809994]
We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio.
Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process.
arXiv Detail & Related papers (2021-11-21T19:26:45Z)
- Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
arXiv Detail & Related papers (2021-06-14T20:01:24Z)
- Investigations on Audiovisual Emotion Recognition in Noisy Conditions [43.40644186593322]
We present an investigation on two emotion datasets with superimposed noise at different signal-to-noise ratios.
The results show a significant performance decrease when a model trained on clean audio is applied to noisy data.
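Superimposing noise at a target signal-to-noise ratio just means scaling the noise so that 10*log10(P_signal / P_noise) equals the desired value; a minimal sketch:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech at the given SNR (assumes noise is at least as long as speech)."""
    noise = noise[: len(speech)]
    p_signal = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```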
arXiv Detail & Related papers (2021-03-02T17:45:16Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
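A sketch of the general idea (the paper's exact construction may differ): correlate every pair of hand-crafted feature trajectories over time and keep the upper triangle as a compact, fixed-size vector.

```python
import numpy as np

def correlation_features(frames):
    """frames: (n_frames, n_features) matrix of frame-level hand-crafted features."""
    corr = np.corrcoef(frames, rowvar=False)  # (n_features, n_features) Pearson matrix
    iu = np.triu_indices_from(corr, k=1)      # unique feature pairs only
    return corr[iu]                           # n_features*(n_features-1)/2 values
```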
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.