BREATH: A Bio-Radar Embodied Agent for Tonal and Human-Aware Diffusion Music Generation
- URL: http://arxiv.org/abs/2510.15895v1
- Date: Tue, 09 Sep 2025 12:26:20 GMT
- Title: BREATH: A Bio-Radar Embodied Agent for Tonal and Human-Aware Diffusion Music Generation
- Authors: Yunzhe Wang, Xinyu Tang, Zhixun Huang, Xiaolong Yue, Yuxin Zeng,
- Abstract summary: We present a multimodal system for personalized music generation that integrates physiological sensing, LLM-based reasoning, and controllable audio synthesis. A millimeter-wave radar sensor non-invasively captures heart rate and respiration rate. These physiological signals are interpreted by a reasoning agent to infer symbolic musical descriptors, such as tempo, mood intensity, and traditional Chinese pentatonic modes.
- Score: 3.2646887494398205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a multimodal system for personalized music generation that integrates physiological sensing, LLM-based reasoning, and controllable audio synthesis. A millimeter-wave radar sensor non-invasively captures heart rate and respiration rate. These physiological signals, combined with environmental state, are interpreted by a reasoning agent to infer symbolic musical descriptors, such as tempo, mood intensity, and traditional Chinese pentatonic modes, which are then expressed as structured prompts to guide a diffusion-based audio model in synthesizing expressive melodies. The system emphasizes cultural grounding through tonal embeddings and enables adaptive, embodied music interaction. To evaluate the system, we adopt a research-creation methodology combining case studies, expert feedback, and targeted control experiments. Results show that physiological variations can modulate musical features in meaningful ways, and tonal conditioning enhances alignment with intended modal characteristics. Expert users reported that the system affords intuitive, culturally resonant musical responses and highlighted its potential for therapeutic and interactive applications. This work demonstrates a novel bio-musical feedback loop linking radar-based sensing, prompt reasoning, and generative audio modeling.
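As a concrete illustration of the bio-musical loop the abstract describes, here is a minimal Python sketch of the sensing-to-prompt stage. All names (VitalSigns, infer_descriptors, the threshold rules, and the prompt template) are hypothetical stand-ins; the paper does not publish its implementation, and its reasoning agent is LLM-based rather than rule-based.

```python
# Minimal sketch of the bio-signal-to-prompt stage described in the abstract.
# Thresholds, mode choices, and the prompt wording are illustrative only.

from dataclasses import dataclass

@dataclass
class VitalSigns:
    heart_rate_bpm: float        # from the mmWave radar
    respiration_rate_bpm: float

def infer_descriptors(vitals: VitalSigns) -> dict:
    """Map physiological state to symbolic musical descriptors (toy rules)."""
    # Tempo loosely tracks heart rate, clamped to a musically useful range.
    tempo = max(60, min(140, round(vitals.heart_rate_bpm)))
    # Slower breathing -> calmer mood; the paper uses LLM reasoning here.
    mood = "calm" if vitals.respiration_rate_bpm < 14 else "energetic"
    # Placeholder choice among the five traditional Chinese pentatonic modes
    # (Gong, Shang, Jue, Zhi, Yu) that the abstract mentions.
    mode = "Zhi" if mood == "energetic" else "Yu"
    return {"tempo": tempo, "mood": mood, "mode": mode}

def build_prompt(d: dict) -> str:
    """Express descriptors as a structured text prompt for a diffusion audio model."""
    return (f"A {d['mood']} melody in the Chinese pentatonic {d['mode']} mode, "
            f"around {d['tempo']} BPM, expressive and flowing.")

print(build_prompt(infer_descriptors(VitalSigns(72.0, 12.5))))
```

The resulting string would then condition the diffusion-based audio model, closing the feedback loop between sensing and synthesis.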
Related papers
- The Ghost in the Keys: A Disklavier Demo for Human-AI Musical Co-Creativity [59.78509280246215]
Aria-Duet is an interactive system facilitating a real-time musical duet between a human pianist and Aria, a state-of-the-art generative model. We analyze the system's output from a musicological perspective, finding the model can maintain stylistic semantics and develop coherent phrasal ideas.
arXiv Detail & Related papers (2025-11-03T15:26:01Z)
- Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain [5.168772989709122]
In affective neuroscience and emotion-aware AI, understanding how complex auditory stimuli drive emotion arousal dynamics remains unresolved. This study introduces a computational framework to model the brain's encoding of naturalistic auditory inputs into dynamic behavioral/neural responses. By integrating affective computing and neuroscience, this work uncovers hierarchical mechanisms of auditory-emotion encoding.
arXiv Detail & Related papers (2025-09-23T14:52:11Z)
- Exploring How Audio Effects Alter Emotion with Foundation Models [8.932607465669195]
Audio effects (FX) play a pivotal role in shaping emotional responses during music listening. This work investigates how foundation models can be leveraged to analyze these effects. Our findings aim to advance understanding of the perceptual impact of audio production practices, with implications for music cognition, performance, and affective computing.
arXiv Detail & Related papers (2025-09-18T16:57:08Z)
- SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning [54.390403684665834]
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. We propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance.
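A minimal sketch of the probabilistic encoding idea, assuming PyTorch and a VAE-style Gaussian latent; the layer sizes, readout, and KL regularizer below are illustrative, not SynBrain's actual architecture.

```python
# Sketch: map visual features to a distribution over fMRI responses via a
# reparameterized Gaussian latent. Dimensions are arbitrary placeholders.

import torch
import torch.nn as nn

class ProbabilisticEncoder(nn.Module):
    def __init__(self, vis_dim=512, latent_dim=64, n_voxels=1024):
        super().__init__()
        self.mu = nn.Linear(vis_dim, latent_dim)
        self.logvar = nn.Linear(vis_dim, latent_dim)
        self.readout = nn.Linear(latent_dim, n_voxels)

    def forward(self, v):
        mu, logvar = self.mu(v), self.logvar(v)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.readout(z), mu, logvar

model = ProbabilisticEncoder()
pred, mu, logvar = model(torch.randn(4, 512))  # mock visual feature vectors
# KL term keeps the latent near a standard normal (VAE-style regularizer).
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```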
arXiv Detail & Related papers (2025-08-14T03:01:05Z)
- R&B -- Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity [0.12289361708127873]
Music is a universal phenomenon that profoundly influences human experiences across cultures.
This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception.
arXiv Detail & Related papers (2024-06-21T17:11:45Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
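A minimal sketch of how text and image cues can jointly condition a diffusion denoiser, assuming PyTorch; the FusedConditioner and TinyDenoiser modules are hypothetical stand-ins, not MeLFusion's actual components.

```python
# Sketch: fuse text and image embeddings into one conditioning vector, then
# feed it to a toy noise-prediction network alongside the noisy audio latent.

import torch
import torch.nn as nn

class FusedConditioner(nn.Module):
    """Combine text and image embeddings into a single conditioning vector."""
    def __init__(self, txt_dim=512, img_dim=512, cond_dim=256):
        super().__init__()
        self.proj = nn.Linear(txt_dim + img_dim, cond_dim)

    def forward(self, txt_emb, img_emb):
        return self.proj(torch.cat([txt_emb, img_emb], dim=-1))

class TinyDenoiser(nn.Module):
    """Predict noise on an audio latent, given timestep and fused condition."""
    def __init__(self, latent_dim=128, cond_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, latent_dim))

    def forward(self, x_t, t, cond):
        t_feat = t.float().unsqueeze(-1) / 1000.0  # crude timestep embedding
        return self.net(torch.cat([x_t, cond, t_feat], dim=-1))

cond = FusedConditioner()(torch.randn(2, 512), torch.randn(2, 512))
eps = TinyDenoiser()(torch.randn(2, 128), torch.randint(0, 1000, (2,)), cond)
```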
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and Latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
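The reverse domain discriminator can be sketched with a gradient-reversal layer, assuming PyTorch; the embedding dimension and discriminator head below are illustrative, and the paper's LSTM-attention EEG encoder and pre-trained music tagger are omitted.

```python
# Sketch: gradient reversal + domain discriminator to align EEG and music
# embedding distributions. Encoders producing the embeddings are not shown.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # reversed gradients flow to encoders

class DomainDiscriminator(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z, lam=1.0):
        return self.net(GradReverse.apply(z, lam))  # logit: EEG vs. music

# Usage: the discriminator tries to tell modalities apart, while reversed
# gradients push the (attached) encoders toward a shared distribution.
eeg_z, music_z = torch.randn(8, 128), torch.randn(8, 128)
disc = DomainDiscriminator()
logits = disc(torch.cat([eeg_z, music_z]))
labels = torch.cat([torch.zeros(8, 1), torch.ones(8, 1)])
loss = nn.BCEWithLogitsLoss()(logits, labels)
loss.backward()
```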
arXiv Detail & Related papers (2022-02-20T07:32:12Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
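A minimal sketch of a learnable band-pass filter bank in this spirit, assuming PyTorch; the sinc parameterization and the initial cutoffs (placed near classic EEG bands) are illustrative choices, not necessarily EEGminer's exact design.

```python
# Sketch: four differentiable band-pass filters with learnable cutoffs,
# applied to single-channel EEG via 1-D convolution.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableBandpass(nn.Module):
    def __init__(self, kernel_size=65, fs=128.0):
        super().__init__()
        self.kernel_size, self.fs = kernel_size, fs
        # Learnable cutoffs in Hz, initialized near delta/theta/alpha/beta.
        self.low = nn.Parameter(torch.tensor([1.0, 4.0, 8.0, 13.0]))
        self.band = nn.Parameter(torch.tensor([3.0, 4.0, 5.0, 17.0]))

    def _sinc_lowpass(self, f, t):
        # Ideal low-pass impulse response h(t) = 2f * sinc(2 f t).
        denom = torch.where(t == 0, torch.ones_like(t), math.pi * t)
        return torch.where(t == 0, 2 * f, torch.sin(2 * math.pi * f * t) / denom)

    def forward(self, x):  # x: (batch, 1, time)
        t = (torch.arange(self.kernel_size) - self.kernel_size // 2).float()
        t = t.unsqueeze(0)                               # (1, K)
        low = self.low.abs().unsqueeze(1) / self.fs      # normalized cutoffs
        high = (self.low.abs() + self.band.abs()).unsqueeze(1) / self.fs
        # Band-pass kernel = difference of two low-pass sinc kernels.
        kernels = self._sinc_lowpass(high, t) - self._sinc_lowpass(low, t)
        return F.conv1d(x, kernels.unsqueeze(1), padding=self.kernel_size // 2)

eeg = torch.randn(2, 1, 512)        # two mock single-channel EEG windows
feats = LearnableBandpass()(eeg)    # (2, 4, 512): one sub-band per filter
```

Because the cutoffs are parameters, the learned pass-bands can be read off directly after training, which is what makes such filters interpretable.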
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
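A minimal sketch of fusing keypoint structure with audio features, assuming PyTorch; the one-step graph propagation and mask head below are illustrative stand-ins for the paper's context-aware graph network and audio-visual fusion model.

```python
# Sketch: propagate features over body keypoints via an adjacency matrix,
# pool them, and fuse with spectrogram frames to predict a separation mask.

import torch
import torch.nn as nn

class KeypointAudioFusion(nn.Module):
    def __init__(self, kp_dim=2, hid=64, n_freq=256):
        super().__init__()
        self.kp_proj = nn.Linear(kp_dim, hid)
        self.audio_proj = nn.Linear(n_freq, hid)
        self.mask_head = nn.Linear(2 * hid, n_freq)

    def forward(self, keypoints, adj, spec):
        # keypoints: (B, J, 2) joint coords; adj: (J, J); spec: (B, T, F)
        h = torch.relu(self.kp_proj(keypoints))      # per-joint features
        h = adj @ h                                  # one graph message-passing step
        body = h.mean(dim=1, keepdim=True)           # pooled body descriptor
        a = torch.relu(self.audio_proj(spec))        # per-frame audio features
        fused = torch.cat([a, body.expand(-1, a.shape[1], -1)], dim=-1)
        return torch.sigmoid(self.mask_head(fused))  # spectrogram mask in [0, 1]

B, J, T, n_freq = 2, 17, 100, 256
mask = KeypointAudioFusion()(torch.randn(B, J, 2), torch.eye(J),
                             torch.randn(B, T, n_freq))
```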
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
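A minimal sketch of late fusion over precomputed audio and visual feature vectors, assuming PyTorch; it illustrates the multi-modal setup rather than the thesis's actual models.

```python
# Sketch: average per-modality genre logits (simple late fusion).

import torch
import torch.nn as nn

class LateFusionGenre(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=128, n_genres=10):
        super().__init__()
        self.audio_head = nn.Linear(audio_dim, n_genres)
        self.visual_head = nn.Linear(visual_dim, n_genres)

    def forward(self, audio_feat, visual_feat):
        return 0.5 * (self.audio_head(audio_feat) + self.visual_head(visual_feat))

logits = LateFusionGenre()(torch.randn(4, 128), torch.randn(4, 128))
```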
arXiv Detail & Related papers (2020-02-01T17:57:14Z)