BNMusic: Blending Environmental Noises into Personalized Music
- URL: http://arxiv.org/abs/2506.10754v1
- Date: Thu, 12 Jun 2025 14:39:08 GMT
- Title: BNMusic: Blending Environmental Noises into Personalized Music
- Authors: Chi Zuo, Martin B. Møller, Pablo Martínez-Nuevo, Huayang Huang, Yu Wu, Ye Zhu
- Abstract summary: We propose a Blending Noises into Personalized Music (BNMusic) framework with two key stages. The first stage synthesizes a complete piece of music in a mel-spectrogram representation that encapsulates the musical essence of the noise. In the second stage, we adaptively amplify the generated music segment to further reduce noise perception and enhance the blending effectiveness.
- Score: 11.253264308431953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Acoustic masking is a conventional audio-engineering technique for reducing the annoyance of environmental noises: the noises are covered up with other dominant yet less intrusive sounds. However, misalignment between the dominant sound and the noise, such as mismatched downbeats, often requires an excessive volume increase to achieve effective masking. Motivated by recent advances in cross-modal generation, in this work we introduce an alternative to acoustic masking that reduces the noticeability of environmental noises by blending them into personalized music generated from user-provided text prompts. Following the paradigm of music generation with mel-spectrogram representations, we propose a Blending Noises into Personalized Music (BNMusic) framework with two key stages. The first stage synthesizes a complete piece of music, in a mel-spectrogram representation, that encapsulates the musical essence of the noise. The second stage adaptively amplifies the generated music segment to further reduce noise perception and enhance the blending effect while preserving auditory quality. Comprehensive evaluations on MusicBench, EPIC-SOUNDS, and ESC-50 demonstrate the effectiveness of our framework: it blends environmental noise with rhythmically aligned, adaptively amplified, and enjoyable music segments, minimizing the noticeability of the noise and thereby improving the overall acoustic experience.
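To make the two-stage idea concrete, here is a minimal sketch in Python (ours, not the authors' code). Stage 1 is stubbed out: a real system would condition a text-to-music model on the noise's mel-spectrogram, whereas here a pre-recorded file stands in for the generated music. The file names, the `margin_db`/`max_gain_db` parameters, and the RMS-based gain heuristic are all assumptions used purely for illustration.

```python
import numpy as np
import librosa

SR = 22050

def mel_db(y, sr=SR):
    """Mel-spectrogram in dB: the representation the framework operates on.
    A real Stage 1 would condition a text-to-music model on this."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)

def adaptive_gain(noise, music, margin_db=3.0, max_gain_db=12.0):
    """Crude stand-in for Stage 2: raise the music's RMS level to sit
    margin_db above the noise, capped at max_gain_db so the result stays
    pleasant rather than simply loud. Both knobs are assumptions."""
    noise_rms = np.sqrt(np.mean(noise ** 2) + 1e-12)
    music_rms = np.sqrt(np.mean(music ** 2) + 1e-12)
    needed_db = 20.0 * np.log10(noise_rms / music_rms) + margin_db
    gain_db = float(np.clip(needed_db, 0.0, max_gain_db))
    return 10.0 ** (gain_db / 20.0)

# Assumed input files; "music.wav" stands in for the Stage-1 output.
noise, _ = librosa.load("noise.wav", sr=SR)
music, _ = librosa.load("music.wav", sr=SR)
n = min(len(noise), len(music))
noise, music = noise[:n], music[:n]

noise_mel = mel_db(noise)  # what a noise-conditioned generator would consume
blend = noise + adaptive_gain(noise, music) * music
blend /= max(1.0, np.abs(blend).max())  # normalize to avoid clipping
```

A single global gain is the simplest possible choice; the paper's adaptive amplification operates on music segments, so a per-segment or per-frame gain would be a closer analogue.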
Related papers
- NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration [25.13251765490759]
We propose Multi-Level Noise DeController, Multi-Frame Noise DeController, and Joint Denoising to enhance consistency in video generation. We evaluate our NoiseController on public video generation datasets and downstream tasks, demonstrating its state-of-the-art performance.
arXiv Detail & Related papers (2025-04-25T16:01:48Z)
- SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding [51.311553815466446]
We introduce SoundVista, a method to generate the ambient sound of an arbitrary scene at novel viewpoints. Given a pre-acquired recording of the scene from sparsely distributed microphones, SoundVista can synthesize the sound of that scene from an unseen target viewpoint.
arXiv Detail & Related papers (2025-04-08T00:22:16Z)
- Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping [8.560397278656646]
People often listen to music in noisy environments, seeking to isolate themselves from ambient sounds. We propose a neural network based on a psychoacoustic masking model to enhance the music's ability to mask ambient noise. We evaluate our approach on simulated data replicating a user's experience of listening to music with headphones in a noisy environment.
arXiv Detail & Related papers (2025-02-24T07:58:10Z)
- Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals [31.30005077444649]
This paper proposes a new end-to-end noise-immune binaural audio synthesis framework for microphone-array signals, abbreviated as Array2BR.
Compared with existing methods, the proposed method achieved better performance in terms of both objective and subjective metric scores.
arXiv Detail & Related papers (2024-10-08T06:55:35Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- Music Auto-Tagging with Robust Music Representation Learned via Domain Adversarial Training [18.71152526968065]
Existing models in Music Information Retrieval (MIR) struggle with real-world noise such as environmental and speech sounds in multimedia content.
This study proposes a method inspired by speech-related tasks to enhance music auto-tagging performance in noisy settings.
arXiv Detail & Related papers (2024-01-27T06:56:51Z)
- Exploiting Time-Frequency Conformers for Music Audio Enhancement [21.243039524049614]
We propose a music enhancement system based on the Conformer architecture.
Our approach explores the attention mechanisms of the Conformer and examines their performance to discover the best approach for the music enhancement task.
arXiv Detail & Related papers (2023-08-24T06:56:54Z)
- NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music.
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
- Visual Acoustic Matching [92.91522122739845]
We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.
Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials.
arXiv Detail & Related papers (2022-02-14T17:05:22Z)
- Towards Noise-resistant Object Detection with Noisy Annotations [119.63458519946691]
Training deep object detectors requires a significant number of human-annotated images with accurate object labels and bounding box coordinates. Noisy annotations are much easier to obtain, but they can be detrimental to learning.
We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise.
arXiv Detail & Related papers (2020-03-03T01:32:16Z)