Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping
- URL: http://arxiv.org/abs/2502.17527v1
- Date: Mon, 24 Feb 2025 07:58:10 GMT
- Title: Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping
- Authors: Clémentine Berger, Roland Badeau, Slim Essid
- Abstract summary: People often listen to music in noisy environments, seeking to isolate themselves from ambient sounds. We propose a neural network based on a psychoacoustic masking model to enhance the music's ability to mask ambient noise. We evaluate our approach on simulated data replicating a user's experience of listening to music with headphones in a noisy environment.
- Score: 8.560397278656646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: People often listen to music in noisy environments, seeking to isolate themselves from ambient sounds. Indeed, a music signal can mask some of the noise's frequency components due to the effect of simultaneous masking. In this article, we propose a neural network based on a psychoacoustic masking model, designed to enhance the music's ability to mask ambient noise by reshaping its spectral envelope with predicted filter frequency responses. The model is trained with a perceptual loss function that balances two constraints: effectively masking the noise while preserving the original music mix and the user's chosen listening level. We evaluate our approach on simulated data replicating a user's experience of listening to music with headphones in a noisy environment. The results, based on defined objective metrics, demonstrate that our system improves the state of the art.
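The core idea lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch version, not the authors' implementation: a small network maps log mel-band energies of the music and the ambient noise to bounded per-band gains, and a toy loss trades off noise masking against fidelity to the original mix and listening level. The names `EnvelopeShaper` and `perceptual_loss`, and all parameter choices, are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnvelopeShaper(nn.Module):
    """Toy stand-in for the paper's network: predicts per-band log-gains
    that reshape the music's spectral envelope given the ambient noise."""

    def __init__(self, n_bands: int = 64, hidden: int = 128, max_log_gain: float = 2.0):
        super().__init__()
        self.max_log_gain = max_log_gain
        self.net = nn.Sequential(
            nn.Linear(2 * n_bands, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_bands),
        )

    def forward(self, music_db: torch.Tensor, noise_db: torch.Tensor) -> torch.Tensor:
        # Inputs: log mel-band energies, shape (batch, n_bands).
        log_gain = self.max_log_gain * torch.tanh(self.net(torch.cat([music_db, noise_db], dim=-1)))
        return music_db + log_gain  # shaped envelope in the log domain


def perceptual_loss(shaped_db, music_db, noise_db, alpha: float = 1.0):
    """Toy surrogate for the paper's psychoacoustic loss: penalize bands where
    the noise still exceeds the shaped music (audible noise), while keeping
    the shaped envelope close to the original mix and listening level."""
    masking = F.relu(noise_db - shaped_db).mean()
    fidelity = F.mse_loss(shaped_db, music_db)
    return masking + alpha * fidelity


# Usage with random stand-ins for log mel-band energies:
model = EnvelopeShaper()
music_db = torch.randn(8, 64)
noise_db = torch.randn(8, 64)
loss = perceptual_loss(model(music_db, noise_db), music_db, noise_db)
loss.backward()
```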
Related papers
- SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding [51.311553815466446]
We introduce SoundVista, a method to generate the ambient sound of an arbitrary scene at novel viewpoints.
Given a pre-acquired recording of the scene from sparsely distributed microphones, SoundVista can synthesize the sound of that scene from an unseen target viewpoint.
arXiv Detail & Related papers (2025-04-08T00:22:16Z)
- BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds [16.0759003139539]
BGM2Pose is a non-invasive 3D human pose estimation method using arbitrary music (e.g., background music) as active sensing signals.
Our method utilizes natural music that causes minimal discomfort to humans.
arXiv Detail & Related papers (2025-03-01T07:32:19Z)
- SOAF: Scene Occlusion-aware Neural Acoustic Field [9.651041527067907]
We propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation.
Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling.
We extract features from the local acoustic field centred around the receiver using a Fibonacci sphere to generate audio for novel views (see the sampling sketch after this entry).
arXiv Detail & Related papers (2024-07-02T13:40:56Z)
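Fibonacci-sphere sampling, as referenced in the SOAF summary above, is a standard construction for placing nearly uniform points on a sphere. A minimal NumPy sketch, illustrative only and not the SOAF code:

```python
import numpy as np

def fibonacci_sphere(n: int) -> np.ndarray:
    """Return n nearly uniform points on the unit sphere, shape (n, 3)."""
    i = np.arange(n)
    golden = (1 + 5 ** 0.5) / 2
    theta = 2 * np.pi * i / golden   # longitudes advance by the golden angle
    z = 1 - 2 * (i + 0.5) / n        # equal-area latitude spacing
    r = np.sqrt(1 - z * z)
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

# e.g. directions around a receiver for sampling a local acoustic field:
directions = fibonacci_sphere(256)
```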
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos [87.32349247938136]
Existing approaches implicitly assume total correspondence between the video and audio during training.
We propose a novel ambient-aware audio generation model, AV-LDM.
Our approach is the first to focus video-to-audio generation faithfully on the observed visual content.
arXiv Detail & Related papers (2024-06-13T16:10:19Z)
- Music Augmentation and Denoising For Peak-Based Audio Fingerprinting [0.0]
We introduce and release a new audio augmentation pipeline that adds noise to music snippets in a realistic way.
We then propose and release a deep learning model that removes noisy components from spectrograms.
We show that the addition of our model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.
arXiv Detail & Related papers (2023-10-20T09:56:22Z)
- Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874]
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
arXiv Detail & Related papers (2023-02-02T04:09:23Z)
- Audio Denoising for Robust Audio Fingerprinting [0.0]
Music discovery services let users identify songs from short mobile recordings.
These solutions rely more specifically on the extraction of spectral peaks in order to be robust to a number of distortions (a peak-picking sketch follows this entry).
Little work has been done to study the robustness of these algorithms to background noise captured in real environments.
arXiv Detail & Related papers (2022-12-21T09:46:12Z)
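Peak-based fingerprinting, as described in the two fingerprinting entries above, keeps only salient local maxima of the spectrogram. A minimal sketch with SciPy; the neighborhood size and threshold are hypothetical, not any specific system's parameters:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def spectral_peaks(mag_spec: np.ndarray, neighborhood: int = 15,
                   floor_db: float = -60.0) -> np.ndarray:
    """Return (freq_bin, frame) coordinates of local maxima in a
    magnitude spectrogram, ignoring peaks below a dB floor."""
    log_spec = 20.0 * np.log10(np.maximum(mag_spec, 1e-10))
    is_peak = maximum_filter(log_spec, size=neighborhood) == log_spec
    return np.argwhere(is_peak & (log_spec > floor_db))
```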
- End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio codec with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives and negatives sampled from the same music (a toy version of such an objective is sketched after this entry).
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
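As a rough illustration of a contrastive objective with self-augmented positives and negatives drawn from the same clip, here is a toy PyTorch version; the actual PEMR objective differs in detail, and `frame_mask_contrastive` is our name for the sketch:

```python
import torch
import torch.nn.functional as F

def frame_mask_contrastive(z, z_pos, z_neg, temperature: float = 0.1):
    """Toy contrastive loss: z, z_pos, z_neg are (batch, dim) embeddings of
    the original clip, a positively masked clip, and a negatively masked clip."""
    z, z_pos, z_neg = (F.normalize(t, dim=1) for t in (z, z_pos, z_neg))
    sim_pos = (z * z_pos).sum(dim=1) / temperature
    sim_neg = (z * z_neg).sum(dim=1) / temperature
    logits = torch.stack([sim_pos, sim_neg], dim=1)  # class 0 = positive pair
    labels = torch.zeros(z.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```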
- Visual Sound Localization in the Wild by Cross-Modal Interference Erasing [90.21476231683008]
In real-world scenarios, audio is usually contaminated by off-screen sound and background noise.
We propose the Interference Eraser (IEr) framework, which tackles the problem of audio-visual sound source localization in the wild.
arXiv Detail & Related papers (2022-02-13T21:06:19Z)
- Weakly-supervised Audio-visual Sound Source Detection and Separation [38.52168086518221]
We propose an audio-visual co-segmentation approach in which the network learns both what individual objects look like and how they sound.
We introduce weakly-supervised object segmentation in the context of sound separation.
Our architecture can be learned in an end-to-end manner and requires no additional supervision or bounding box proposals.
arXiv Detail & Related papers (2021-03-25T10:17:55Z)
- Learning to Denoise Historical Music [30.165194151843835]
We propose an audio-to-audio neural network model that learns to denoise old music recordings.
The network is trained on a noisy music dataset with both reconstruction and adversarial objectives (a toy combined loss is sketched after this entry).
Our results show that the proposed method is effective in removing noise, while preserving the quality and details of the original music.
arXiv Detail & Related papers (2020-08-05T10:05:44Z)
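A reconstruction-plus-adversarial objective of the kind mentioned in the historical-music entry above can be sketched generically; this is a toy combination with an assumed weighting, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def denoiser_loss(denoised, clean, disc_logits_fake, adv_weight: float = 0.01):
    """Toy combined objective: L1 reconstruction plus a non-saturating
    adversarial term that rewards fooling the discriminator."""
    reconstruction = F.l1_loss(denoised, clean)
    adversarial = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return reconstruction + adv_weight * adversarial
```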
- Generating Visually Aligned Sound from Videos [83.89485254543888]
We focus on the task of generating sound from natural videos.
The sound should be both temporally and content-wise aligned with visual signals.
Some sounds produced outside of the camera's view cannot be inferred from the video content.
arXiv Detail & Related papers (2020-07-14T07:51:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.