Hybrid Y-Net Architecture for Singing Voice Separation
- URL: http://arxiv.org/abs/2303.02599v1
- Date: Sun, 5 Mar 2023 07:54:49 GMT
- Title: Hybrid Y-Net Architecture for Singing Voice Separation
- Authors: Rashen Fernando, Pamudu Ranasinghe, Udula Ranasinghe, Janaka
Wijayakulasooriya, Pantaleon Perera
- Abstract summary: The proposed architecture performs end-to-end hybrid source separation by extracting features from both spectrogram and waveform domains.
Inspired by the U-Net architecture, Y-Net predicts a spectrogram mask to separate vocal sources from a mixture signal.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research paper presents a novel deep learning-based neural network
architecture, named Y-Net, for achieving music source separation. The proposed
architecture performs end-to-end hybrid source separation by extracting
features from both spectrogram and waveform domains. Inspired by the U-Net
architecture, Y-Net predicts a spectrogram mask to separate vocal sources from
a mixture signal. Our results demonstrate the effectiveness of the proposed
architecture for music source separation with fewer parameters. Overall, our
work presents a promising approach for improving the accuracy and efficiency of
music source separation.
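No code accompanies this abstract, but the hybrid idea is straightforward to sketch: one branch reads the raw waveform, another reads the magnitude spectrogram, and their fused features predict a spectrogram mask applied to the mixture. The PyTorch toy below is only an illustration under assumed layer sizes and STFT settings; it is not the authors' Y-Net, which is a deeper U-Net-inspired encoder-decoder.
```python
# Minimal two-branch ("Y"-shaped) mask estimator; all sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TinyYNet(nn.Module):
    def __init__(self, n_fft=1024, hop=256, hidden=32):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        freq_bins = n_fft // 2 + 1
        # Branch 1: features extracted directly from the waveform.
        self.wave_branch = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=2 * hop, stride=hop, padding=hop),
            nn.ReLU(),
        )
        # Branch 2: features extracted from the magnitude spectrogram.
        self.spec_branch = nn.Sequential(
            nn.Conv1d(freq_bins, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Fused features -> per-bin vocal mask in [0, 1].
        self.mask_head = nn.Sequential(
            nn.Conv1d(2 * hidden, freq_bins, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, mixture):                                   # mixture: (batch, samples)
        window = torch.hann_window(self.n_fft, device=mixture.device)
        spec = torch.stft(mixture, self.n_fft, self.hop,
                          window=window, return_complex=True)     # (batch, freq, time)
        w = self.wave_branch(mixture.unsqueeze(1))                 # (batch, hidden, time')
        s = self.spec_branch(spec.abs())                           # (batch, hidden, time)
        w = nn.functional.interpolate(w, size=s.shape[-1])         # align time resolutions
        mask = self.mask_head(torch.cat([w, s], dim=1))            # (batch, freq, time)
        return torch.istft(mask * spec, self.n_fft, self.hop,
                           window=window, length=mixture.shape[-1])
```
Calling `TinyYNet()(torch.randn(2, 44100))` returns a (2, 44100) vocal estimate; training would compare it (or the masked spectrogram) against the ground-truth vocal stem.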
Related papers
- High-Quality Visually-Guided Sound Separation from Diverse Categories [56.92841782969847]
DAVIS is a Diffusion-based Audio-VIsual Separation framework.
It synthesizes separated sounds directly from Gaussian noise, conditioned on both the audio mixture and the visual information.
We compare DAVIS to existing state-of-the-art discriminative audio-visual separation methods on the AVE and MUSIC datasets.
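For contrast with mask-based (discriminative) separation, the sketch below shows what "synthesizing from Gaussian noise under conditioning" looks like as a generic DDPM-style sampling loop. The denoiser interface `eps_model(x, t, mixture, visual)` and the linear noise schedule are assumptions for illustration; this is not the DAVIS implementation.
```python
import torch

@torch.no_grad()
def sample_source(eps_model, mixture, visual, steps=50):
    """Draw one separated-source estimate from noise, conditioned on mixture + visuals."""
    betas = torch.linspace(1e-4, 0.02, steps)            # simple linear noise schedule (assumed)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(mixture)                         # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = eps_model(x, t, mixture, visual)            # predicted noise, conditioned on both inputs
        # DDPM posterior mean; the added noise uses sigma_t^2 = beta_t.
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x                                              # estimate of the separated source
```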
arXiv Detail & Related papers (2023-07-31T19:41:49Z)
- Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding [57.08832099075793]
Visually-guided sound source separation consists of three parts: visual feature extraction, multimodal feature fusion, and sound signal processing.
This paper presents audio-visual predictive coding (AVPC) to tackle this task in a parameter-harmonizing and more effective manner.
In addition, we develop a valid self-supervised learning strategy for AVPC via co-predicting two audio-visual representations of the same sound source.
arXiv Detail & Related papers (2023-06-19T03:10:57Z)
- AudioSlots: A slot-centric generative model for audio separation [26.51135156983783]
We present AudioSlots, a slot-centric generative model for blind source separation in the audio domain.
We train the model in an end-to-end manner using a permutation-equivariant loss function.
Our results on Libri2Mix speech separation constitute a proof of concept that this approach shows promise.
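Because the slots form an unordered set of sources, the training loss must not depend on output order. A brute-force permutation-invariant loss, shown below as an assumed stand-in rather than AudioSlots' exact objective, scores every assignment of predictions to targets and keeps the cheapest one per example.
```python
from itertools import permutations
import torch

def permutation_invariant_loss(pred, target):
    """pred, target: (batch, n_sources, ...) tensors with the same shape."""
    n_src = pred.shape[1]
    losses = []
    for perm in permutations(range(n_src)):
        # Mean-squared error under this particular assignment of predictions to targets.
        permuted = pred[:, list(perm)]
        err = ((permuted - target) ** 2).flatten(start_dim=1).mean(dim=1)  # (batch,)
        losses.append(err)
    # For each batch item, keep the best assignment; average over the batch.
    return torch.stack(losses, dim=1).min(dim=1).values.mean()
```
For two sources this tries both assignments; with more sources, Hungarian matching (scipy.optimize.linear_sum_assignment) avoids the factorial enumeration.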
arXiv Detail & Related papers (2023-05-09T16:28:07Z)
- On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- Visual Scene Graphs for Audio Source Separation [65.47212419514761]
State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments.
We propose Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs.
Our pipeline is trained end-to-end via a self-supervised task consisting of separating audio sources using the visual graph from artificially mixed sounds.
arXiv Detail & Related papers (2021-09-24T13:40:51Z)
- A Unified Model for Zero-shot Music Source Separation, Transcription and Synthesis [13.263771543118994]
We propose a unified model for three inter-related tasks: 1) to separate individual sound sources from a music mixture, 2) to transcribe each sound source to MIDI notes, and 3) to synthesize new pieces based on the timbre of the separated sources.
The model is inspired by the fact that when humans listen to music, our minds can not only separate the sounds of different instruments, but also at the same time perceive high-level representations such as score and timbre.
arXiv Detail & Related papers (2021-08-07T14:28:21Z)
- Source Separation and Depthwise Separable Convolutions for Computer Audition [0.0]
We train a depthwise separable convolutional neural network on a challenging electronic dance music data set.
It is shown that source separation improves classification performance in a limited-data setting compared to the standard single spectrogram approach.
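For reference, a depthwise separable convolution factors a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel mixer, which is where the parameter savings come from. The layer below is a generic sketch, not the paper's network.
```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv2d(nn.Module):
    """Standard conv factored into depthwise (per-channel) + pointwise (1x1) parts."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # groups=in_ch makes each filter see only its own input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Weight count for 64 -> 128 channels with a 3x3 kernel (biases excluded):
# standard Conv2d: 64 * 128 * 9 = 73,728
# separable:       64 * 9 + 64 * 128 = 8,768
```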
arXiv Detail & Related papers (2020-12-06T19:30:26Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy for estimating the separation performance of state-of-the-art deep learning approaches.
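The oracle is simple to state: given the true sources' spectrograms, the ideal ratio mask (IRM) gives each time-frequency bin of the mixture to the sources in proportion to their energy, and the resulting scores upper-bound what any mask-predicting network can reach. The NumPy sketch below uses one common IRM formulation and assumed spectrogram inputs; it is not the paper's evaluation code.
```python
import numpy as np

def ideal_ratio_masks(source_mags, eps=1e-8):
    """source_mags: (n_sources, freq, time) magnitude spectrograms of the true sources.
    Returns one mask per source; masks sum to ~1 in every time-frequency bin."""
    power = source_mags ** 2
    return power / (power.sum(axis=0, keepdims=True) + eps)

def oracle_separation(mixture_stft, source_mags):
    """Apply the ideal ratio masks to the (complex) mixture STFT of shape (freq, time)."""
    masks = ideal_ratio_masks(source_mags)
    return masks * mixture_stft            # (n_sources, freq, time) masked estimates
```
Inverting each masked STFT yields the oracle estimates whose scores serve as the kind of separability proxy the paper describes.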
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
- Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform [34.05660769694652]
We propose a time-domain audio source separation method based on a discrete wavelet transform (DWT).
The method builds on Wave-U-Net, one of the state-of-the-art deep neural networks.
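The appeal of a DWT inside Wave-U-Net's down/up-sampling path is that, unlike plain decimation, it is aliasing-free and exactly invertible. As a minimal illustration (an assumption here, not the paper's layers), a single-level Haar transform splits a waveform into half-length low- and high-pass bands and reconstructs it losslessly:
```python
import numpy as np

def haar_dwt(x):
    """One-level Haar analysis: even/odd pairs -> (approximation, detail), each half length."""
    x = x[: len(x) // 2 * 2]                      # drop a trailing sample if length is odd
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)            # low-pass (downsampled) band
    detail = (even - odd) / np.sqrt(2)            # high-pass band, kept for reconstruction
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: perfectly reconstructs the (even-length) input."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    x = np.empty(2 * len(approx))
    x[0::2], x[1::2] = even, odd
    return x

x = np.random.randn(1024)
approx, detail = haar_dwt(x)
assert np.allclose(haar_idwt(approx, detail), x)  # lossless down/up-sampling
```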
arXiv Detail & Related papers (2020-01-28T06:43:21Z)