D3Net: Densely connected multidilated DenseNet for music source
separation
- URL: http://arxiv.org/abs/2010.01733v4
- Date: Sat, 27 Mar 2021 04:55:38 GMT
- Title: D3Net: Densely connected multidilated DenseNet for music source
separation
- Authors: Naoya Takahashi and Yuki Mitsufuji
- Abstract summary: Music source separation involves a large input field to model a long-term dependence of an audio signal.
Previous convolutional neural network (CNN)-based approaches address the large input field modeling using sequentially down- and up-sampling feature maps or dilated convolution.
We propose a novel CNN architecture called densely connected dilated DenseNet (D3Net).
D3Net achieves state-of-the-art performance with an average signal to distortion ratio (SDR) of 6.01 dB.
- Score: 25.75557472306157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music source separation involves a large input field to model a long-term
dependence of an audio signal. Previous convolutional neural network
(CNN)-based approaches address the large input field modeling using
sequentially down- and up-sampling feature maps or dilated convolution. In this
paper, we claim the importance of a rapid growth of a receptive field and a
simultaneous modeling of multi-resolution data in a single convolution layer,
and propose a novel CNN architecture called densely connected dilated DenseNet
(D3Net). D3Net involves a novel multi-dilated convolution that has different
dilation factors in a single layer to model different resolutions
simultaneously. By combining the multi-dilated convolution with DenseNet
architecture, D3Net avoids the aliasing problem that exists when we naively
incorporate the dilated convolution in DenseNet. Experimental results on
MUSDB18 dataset show that D3Net achieves state-of-the-art performance with an
average signal to distortion ratio (SDR) of 6.01 dB.
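To make the idea of a multi-dilated convolution concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a 1-D signal, one channel group per dilation factor, and dilation factors 1, 2 and 4; the function names `dilated_conv1d` and `multidilated_conv1d`, the all-ones kernels, and the impulse input are hypothetical choices made only for illustration. Each channel group is filtered with its own dilation and the results are summed into a single output, so one layer sees several resolutions at once.

```python
def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation form)."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= idx < len(x):              # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multidilated_conv1d(groups, kernels, dilations):
    """Sum of per-group dilated convolutions: one layer, several dilation factors.

    Assumption for illustration: each input channel group has its own dilation.
    """
    out = [0.0] * len(groups[0])
    for x, kernel, d in zip(groups, kernels, dilations):
        y = dilated_conv1d(x, kernel, d)
        out = [a + b for a, b in zip(out, y)]
    return out


# Hypothetical input: three channel groups, each a unit impulse at index 4.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
groups = [impulse, impulse, impulse]
kernels = [[1.0, 1.0, 1.0]] * 3
dilations = [1, 2, 4]  # illustrative choice, one factor per group

result = multidilated_conv1d(groups, kernels, dilations)
print(result)  # [1.0, 0.0, 1.0, 1.0, 3.0, 1.0, 1.0, 0.0, 1.0]
```

With a single impulse as input, the output shows the combined footprint of the three dilations: the dilation-1 group covers indices 3 to 5, the dilation-2 group reaches indices 2 and 6, and the dilation-4 group reaches indices 0 and 8, all within one layer.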
Related papers
- C3Net: Compound Conditioned ControlNet for Multimodal Content Generation [67.5090755991599]
Compound Conditioned ControlNet, C3Net, is a novel generative neural architecture taking conditions from multiple modalities simultaneously.
C3Net adapts the ControlNet architecture to jointly train and make inferences on a production-ready diffusion model.
arXiv Detail & Related papers (2023-11-29T07:11:56Z)
- SODAWideNet -- Salient Object Detection with an Attention augmented Wide Encoder Decoder network without ImageNet pre-training [3.66237529322911]
We explore developing a neural network from scratch directly trained on Salient Object Detection without ImageNet pre-training.
We propose SODAWideNet, an encoder-decoder-style network for Salient Object Detection.
Two variants, SODAWideNet-S (3.03M) and SODAWideNet (9.03M), achieve competitive performance against state-of-the-art models on five datasets.
arXiv Detail & Related papers (2023-11-08T16:53:44Z)
- DGCNet: An Efficient 3D-Densenet based on Dynamic Group Convolution for Hyperspectral Remote Sensing Image Classification [22.025733502296035]
We introduce a lightweight model based on the improved 3D-Densenet model and design DGCNet.
Multiple groups can capture different and complementary visual and semantic features of input images, allowing a convolutional neural network (CNN) to learn rich features.
The inference speed and accuracy have been improved, with outstanding performance on the IN, Pavia and KSC datasets.
arXiv Detail & Related papers (2023-07-13T10:19:48Z)
- Improved distinct bone segmentation in upper-body CT through multi-resolution networks [0.39583175274885335]
In distinct bone segmentation from upper-body CTs, a large field of view and a computationally taxing 3D architecture are required.
This leads to low-resolution results lacking detail or localisation errors due to missing spatial context.
We propose end-to-end trainable segmentation networks that combine several 3D U-Nets working at different resolutions.
arXiv Detail & Related papers (2023-01-31T14:46:16Z)
- EurNet: Efficient Multi-Range Relational Modeling of Spatial Multi-Relational Data [65.56348668962343]
We introduce the EurNet for Efficient multi-range relational modeling.
EurNet constructs the multi-relational graph, where each type of edge corresponds to short-, medium- or long-range spatial interactions.
We study EurNets in two important domains for image and protein structure modeling.
arXiv Detail & Related papers (2022-11-23T13:24:36Z)
- SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud Representation [65.4396959244269]
The paper tackles the challenge by designing a general framework to construct 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a great trade-off between efficiency, rotation, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z)
- Densely connected multidilated convolutional networks for dense prediction tasks [25.75557472306157]
We propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net).
D3Net involves a novel multidilated convolution that has different dilation factors in a single layer to model different resolutions simultaneously.
Experiments on the image semantic segmentation task using Cityscapes and the audio source separation task using MUSDB18 show that the proposed method has superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2020-11-21T05:15:12Z)
- VolumeNet: A Lightweight Parallel Network for Super-Resolution of Medical Volumetric Data [20.34783243852236]
We propose a 3D convolutional neural network (CNN) for SR of medical volumetric data called ParallelNet using parallel connections.
We show that the proposed VolumeNet significantly reduces the number of model parameters and achieves high precision results.
arXiv Detail & Related papers (2020-10-16T12:53:15Z)
- Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images [56.652027072552606]
We propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++.
By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image.
A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes to obtain a fused 3D volume.
arXiv Detail & Related papers (2020-06-22T13:48:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.