Densely connected multidilated convolutional networks for dense
prediction tasks
- URL: http://arxiv.org/abs/2011.11844v2
- Date: Wed, 9 Jun 2021 00:31:49 GMT
- Title: Densely connected multidilated convolutional networks for dense
prediction tasks
- Authors: Naoya Takahashi, Yuki Mitsufuji
- Abstract summary: We propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net).
D3Net involves a novel multidilated convolution that has different dilation factors in a single layer to model different resolutions simultaneously.
Experiments on the image semantic segmentation task using Cityscapes and the audio source separation task using MUSDB18 show that the proposed method outperforms state-of-the-art methods.
- Score: 25.75557472306157
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Tasks that involve high-resolution dense prediction require modeling
both local and global patterns in a large input field. Although local and
global structures often depend on each other and modeling them simultaneously
is important, many convolutional neural network (CNN)-based approaches
interchange representations across different resolutions only a few times. In
this paper, we argue for the importance of dense, simultaneous modeling of
multiresolution representations and propose a novel CNN architecture called
densely connected multidilated DenseNet (D3Net). D3Net involves a novel
multidilated convolution that has different dilation factors in a single layer
to model different resolutions simultaneously. By combining the multidilated
convolution with the DenseNet architecture, D3Net incorporates multiresolution
learning with an exponentially growing receptive field in almost all layers,
while avoiding the aliasing problem that occurs when the dilated convolution is
naively incorporated into DenseNet. Experiments on the image semantic
segmentation task using Cityscapes and the audio source separation task using
MUSDB18 show that the proposed method outperforms state-of-the-art methods.
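To make the abstract's central idea concrete, here is a minimal, hypothetical 1-D sketch of a multidilated convolution: a single layer that applies the same kernel at several dilation factors and sums the responses, so one layer sees several resolutions at once. The kernel and the dilation set (1, 2, 4) are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative sketch only, not the paper's implementation: a 1-D
# "multidilated" convolution that mixes several dilation factors
# within a single layer, as the D3Net abstract describes.

def dilated_conv1d(signal, kernel, dilation):
    """Valid 1-D convolution with the given dilation factor."""
    k = len(kernel)
    span = (k - 1) * dilation  # receptive span of one dilated kernel
    return [sum(kernel[j] * signal[i + j * dilation] for j in range(k))
            for i in range(len(signal) - span)]

def multidilated_conv1d(signal, kernel, dilations=(1, 2, 4)):
    """One layer modeling several resolutions: sum the dilated responses,
    truncated to the shortest branch so the scales stay aligned."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    n = min(len(b) for b in branches)
    return [sum(b[i] for b in branches) for i in range(n)]

signal = [float(i) for i in range(16)]
kernel = [0.25, 0.5, 0.25]  # simple smoothing kernel (assumed)
out = multidilated_conv1d(signal, kernel)
```

Note how the largest dilation dominates the layer's receptive span: stacking such layers, with dilation factors doubling per layer as in dilated-convolution stacks generally, is what yields the exponentially growing receptive field the abstract mentions.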
Related papers
- One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection [71.78795573911512]
We propose OneDet3D, a universal one-for-all model that addresses 3D detection across different domains.
We propose domain-aware partitioning in scatter and context, guided by a routing mechanism, to address the data interference issue.
The fully sparse structure and anchor-free head further accommodate point clouds with significant scale disparities.
arXiv Detail & Related papers (2024-11-03T14:21:56Z) - Self-Parameterization Based Multi-Resolution Mesh Convolution Networks [0.0]
This paper addresses the challenges of designing mesh convolution neural networks for 3D mesh dense prediction.
The novelty of our approach lies in two key aspects. First, we construct a multi-resolution mesh pyramid directly from the high-resolution input data.
Second, we maintain the high-resolution representation in the multi-resolution convolution network, enabling multi-scale fusions.
arXiv Detail & Related papers (2024-08-25T08:11:22Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged for high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - General-Purpose Multimodal Transformer meets Remote Sensing Semantic
Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously.
arXiv Detail & Related papers (2023-07-07T04:58:34Z) - Improved distinct bone segmentation in upper-body CT through
multi-resolution networks [0.39583175274885335]
In distinct bone segmentation from upper-body CTs, a large field of view and a computationally taxing 3D architecture are required.
This leads to low-resolution results that lack detail, or to localisation errors due to missing spatial context.
We propose end-to-end trainable segmentation networks that combine several 3D U-Nets working at different resolutions.
arXiv Detail & Related papers (2023-01-31T14:46:16Z) - On Optimizing the Communication of Model Parallelism [74.15423270435949]
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL)
In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh.
We propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system, and an "overlapping-friendly" pipeline schedule.
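The cross-mesh resharding pattern described above can be illustrated with a deliberately simplified toy model: a tensor sharded over a source mesh must be redistributed over a destination mesh with a different device count. Everything here (a "mesh" as a list of shard lists, gather-then-reshard standing in for the paper's broadcast-based communication system) is an assumption for illustration only.

```python
# Toy model of cross-mesh resharding: device meshes are plain lists,
# and the communication system is reduced to gather + re-shard.

def shard(tensor, n_devices):
    """Evenly shard a flat tensor across n_devices (assumes divisibility)."""
    size = len(tensor) // n_devices
    return [tensor[i * size:(i + 1) * size] for i in range(n_devices)]

def reshard(src_shards, n_dst):
    """Reassemble the flat tensor (standing in for the broadcast step),
    then shard it for the destination mesh."""
    flat = [v for s in src_shards for v in s]
    return shard(flat, n_dst)

tensor = list(range(12))
src = shard(tensor, 4)   # source mesh: 4 devices, 3 elements each
dst = reshard(src, 3)    # destination mesh: 3 devices, 4 elements each
```

A real system avoids materializing the full tensor on one device; the paper's contribution is precisely scheduling the device-to-device transfers (and overlapping them with compute) rather than gathering centrally.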
arXiv Detail & Related papers (2022-11-10T03:56:48Z) - D3Net: Densely connected multidilated DenseNet for music source
separation [25.75557472306157]
Music source separation involves a large input field to model a long-term dependence of an audio signal.
Previous convolutional neural network (CNN)-based approaches address the large input field modeling using sequentially down- and up-sampling feature maps or dilated convolution.
We propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net).
D3Net achieves state-of-the-art performance with an average signal-to-distortion ratio (SDR) of 6.01 dB.
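The SDR figure quoted above is in decibels. As a point of reference, here is the core signal-to-distortion formula; note that MUSDB18 evaluation actually uses the more elaborate BSS Eval toolkit, so this minimal sketch is not the benchmark's exact metric.

```python
import math

def sdr_db(reference, estimate):
    """Basic SDR = 10 * log10(||s||^2 / ||s - s_hat||^2), in dB."""
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10 * math.log10(signal_energy / error_energy)

ref = [1.0, -1.0, 1.0, -1.0]
est = [0.9, -0.9, 0.9, -0.9]       # 10% amplitude error
print(round(sdr_db(ref, est), 1))  # 20.0
```

Higher is better: halving the residual energy gains about 3 dB, so differences of a fraction of a dB on MUSDB18 are considered meaningful.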
arXiv Detail & Related papers (2020-10-05T01:03:08Z) - HITNet: Hierarchical Iterative Tile Refinement Network for Real-time
Stereo Matching [18.801346154045138]
HITNet is a novel neural network architecture for real-time stereo matching.
Our architecture is inherently multi-resolution allowing the propagation of information across different levels.
At the time of writing, HITNet ranks 1st-3rd on all the metrics published on the ETH3D website for two-view stereo.
arXiv Detail & Related papers (2020-07-23T17:11:48Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z) - Seismic horizon detection with neural networks [62.997667081978825]
This paper presents open-sourced research applying a binary segmentation approach to the task of horizon detection on multiple real seismic cubes, with a focus on inter-cube generalization of the predictive model.
arXiv Detail & Related papers (2020-01-10T11:30:50Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
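The parameter-sharing idea in this entry (reusing all convolutional kernels across CT and MRI) can be sketched as a shared kernel combined with per-modality input normalization. The modality-specific statistics and function names here are hypothetical illustrations, not the paper's exact design, which also involves knowledge distillation.

```python
# Hedged sketch: one shared convolution kernel serves both modalities,
# while each modality keeps its own (assumed) normalization statistics.

def normalize(x, mean, std):
    return [(v - mean) / std for v in x]

def conv1d(x, kernel):
    """Valid 1-D convolution with a shared kernel."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

shared_kernel = [0.5, 0.5]  # shared across CT and MRI (illustrative)
modality_norm = {"ct": (100.0, 50.0), "mri": (0.5, 0.2)}  # assumed stats

def forward(x, modality):
    mean, std = modality_norm[modality]
    return conv1d(normalize(x, mean, std), shared_kernel)
```

The design choice being illustrated: the heavy, transferable parameters (kernels) are shared, while the cheap, distribution-dependent parameters (normalization) stay modality-specific.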
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.