Deeply Cascaded U-Net for Multi-Task Image Processing
- URL: http://arxiv.org/abs/2005.00225v1
- Date: Fri, 1 May 2020 05:06:35 GMT
- Title: Deeply Cascaded U-Net for Multi-Task Image Processing
- Authors: Ilja Gubins, Remco C. Veltkamp
- Abstract summary: We propose a novel multi-task neural network architecture designed for combining sequential image processing tasks.
We extend U-Net with additional decoding pathways for each individual task, and explore deep cascading of outputs and connectivity from one pathway to another.
- Score: 1.3815111881580258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In current practice, many image processing tasks are done sequentially (e.g.
denoising, dehazing, followed by semantic segmentation). In this paper, we
propose a novel multi-task neural network architecture designed for combining
sequential image processing tasks. We extend U-Net with additional decoding
pathways for each individual task, and explore deep cascading of outputs and
connectivity from one pathway to another. We demonstrate the effectiveness of
the proposed approach on denoising and semantic segmentation, as well as on
progressive coarse-to-fine semantic segmentation, and achieve better
performance than multiple individual or jointly trained networks, with fewer
trainable parameters.
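To make the cascaded-pathway idea concrete, below is a minimal PyTorch sketch of a shared encoder with two task decoders, in which the denoising pathway's output is fed into the segmentation pathway. All module names, channel sizes, and the two-level depth are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of a shared-encoder U-Net with two cascaded task decoders
# (denoising -> segmentation). All module names and sizes are illustrative
# assumptions, not taken from the paper.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class CascadedUNet(nn.Module):
    def __init__(self, in_ch=3, n_classes=21):
        super().__init__()
        # Shared encoder (two levels for brevity).
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        # Decoding pathway 1: denoising.
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)            # 32 upsampled + 32 skip
        self.head_denoise = nn.Conv2d(32, in_ch, 1)
        # Decoding pathway 2: segmentation; it also consumes the denoised
        # output (the cascading connection from pathway 1 to pathway 2).
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64 + in_ch, 32)    # 32 up + 32 skip + denoised
        self.head_seg = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        s1 = self.enc1(x)                         # full-resolution skip
        bottleneck = self.enc2(self.pool(s1))
        # Pathway 1: denoising.
        d1 = self.dec1(torch.cat([self.up1(bottleneck), s1], dim=1))
        denoised = self.head_denoise(d1)
        # Pathway 2: segmentation, cascaded on pathway 1's output.
        d2 = self.dec2(torch.cat([self.up2(bottleneck), s1, denoised], dim=1))
        return denoised, self.head_seg(d2)

# Usage: both task outputs come from one forward pass.
model = CascadedUNet()
denoised, seg_logits = model(torch.randn(1, 3, 64, 64))
```

In this sketch the cascading connection is the denoised image itself; the abstract also mentions connectivity from one pathway to another, which the same pattern supports by concatenating intermediate decoder features rather than (or in addition to) the task output.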
Related papers
- The revenge of BiSeNet: Efficient Multi-Task Image Segmentation [6.172605433695617]
BiSeNetFormer is a novel architecture for efficient multi-task image segmentation.
By seamlessly supporting multiple tasks, BiSeNetFormer offers a versatile solution for multi-task segmentation.
Our results indicate that BiSeNetFormer represents a significant advancement towards fast, efficient, and multi-task segmentation networks.
arXiv Detail & Related papers (2024-04-15T08:32:18Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
- DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also eliminates the need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
arXiv Detail & Related papers (2023-04-13T16:57:02Z)
- Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies [14.574399133024594]
We present a new MTL framework that searches for optimized structures for multiple tasks with diverse graph topologies.
We design a restricted DAG-based central network with read-in/read-out layers to build topologically diverse task-adaptive structures.
arXiv Detail & Related papers (2023-03-13T05:01:50Z)
- End-To-End Data-Dependent Routing in Multi-Path Neural Networks [0.9507070656654633]
We propose the use of multi-path neural networks with data-dependent resource allocation among parallel computations within layers.
Our networks show superior performance to existing widening and adaptive feature extraction methods, ensembles, and deeper networks of similar complexity on image recognition tasks.
arXiv Detail & Related papers (2021-07-06T07:58:07Z)
- Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation [87.01669173673288]
We propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network.
A co-attention mechanism is embedded in the EFN to realize the parallel update of multi-modal features.
Experimental results on four benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-05-05T02:27:25Z)
- EADNet: Efficient Asymmetric Dilated Network for Semantic Segmentation [8.449677920206817]
Experimental results on the Cityscapes dataset demonstrate that the proposed EADNet achieves a segmentation mIoU of 67.1 with the smallest number of parameters (only 0.35M) among mainstream lightweight semantic segmentation networks.
arXiv Detail & Related papers (2021-03-16T08:46:57Z)
- Searching for Controllable Image Restoration Networks [57.23583915884236]
Existing methods require a separate inference pass through the entire network for each output.
We propose a novel framework based on a neural architecture search technique that enables efficient generation of multiple imagery effects.
arXiv Detail & Related papers (2020-12-21T10:08:18Z)
- Multi-task GANs for Semantic Segmentation and Depth Completion with Cycle Consistency [7.273142068778457]
We propose multi-task generative adversarial networks (Multi-task GANs), which handle both semantic segmentation and depth completion.
In this paper, we improve the details of generated semantic images based on CycleGAN by introducing multi-scale spatial pooling blocks and the structural similarity reconstruction loss.
Experiments on the Cityscapes dataset and the KITTI depth completion benchmark show that Multi-task GANs achieve competitive performance.
arXiv Detail & Related papers (2020-11-29T04:12:16Z)
- Deep Multimodal Neural Architecture Search [178.35131768344246]
We devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks.
Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone.
On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks (an illustrative sketch of this pattern follows the list).
arXiv Detail & Related papers (2020-04-25T07:00:32Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
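The Deep Multimodal Neural Architecture Search entry above describes a recipe of primitive operations, a unified backbone, and task-specific heads on top. Below is a hedged sketch of the heads-on-shared-backbone step only, with a plain transformer encoder standing in for the full encoder-decoder backbone; every name, dimension, and the two example tasks are assumptions for illustration, not the MMnas implementation.

```python
# Hedged sketch of attaching task-specific heads to a shared backbone,
# as in the MMnas entry above. All names, sizes, and the two example
# tasks are illustrative assumptions; the backbone is simplified to an
# encoder for brevity.
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Stand-in for the unified backbone that all tasks share."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )

    def forward(self, tokens):            # tokens: (batch, seq, dim)
        return self.encoder(tokens)

class MultiTaskModel(nn.Module):
    def __init__(self, dim=256, n_answers=1000):
        super().__init__()
        self.backbone = SharedBackbone(dim)
        # One small head per task, all reading the same backbone features.
        self.heads = nn.ModuleDict({
            "vqa": nn.Linear(dim, n_answers),   # answer classification
            "grounding": nn.Linear(dim, 4),     # bounding-box regression
        })

    def forward(self, tokens, task):
        feats = self.backbone(tokens)
        pooled = feats.mean(dim=1)        # simple pooling over the sequence
        return self.heads[task](pooled)

# Usage: the same backbone weights serve both tasks.
model = MultiTaskModel()
tokens = torch.randn(2, 16, 256)
answer_logits = model(tokens, "vqa")
boxes = model(tokens, "grounding")
```

Sharing the backbone amortizes most of the parameters across tasks, while each head stays small and task-specific.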