Multi-modal Bifurcated Network for Depth Guided Image Relighting
- URL: http://arxiv.org/abs/2105.00690v2
- Date: Wed, 5 May 2021 02:13:15 GMT
- Title: Multi-modal Bifurcated Network for Depth Guided Image Relighting
- Authors: Hao-Hsiang Yang and Wei-Ting Chen and Hao-Lun Luo and Sy-Yen Kuo
- Abstract summary: We propose a deep learning-based method called multi-modal bifurcated network (MBNet) for depth guided image relighting.
This model extracts the image and the depth features by the bifurcated network in the encoder.
Experiments conducted on the VIDIT dataset show that the proposed solution obtains 1st place in terms of SSIM and PMS.
- Score: 13.857410735989301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image relighting aims to recalibrate the illumination setting in an image. In this paper, we propose a deep learning-based method called multi-modal bifurcated network (MBNet) for depth guided image relighting. That is, given an image and the corresponding depth map, our network generates a new image with the given illuminant angle and color temperature. The model extracts image and depth features with a bifurcated network in the encoder. To use the two feature streams effectively, we adopt dynamic dilated pyramid modules in the decoder. Moreover, to increase the variety of training data, we propose a novel data processing pipeline that enlarges the training set. Experiments conducted on the VIDIT dataset show that the proposed solution obtains 1st place in terms of SSIM and PMS in the NTIRE 2021 Depth Guided One-to-one Relighting Challenge.
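The two-branch design described in the abstract can be sketched as follows. This is an illustrative simplification, not the authors' MBNet: the class name, branch depths, and fusion are assumptions, and the dynamic dilated pyramid modules are replaced here with fixed-dilation convolutions merged by plain concatenation.

```python
# Hypothetical sketch of a bifurcated encoder with a dilated-pyramid decoder.
import torch
import torch.nn as nn

class BifurcatedRelightNet(nn.Module):
    def __init__(self, feat=16):
        super().__init__()
        # Encoder: two parallel branches, one for the RGB image, one for depth.
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        # Decoder: parallel dilated convolutions with growing receptive
        # fields (a fixed stand-in for the dynamic dilated pyramid modules).
        self.pyramid = nn.ModuleList([
            nn.Conv2d(2 * feat, feat, 3, padding=d, dilation=d)
            for d in (1, 2, 4)])
        # Project the merged pyramid features back to a relit RGB image.
        self.out = nn.Conv2d(3 * feat, 3, 1)

    def forward(self, rgb, depth):
        fused = torch.cat(
            [self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        pyr = torch.cat([torch.relu(p(fused)) for p in self.pyramid], dim=1)
        return self.out(pyr)

net = BifurcatedRelightNet()
relit = net(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
```

Setting `padding=d` with `dilation=d` for a 3x3 kernel keeps the spatial size unchanged, so the three pyramid outputs can be concatenated directly.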
Related papers
- Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation unfolding network (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z) - PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks [83.08625720856445]
Deep learning tasks often require annotations that are too time-consuming for human operators to produce.
In this paper, we introduce PromptMix, a method for artificially boosting the size of existing datasets.
We show that PromptMix can significantly increase the performance of lightweight networks by up to 26%.
arXiv Detail & Related papers (2023-01-30T14:15:47Z) - Enhancing Low-Light Images in Real World via Cross-Image Disentanglement [58.754943762945864]
We propose a new low-light image enhancement dataset consisting of misaligned training images with real-world corruptions.
Our model achieves state-of-the-art performances on both the newly proposed dataset and other popular low-light datasets.
arXiv Detail & Related papers (2022-01-10T03:12:52Z) - Single Plane-Wave Imaging using Physics-Based Deep Learning [2.1410799064827226]
In plane-wave imaging, multiple unfocused ultrasound waves are transmitted into a medium of interest from different angles.
Deep learning methods have been proposed to improve ultrasound imaging.
We propose a data-to-image architecture that incorporates a wave-physics-based image formation algorithm in-between deep convolutional neural networks.
arXiv Detail & Related papers (2021-09-08T14:06:29Z) - RigNet: Repetitive Image Guided Network for Depth Completion [20.66405067066299]
Recent approaches mainly focus on image guided learning to predict dense results.
However, blurry image guidance and object structures in depth still impede the performance of image guided frameworks.
We explore a repetitive design in our image guided network to sufficiently and gradually recover depth values.
Our method achieves state-of-the-art results on the NYUv2 dataset and ranks 1st on the KITTI benchmark at the time of submission.
arXiv Detail & Related papers (2021-07-29T08:00:33Z) - S3Net: A Single Stream Structure for Depth Guided Image Relighting [13.201978111555817]
We propose a deep learning-based network with a single-stream structure, called S3Net, for depth guided image relighting.
Experiments performed on a challenging benchmark show that the proposed model achieves the 3rd highest SSIM in the NTIRE 2021 Depth Guided Any-to-any Relighting Challenge.
arXiv Detail & Related papers (2021-05-03T08:33:53Z) - Degrade is Upgrade: Learning Degradation for Low-light Image Enhancement [52.49231695707198]
We investigate the intrinsic degradation and relight the low-light image while refining the details and color in two steps.
Inspired by the color image formulation, we first estimate the degradation from low-light inputs to simulate the distortion of environment illumination color, and then refine the content to recover the loss of diffuse illumination color.
Our proposed method surpasses the SOTA by 0.95 dB in PSNR on the LOL1000 dataset and 3.18% in mAP on the ExDark dataset.
arXiv Detail & Related papers (2021-03-19T04:00:27Z) - Dual Pixel Exploration: Simultaneous Depth Estimation and Image Restoration [77.1056200937214]
We study the formation of the DP pair which links the blur and the depth information.
We propose an end-to-end DDDNet (DP-based Depth and Deblur Network) to jointly estimate the depth and restore the image.
arXiv Detail & Related papers (2020-12-01T06:53:57Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Learning light field synthesis with Multi-Plane Images: scene encoding as a recurrent segmentation task [30.058283056074426]
This paper addresses the problem of view synthesis from large baseline light fields by turning a sparse set of input views into a Multi-plane Image (MPI).
Because available datasets are scarce, we propose a lightweight network that does not require extensive training.
Our model does not learn to estimate RGB layers but only encodes the scene geometry within MPI alpha layers, which comes down to a segmentation task.
arXiv Detail & Related papers (2020-02-12T14:35:54Z)
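Several entries above report results in SSIM and PSNR. A minimal numpy sketch of both metrics is given below; note that the SSIM here is the global (single-window) simplification, whereas the NTIRE benchmarks use the standard windowed variant, and `L` denotes the dynamic range of the pixel values.

```python
import numpy as np

def ssim_global(x, y, L=1.0):
    # Global SSIM: one mean/variance/covariance over the whole image,
    # with the usual stabilizing constants C1 = (0.01 L)^2, C2 = (0.03 L)^2.
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def psnr(x, y, L=1.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(L^2 / MSE).
    mse = ((x - y) ** 2).mean()
    return 10 * np.log10(L ** 2 / mse)
```

For identical images, `ssim_global` returns 1.0, and a uniform offset of 0.1 on images in [0, 1] yields a PSNR of 20 dB, which makes the functions easy to sanity-check.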
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.