Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal
Fusion with Depth Guidance
- URL: http://arxiv.org/abs/2204.05543v1
- Date: Tue, 12 Apr 2022 06:06:50 GMT
- Title: Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal
Fusion with Depth Guidance
- Authors: Lei Zhang, Kang Liao, Chunyu Lin, Yao Zhao
- Abstract summary: We propose a Depth-Guided Outpainting Network (DGONet) to model the feature representations of different modalities.
Two components are designed: 1) a Multimodal Learning Module that produces distinct depth and RGB feature representations according to each modality's characteristics, and 2) a Depth Guidance Fusion Module that uses the complete depth modality to guide RGB content generation.
An additional constraint strategy, consisting of a Cross-modal Loss and an Edge Loss, sharpens ambiguous contours and expedites reliable content generation.
- Score: 49.94504248096527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image outpainting generates visually plausible content regardless of its
authenticity, which makes it unreliable for practical applications even when
additional modalities, e.g., sketches, are introduced. Since sparse depth
maps are widely captured in robotics and autonomous systems, together with RGB
images, we incorporate sparse depth into the image outpainting task to obtain
more reliable results. Concretely, we propose a Depth-Guided Outpainting
Network (DGONet) to model the feature representations of different modalities
differentially and learn the structure-aware cross-modal fusion. To this end,
two components are designed: 1) The Multimodal Learning Module
produces unique depth and RGB feature representations from the perspectives of
different modal characteristics. 2) The Depth Guidance Fusion Module leverages
the complete depth modality to guide the establishment of RGB contents by
progressive multimodal feature fusion. Furthermore, we design an additional
constraint strategy consisting of a Cross-modal Loss and an Edge Loss to
sharpen ambiguous contours and expedite reliable content generation. Extensive
experiments on KITTI demonstrate our superiority over the state-of-the-art
methods with more reliable content generation.
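The abstract gives only the component names, so as a rough mental model the depth-guided fusion can be pictured as a gating step in which features from the completed depth branch modulate the RGB features, and the Edge Loss as a gradient-domain penalty. The PyTorch sketch below is a minimal, hypothetical illustration under those assumptions; module names, channel sizes, and the exact gating form are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthGuidedFusion(nn.Module):
    """Hypothetical fusion block: depth features help gate the RGB features.

    Only an illustration of 'depth guides the establishment of RGB contents';
    the actual DGONet module may differ substantially.
    """
    def __init__(self, rgb_ch: int, depth_ch: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(rgb_ch + depth_ch, rgb_ch, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.refine = nn.Conv2d(rgb_ch, rgb_ch, kernel_size=3, padding=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))  # spatial gate from both modalities
        return rgb_feat + self.refine(g * rgb_feat)              # residual, depth-modulated update


def edge_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Simple Sobel-gradient L1 loss, a common way to sharpen ambiguous contours."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = pred.shape[1]
    kx, ky = kx.repeat(c, 1, 1, 1).to(pred), ky.repeat(c, 1, 1, 1).to(pred)

    def grad(x):
        return F.conv2d(x, kx, padding=1, groups=c), F.conv2d(x, ky, padding=1, groups=c)

    pgx, pgy = grad(pred)
    tgx, tgy = grad(target)
    return (pgx - tgx).abs().mean() + (pgy - tgy).abs().mean()
```

A full training objective would presumably combine such an edge term with reconstruction, adversarial, and cross-modal consistency terms weighted by hyperparameters, but those weights are not specified in the abstract.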
Related papers
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that produces the final depth map in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
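One simple reading of the confidence-map idea (a hypothetical sketch, not the paper's actual architecture) is a per-pixel convex combination of two depth hypotheses weighted by a predicted confidence map; layer sizes below are invented.

```python
import torch
import torch.nn as nn

class ConfidenceFusion(nn.Module):
    """Hypothetical sketch: fuse two depth hypotheses with a predicted confidence map."""
    def __init__(self, feat_ch: int = 32):
        super().__init__()
        # Confidence predictor: looks at both depth maps, emits a per-pixel weight in [0, 1].
        self.confidence = nn.Sequential(
            nn.Conv2d(2, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, depth_a: torch.Tensor, depth_b: torch.Tensor):
        c = self.confidence(torch.cat([depth_a, depth_b], dim=1))
        fused = c * depth_a + (1.0 - c) * depth_b  # convex combination per pixel
        return fused, c

# Usage sketch
fusion = ConfidenceFusion()
d_a, d_b = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
fused_depth, conf_map = fusion(d_a, d_b)
```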
- Mask-adaptive Gated Convolution and Bi-directional Progressive Fusion Network for Depth Completion [3.8558637038709622]
We propose a new model for depth completion based on an encoder-decoder structure.
Our model introduces two key components: the Mask-adaptive Gated Convolution architecture and the Bi-directional Progressive Fusion module.
Our model achieves strong performance in completing depth maps and outperforms existing approaches in terms of accuracy and reliability.
arXiv Detail & Related papers (2024-01-15T02:58:06Z)
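A mask-adaptive gated convolution can be loosely pictured as a standard gated convolution whose gate also sees the validity mask of the sparse depth input. The sketch below is a hypothetical PyTorch layer under that assumption; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class MaskAdaptiveGatedConv(nn.Module):
    """Hypothetical gated convolution conditioned on a sparsity/validity mask.

    A sigmoid gate decides, per pixel and channel, how much of the feature
    response to pass through; here the gate additionally receives the mask of
    valid depth measurements (an assumption, not the paper's exact design).
    """
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)
        self.gate = nn.Conv2d(in_ch + 1, out_ch, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # mask: (N, 1, H, W), 1 where a depth measurement exists, 0 elsewhere.
        g = torch.sigmoid(self.gate(torch.cat([x, mask], dim=1)))
        return torch.relu(self.feature(x)) * g
```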
- HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet outperforms state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- RigNet: Repetitive Image Guided Network for Depth Completion [20.66405067066299]
Recent approaches mainly focus on image-guided learning to predict dense depth results.
However, blurry image guidance and unclear object structures in depth still impede the performance of image-guided frameworks.
We explore a repetitive design in our image-guided network to sufficiently and gradually recover depth values.
Our method achieves state-of-the-art results on the NYUv2 dataset and ranks 1st on the KITTI benchmark at the time of submission.
arXiv Detail & Related papers (2021-07-29T08:00:33Z)
- BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation [60.34562823470874]
We propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels.
One bridge is the high-frequency attention bridge (HABdg), designed for the feature encoding process, which learns high-frequency information from the MDE task to guide the DSR task.
The other is the content guidance bridge (CGBdg), designed for the depth map reconstruction process, which provides content guidance learned from the DSR task to the MDE task.
arXiv Detail & Related papers (2021-07-27T01:28:23Z)
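As a rough, hypothetical sketch of a high-frequency attention bridge, the guiding branch's features can be reduced to a high-frequency residual, turned into an attention map, and used to reweight the other branch. All operations below are assumptions, not BridgeNet's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighFrequencyAttentionBridge(nn.Module):
    """Hypothetical high-frequency attention bridge between two task branches."""
    def __init__(self, ch: int):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, dsr_feat: torch.Tensor, mde_feat: torch.Tensor) -> torch.Tensor:
        low = F.avg_pool2d(mde_feat, kernel_size=3, stride=1, padding=1)  # crude low-pass filter
        high = mde_feat - low                                             # high-frequency residual
        return dsr_feat + dsr_feat * self.attn(high)                      # reweight the DSR features
```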
- End-to-end Multi-modal Video Temporal Grounding [105.36814858748285]
We propose a multi-modal framework to extract complementary information from videos.
We adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
We conduct experiments on the Charades-STA and ActivityNet Captions datasets, and show that the proposed method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-12T17:58:10Z)
- Interpretable Deep Multimodal Image Super-Resolution [23.48305854574444]
Multimodal image super-resolution (SR) is the reconstruction of a high resolution image given a low-resolution observation with the aid of another image modality.
We present a multimodal deep network design that integrates coupled sparse priors and allows the effective fusion of information from another modality into the reconstruction process.
arXiv Detail & Related papers (2020-09-07T14:08:35Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
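Symmetric gated fusion can be pictured as each modality producing a gate for the other's features before the gated results are merged. The snippet below is a hypothetical illustration; ACMNet's actual module may differ.

```python
import torch
import torch.nn as nn

class SymmetricGatedFusion(nn.Module):
    """Hypothetical symmetric gated fusion between image and depth branches."""
    def __init__(self, ch: int):
        super().__init__()
        self.gate_img = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        self.gate_dep = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, img_feat: torch.Tensor, dep_feat: torch.Tensor) -> torch.Tensor:
        img_gated = img_feat * self.gate_dep(dep_feat)  # depth decides what to keep from the image
        dep_gated = dep_feat * self.gate_img(img_feat)  # image decides what to keep from the depth
        return self.merge(torch.cat([img_gated, dep_gated], dim=1))
```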
- Multimodal Deep Unfolding for Guided Image Super-Resolution [23.48305854574444]
Deep learning methods rely on training data to learn an end-to-end mapping from a low-resolution input to a high-resolution output.
We propose a multimodal deep learning design that incorporates sparse priors and allows the effective integration of information from another image modality into the network architecture.
Our solution relies on a novel deep unfolding operator, performing steps similar to an iterative algorithm for convolutional sparse coding with side information.
arXiv Detail & Related papers (2020-01-21T14:41:53Z)
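Deep unfolding of this kind typically turns one iteration of a sparse-coding algorithm (e.g., an ISTA/LISTA update with a soft-thresholding step) into a learnable layer, with the side-information code entering the update. The sketch below is a hypothetical single stage under that reading; filter shapes and the way side information is injected are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_threshold(x: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Proximal operator of the L1 norm (the core nonlinearity of ISTA/LISTA)."""
    return torch.sign(x) * F.relu(x.abs() - theta)

class UnfoldedCSCStep(nn.Module):
    """Hypothetical single stage of a LISTA-style unfolded network for
    convolutional sparse coding with side information.

    z : sparse code of the target (low-resolution) modality
    s : sparse code extracted from the guidance modality (side information)
    """
    def __init__(self, obs_ch: int, code_ch: int):
        super().__init__()
        self.analysis = nn.Conv2d(obs_ch, code_ch, 3, padding=1)    # maps the observation to code space
        self.recurrent = nn.Conv2d(code_ch, code_ch, 3, padding=1)  # mixes the previous code estimate
        self.side = nn.Conv2d(code_ch, code_ch, 3, padding=1)       # injects the guidance-modality code
        self.theta = nn.Parameter(torch.full((1, code_ch, 1, 1), 0.1))  # learned soft threshold

    def forward(self, y: torch.Tensor, z: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        pre = self.analysis(y) + self.recurrent(z) + self.side(s)
        return soft_threshold(pre, self.theta)
```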