Content-aware Warping for View Synthesis
- URL: http://arxiv.org/abs/2201.09023v1
- Date: Sat, 22 Jan 2022 11:35:05 GMT
- Title: Content-aware Warping for View Synthesis
- Authors: Mantang Guo, Jing Jin, Hui Liu, Junhui Hou, Huanqiang Zeng, Jiwen Lu
- Abstract summary: We propose content-aware warping, which adaptively learns the weights for pixels of a relatively large neighborhood from their contextual information via a lightweight neural network.
Based on this learnable warping module, we propose a new end-to-end learning-based framework for novel view synthesis from two source views.
Experimental results on structured light field datasets with wide baselines and unstructured multi-view datasets show that the proposed method significantly outperforms state-of-the-art methods both quantitatively and visually.
- Score: 110.54435867693203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing image-based rendering methods usually adopt depth-based image
warping operation to synthesize novel views. In this paper, we argue that the
essential limitations of the traditional warping operation are its limited
neighborhood and purely distance-based interpolation weights. To this end, we
propose content-aware warping, which adaptively learns the interpolation
weights for pixels of a relatively large neighborhood from their contextual
information via a lightweight neural network. Based on this learnable warping
module, we propose a new end-to-end learning-based framework for novel view
synthesis from two input source views, in which two additional modules, namely
confidence-based blending and feature-assistant spatial refinement, are
naturally proposed to handle the occlusion issue and capture the spatial
correlation among pixels of the synthesized view, respectively. In addition,
we propose a weight-smoothness loss term to regularize the network.
Experimental results on structured light field datasets with wide baselines and
unstructured multi-view datasets show that the proposed method significantly
outperforms state-of-the-art methods both quantitatively and visually. The
source code will be publicly available at https://github.com/MantangGuo/CW4VS.
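To make the core idea concrete, below is a minimal PyTorch-style sketch of a content-aware warping layer. It is not the authors' released implementation (see the repository above); the module name, neighborhood size, use of a small per-pixel MLP for weight prediction, and all tensor shapes are illustrative assumptions. For each target pixel, the layer gathers colors and context features from a K x K neighborhood around the pixel's depth-warped source location and predicts softmax-normalized interpolation weights from that context, instead of applying fixed distance-based bilinear weights.

```python
# Illustrative sketch only (not the authors' code): content-aware warping that
# learns interpolation weights over a KxK neighborhood from contextual features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContentAwareWarp(nn.Module):
    def __init__(self, feat_dim=16, k=3):
        super().__init__()
        self.k = k
        # Lightweight weight-prediction network: maps the gathered neighborhood
        # context of each target pixel to K*K interpolation weights.
        in_dim = k * k * (feat_dim + 3 + 1)  # features + RGB + distance per sample
        self.weight_net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(inplace=True),
            nn.Linear(64, k * k),
        )

    def forward(self, src_rgb, src_feat, warped_xy):
        """
        src_rgb:   (B, 3, H, W)  source-view colors
        src_feat:  (B, C, H, W)  source-view context features
        warped_xy: (B, H, W, 2)  sub-pixel source coordinates of each target
                                 pixel in [-1, 1], from depth-based warping
        returns:   (B, 3, H, W)  synthesized target-view colors
        """
        b, _, h, w = src_rgb.shape
        k = self.k
        # Integer offsets of the KxK neighborhood, converted to normalized coords.
        dy, dx = torch.meshgrid(
            torch.arange(k, device=src_rgb.device) - k // 2,
            torch.arange(k, device=src_rgb.device) - k // 2,
            indexing="ij",
        )
        offsets = torch.stack(
            [dx.flatten() * 2.0 / max(w - 1, 1),
             dy.flatten() * 2.0 / max(h - 1, 1)], dim=-1
        )  # (K*K, 2)

        # Sample RGB and features at every neighbor of the warped location.
        grid = warped_xy.unsqueeze(3) + offsets.view(1, 1, 1, -1, 2)  # (B,H,W,K*K,2)
        grid = grid.view(b, h, w * k * k, 2)
        rgb_n = F.grid_sample(src_rgb, grid, align_corners=True).view(b, 3, h, w, k * k)
        feat_n = F.grid_sample(src_feat, grid, align_corners=True).view(b, -1, h, w, k * k)
        dist = offsets.norm(dim=-1).view(1, 1, 1, 1, k * k).expand(b, 1, h, w, k * k)

        # Predict content-aware weights from the gathered context and normalize.
        ctx = torch.cat([rgb_n, feat_n, dist], dim=1)        # (B, 3+C+1, H, W, K*K)
        ctx = ctx.permute(0, 2, 3, 1, 4).reshape(b, h, w, -1)
        weights = F.softmax(self.weight_net(ctx), dim=-1)    # (B, H, W, K*K)

        # Weighted sum over the neighborhood gives the synthesized color.
        return (rgb_n * weights.unsqueeze(1)).sum(dim=-1)    # (B, 3, H, W)
```

Viewed this way, conventional depth-based warping is the special case of a 2 x 2 neighborhood with fixed bilinear coefficients, so learning softmax-normalized weights over a larger neighborhood from content can be read as a strict generalization of the traditional operation.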
Related papers
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our method achieves state-of-the-art tracking performance on both synthetic and real-world data.
arXiv Detail & Related papers (2023-11-30T21:34:44Z) - Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - Explicit Correspondence Matching for Generalizable Neural Radiance Fields [49.49773108695526]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views.
The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views.
Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
arXiv Detail & Related papers (2023-04-24T17:46:01Z) - USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks [31.600708674008384]
USegScene is a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images.
We present results on the popular KITTI dataset and show that our approach outperforms other methods by a large margin.
arXiv Detail & Related papers (2022-07-15T13:25:47Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) to explore object-to-object, object-to-patch and patch-to-patch dependencies.
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - Learning Dynamic Interpolation for Extremely Sparse Light Fields with Wide Baselines [42.59723383219793]
We propose a learnable model, namely dynamic interpolation, to replace the commonly-used geometry warping operation.
Experiments show that the reconstructed light fields achieve much higher PSNR/SSIM and better preserve the LF parallax structure than state-of-the-art methods.
arXiv Detail & Related papers (2021-08-17T02:20:03Z) - BoundarySqueeze: Image Segmentation as Boundary Squeezing [104.43159799559464]
We propose a novel method for fine-grained high-quality image segmentation of both objects and scenes.
Inspired by the dilation and erosion operations of morphological image processing, we treat pixel-level segmentation as squeezing the object boundary.
Our method yields large gains on COCO and Cityscapes for both instance and semantic segmentation, and outperforms the previous state-of-the-art PointRend in both accuracy and speed under the same setting.
arXiv Detail & Related papers (2021-05-25T04:58:51Z) - Bridging the Visual Gap: Wide-Range Image Blending [16.464837892640812]
We introduce an effective deep-learning model to realize wide-range image blending.
We experimentally demonstrate that our proposed method is able to produce visually appealing results.
arXiv Detail & Related papers (2021-03-28T15:07:45Z) - Light Field View Synthesis via Aperture Disparity and Warping Confidence Map [47.046276641506786]
This paper presents a learning-based approach to synthesize the view from an arbitrary camera position given a sparse set of images.
A key challenge for this novel view synthesis arises during the reconstruction process, where views from different input images may not be consistent due to obstruction in the light path.
arXiv Detail & Related papers (2020-09-07T09:46:01Z) - Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)