LoLep: Single-View View Synthesis with Locally-Learned Planes and
Self-Attention Occlusion Inference
- URL: http://arxiv.org/abs/2307.12217v2
- Date: Wed, 9 Aug 2023 10:34:43 GMT
- Title: LoLep: Single-View View Synthesis with Locally-Learned Planes and
Self-Attention Occlusion Inference
- Authors: Cong Wang, Yu-Ping Wang, Dinesh Manocha
- Abstract summary: We propose a novel method, LoLep, which regresses Locally-Learned planes from a single RGB image to represent scenes accurately.
Compared to MINE, our approach has an LPIPS reduction of 4.8%-9.0% and an RV reduction of 73.9%-83.5%.
- Score: 66.45326873274908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel method, LoLep, which regresses Locally-Learned planes from
a single RGB image to represent scenes accurately, thus generating better novel
views. Without the depth information, regressing appropriate plane locations is
a challenging problem. To solve this issue, we pre-partition the disparity
space into bins and design a disparity sampler to regress local offsets for
multiple planes in each bin. However, with such a sampler alone the network
fails to converge, so we further propose two optimization strategies that
account for the different disparity distributions of datasets and propose an
occlusion-aware reprojection loss as a simple yet effective geometric
supervision technique. We also introduce a self-attention mechanism to improve
occlusion inference and present a Block-Sampling Self-Attention (BS-SA) module
to address the problem of applying self-attention to large feature maps. We
demonstrate the effectiveness of our approach and generate state-of-the-art
results on different datasets. Compared to MINE, our approach has an LPIPS
reduction of 4.8%-9.0% and an RV reduction of 73.9%-83.5%. We also evaluate the
performance on real-world images and demonstrate the benefits.
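The bin-and-offset idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the uniform bin partition, the sigmoid mapping from raw values to offsets, the default disparity range, and the names `bin_edges` and `plane_disparities` are all assumptions made for illustration, and the raw values stand in for what a network would regress.

```python
import math


def bin_edges(d_min, d_max, num_bins):
    """Uniformly partition the disparity range [d_min, d_max] into bins (assumed scheme)."""
    step = (d_max - d_min) / num_bins
    return [d_min + i * step for i in range(num_bins + 1)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def plane_disparities(raw_offsets, d_min=0.001, d_max=1.0):
    """Map raw per-bin values to plane disparities.

    raw_offsets[i][j] is an unconstrained number (a stand-in for a network
    output) for plane j in bin i. A sigmoid squashes it into (0, 1), so every
    plane stays inside its own bin.
    """
    edges = bin_edges(d_min, d_max, len(raw_offsets))
    planes = []
    for i, bin_raw in enumerate(raw_offsets):
        lo, hi = edges[i], edges[i + 1]
        for r in bin_raw:
            planes.append(lo + sigmoid(r) * (hi - lo))
    # Larger disparity means closer to the camera, so sort near-to-far.
    return sorted(planes, reverse=True)


# Two bins with two planes each; the raw values are arbitrary examples.
example_planes = plane_disparities([[0.0, 2.0], [-1.0, 1.0]], d_min=0.0, d_max=1.0)
```

Constraining each offset to its bin keeps the planes spread over the whole disparity range while still letting their positions adapt locally, which is one plausible reading of "local offsets for multiple planes in each bin".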
Related papers
- Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape [52.98187034726091]
Low-Rank Adaptation (LoRA) is an efficient way to fine-tune models by optimizing only a low-rank matrix.
A solution that appears flat in the LoRA space may still have sharp directions in the full parameter space, potentially harming generalization performance.
We propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space.
arXiv Detail & Related papers (2024-09-22T11:24:10Z)
- Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space.
We propose an effective approach to narrow the gap between the two domains.
It mainly facilitates unified mutual information sharing both intra- and inter-samples.
arXiv Detail & Related papers (2024-06-17T13:49:12Z)
- Domain Reduction Strategy for Non Line of Sight Imaging [20.473142941237015]
In non-line-of-sight (NLOS) imaging, the visible surfaces of the target objects are notably sparse.
We design our method to render the transients through partial propagations from a continuously sampled set of points from the hidden space.
Our method is capable of accurately and efficiently modeling the view-dependent reflectance using surface normals.
arXiv Detail & Related papers (2023-08-20T14:00:33Z)
- Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
- Visual SLAM with Graph-Cut Optimized Multi-Plane Reconstruction [11.215334675788952]
This paper presents a semantic planar SLAM system that improves pose estimation and mapping using cues from an instance planar segmentation network.
While the mainstream approaches are using RGB-D sensors, employing a monocular camera with such a system still faces challenges such as robust data association and precise geometric model fitting.
arXiv Detail & Related papers (2021-08-09T18:16:08Z)
- Towards Overcoming False Positives in Visual Relationship Detection [95.15011997876606]
We investigate the cause of the high false positive rate in Visual Relationship Detection (VRD).
This paper presents Spatially-Aware Balanced negative pRoposal sAmpling (SABRA) as a robust VRD framework that alleviates the influence of false positives.
arXiv Detail & Related papers (2020-12-23T06:28:00Z)
- SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction with Self-Projection Optimization [52.20602782690776]
It is expensive and tedious to obtain large-scale paired sparse-dense point sets for training from real scanned sparse data.
We propose a self-supervised point cloud upsampling network, named SPU-Net, to capture the inherent upsampling patterns of points lying on the underlying object surface.
We conduct various experiments on both synthetic and real-scanned datasets, and the results demonstrate that we achieve comparable performance to the state-of-the-art supervised methods.
arXiv Detail & Related papers (2020-12-08T14:14:09Z)
- Insights on Evaluation of Camera Re-localization Using Relative Pose Regression [0.9236074230806579]
We consider the problem of relative pose regression in visual relocalization.
We propose three new metrics to remedy the issue mentioned above.
We show that our network generalizes well: training on a single scene leads to little loss of performance on the other scenes.
arXiv Detail & Related papers (2020-09-23T19:16:26Z)
- Robust Locality-Aware Regression for Labeled Data Classification [5.432221650286726]
We propose a new discriminant feature extraction framework, namely Robust Locality-Aware Regression (RLAR).
In our model, we introduce a retargeted regression to perform the marginal representation learning adaptively instead of using the general average inter-class margin.
To alleviate the disturbance of outliers and prevent overfitting, we measure the regression term and locality-aware term together with the regularization term by the L2,1 norm.
arXiv Detail & Related papers (2020-06-15T11:36:59Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global content consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.