ORStereo: Occlusion-Aware Recurrent Stereo Matching for 4K-Resolution Images
- URL: http://arxiv.org/abs/2103.07798v1
- Date: Sat, 13 Mar 2021 21:46:06 GMT
- Title: ORStereo: Occlusion-Aware Recurrent Stereo Matching for 4K-Resolution Images
- Authors: Yaoyu Hu, Wenshan Wang, Huai Yu, Weikun Zhen, Sebastian Scherer
- Abstract summary: We present the Occlusion-aware Recurrent binocular Stereo matching (ORStereo) model.
ORStereo generalizes to unseen high-resolution images with large disparity ranges by formulating the task as residual updates and refinements of an initial prediction.
We test the model's capability on both synthetic and real-world high-resolution images.
- Score: 13.508624751092654
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Stereo reconstruction models trained on small images do not generalize well
to high-resolution data. Training a model on high-resolution images faces
difficulties of data availability and is often infeasible due to limited
computing resources. In this work, we present the Occlusion-aware Recurrent
binocular Stereo matching (ORStereo), which deals with these issues by only
training on available low disparity range stereo images. ORStereo generalizes
to unseen high-resolution images with large disparity ranges by formulating the
task as residual updates and refinements of an initial prediction. ORStereo is
trained on images with disparity ranges limited to 256 pixels, yet it can
operate on 4K-resolution inputs with over 1000 disparities using limited GPU
memory. We test the model's capability on both synthetic and real-world
high-resolution images. Experimental results demonstrate that ORStereo achieves
comparable performance on 4K-resolution images compared to state-of-the-art
methods trained on large disparity ranges. Compared to other methods that are
only trained on low-resolution images, our method is 70% more accurate on
4K-resolution images.
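The residual-update formulation the abstract describes can be sketched as an iterative refinement loop. This is a minimal illustration, not ORStereo's implementation: `refine_disparity` and the toy `residual_fn` are hypothetical stand-ins for the learned recurrent network.

```python
import numpy as np

def refine_disparity(init_disp, residual_fn, num_iters=3):
    """Iteratively refine an initial disparity map by adding predicted
    residuals, sketching the recurrent residual-update formulation.

    residual_fn stands in for the learned network that predicts an
    update from the current estimate (hypothetical interface).
    """
    disp = init_disp.astype(np.float64)
    for _ in range(num_iters):
        # Residual update: adjust the current estimate instead of
        # re-predicting the full large-disparity map from scratch.
        disp = disp + residual_fn(disp)
    return disp

# Toy residual predictor: nudge the estimate halfway toward a fixed target.
target = np.full((4, 4), 100.0)
toy_residual = lambda d: 0.5 * (target - d)

refined = refine_disparity(np.zeros((4, 4)), toy_residual, num_iters=5)
```

With the toy predictor, each iteration halves the remaining error, so the estimate converges toward the target without the model ever seeing the full disparity range in one step.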
Related papers
- ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing approaches to higher-resolution generation, such as attention-based and joint-diffusion methods, cannot adequately address these issues.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
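The re-dilation idea, enlarging a trained kernel's perception field at inference by spacing out its taps, can be sketched in 1-D. This is an illustrative toy, not ScaleCrafter's actual implementation; the helper name is hypothetical.

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between kernel taps, enlarging the
    receptive field at inference without retraining the weights."""
    dilated = np.zeros((len(kernel) - 1) * rate + 1)
    dilated[::rate] = kernel  # original taps land at stride `rate`
    return dilated

base = np.array([1.0, 2.0, 1.0])   # trained 3-tap kernel
wide = dilate_kernel(base, 2)      # taps now at positions 0, 2, 4
```

The dilated kernel covers a window of 5 samples instead of 3 while reusing the same three learned weights, which is the dynamic perception-field adjustment the summary refers to.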
arXiv Detail & Related papers (2023-10-11T17:52:39Z)
- Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing [71.62289021118983]
We present an efficient baseline model ESDNet for tackling 4K moire images, wherein we build a semantic-aligned scale-aware module to address the scale variation of moire patterns.
Our approach outperforms state-of-the-art methods by a large margin while being much more lightweight.
arXiv Detail & Related papers (2022-07-20T14:20:52Z)
- Any-resolution Training for High-resolution Image Synthesis [55.19874755679901]
Generative models operate at fixed resolution, even though natural images come in a variety of sizes.
We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions.
We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions.
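Sampling patches at random scales, as the summary describes, can be sketched with a simple crop sampler. Function and parameter names here are hypothetical, not the paper's API.

```python
import random

def sample_patch(h, w, min_size=64, max_size=256, rng=random):
    """Sample a square crop at a random scale from an (h, w) image,
    sketching continuous-scale patch sampling for training."""
    size = rng.randint(min_size, min(max_size, h, w))  # random scale
    top = rng.randint(0, h - size)                     # random position
    left = rng.randint(0, w - size)
    return top, left, size

rng = random.Random(0)
top, left, size = sample_patch(1024, 768, rng=rng)
```

Because the crop size varies per sample, a generator trained this way sees content at many native scales instead of a single fixed training resolution.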
arXiv Detail & Related papers (2022-04-14T17:59:31Z)
- Image Resolution Susceptibility of Face Recognition Models [8.368543987898732]
We first analyze the impact of image resolutions on the face verification performance with a state-of-the-art face recognition model.
For images synthetically reduced to $5 \times 5\,\mathrm{px}$ resolution, the verification performance drops from $99.23\%$ down to almost $55\%$.
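The synthetic resolution-reduction probe can be sketched by block-averaging an image down to a tiny grid and repeating pixels back up. This is an assumed downsampling scheme for illustration; the paper's exact resampling method may differ.

```python
import numpy as np

def reduce_resolution(img, out_size):
    """Block-average a square image down to out_size x out_size, then
    repeat pixels back to the original size, sketching the synthetic
    low-resolution probe described above."""
    h, w = img.shape
    fh, fw = h // out_size, w // out_size
    small = img[:fh * out_size, :fw * out_size].reshape(
        out_size, fh, out_size, fw).mean(axis=(1, 3))
    # Nearest-neighbour upsampling back to the input size.
    return np.repeat(np.repeat(small, fh, axis=0), fw, axis=1)

face = np.arange(100, dtype=float).reshape(10, 10)
probe = reduce_resolution(face, 5)  # effectively a 5x5-pixel image
```

Feeding such a probe to a verification model isolates how much of its performance depends on fine spatial detail.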
arXiv Detail & Related papers (2021-07-08T11:30:27Z)
- 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos [107.36352212367179]
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
The proposed method is able to learn 3D body pose and shape across different resolutions with one single model.
We extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input.
arXiv Detail & Related papers (2021-03-11T06:52:12Z)
- Exploiting Raw Images for Real-Scene Super-Resolution [105.18021110372133]
We study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
We propose a method to generate more realistic training data by mimicking the imaging process of digital cameras.
We also develop a two-branch convolutional neural network to exploit the radiance information originally recorded in raw images.
arXiv Detail & Related papers (2021-02-02T16:10:15Z)
- Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting [12.839962012888199]
We propose a Contextual Residual Aggregation (CRA) mechanism that produces high-frequency residuals for missing contents by weighted aggregation of residuals from contextual patches.
We train the proposed model on small images at 512x512 resolution and perform inference on high-resolution images, achieving compelling inpainting quality.
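The weighted aggregation step the summary describes can be sketched as a softmax-weighted sum of residuals from context patches. The interface is hypothetical and greatly simplified relative to the CRA mechanism itself.

```python
import numpy as np

def aggregate_residuals(context_residuals, scores):
    """Weighted aggregation of high-frequency residuals from contextual
    patches, sketching the CRA idea.

    context_residuals: (N, H, W) residuals cut from N context patches.
    scores: (N,) attention scores, softmax-normalised into weights here.
    """
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    # Contract the patch axis: each output pixel is a weighted blend
    # of the corresponding residual pixels across context patches.
    return np.tensordot(w, context_residuals, axes=1)  # (H, W)

residuals = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 3.0)])
agg = aggregate_residuals(residuals, np.array([0.0, 0.0]))  # equal weights
```

With equal scores the result is the plain average of the two residual patches; learned attention scores would instead favour the most relevant context.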
arXiv Detail & Related papers (2020-05-19T18:55:32Z)
- PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models [77.32079593577821]
PULSE (Photo Upsampling via Latent Space Exploration) generates high-resolution, realistic images at resolutions previously unseen in the literature.
Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.
arXiv Detail & Related papers (2020-03-08T16:44:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.