Related papers: PIS3R: Very Large Parallax Image Stitching via Deep 3D Reconstruction

URL: http://arxiv.org/abs/2508.04236v1
Date: Wed, 06 Aug 2025 09:18:45 GMT
Title: PIS3R: Very Large Parallax Image Stitching via Deep 3D Reconstruction
Authors: Muhua Zhu, Xinhao Jin, Chengbo Wang, Yongcong Zhang, Yifei Xue, Tie Ji, Yizhen Lao
Abstract summary: Image stitching aims to align two images taken from different viewpoints into one seamless, wider image. Most existing stitching methods struggle to handle such images with large parallax effectively. We propose PIS3R, which is robust to very large parallax and is based on the novel concept of deep 3D reconstruction.
Score: 5.816094524098354
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image stitching aims to align two images taken from different viewpoints into one seamless, wider image. However, when the 3D scene contains depth variations and the camera baseline is significant, noticeable parallax occurs, meaning the relative positions of scene elements differ substantially between views. Most existing stitching methods struggle to handle such images with large parallax effectively. To address this challenge, in this paper, we propose an image stitching solution called PIS3R that is robust to very large parallax, based on the novel concept of deep 3D reconstruction. First, we apply a visual geometry grounded transformer to two input images with very large parallax to obtain both intrinsic and extrinsic parameters, as well as the dense 3D scene reconstruction. Subsequently, we reproject the reconstructed dense point cloud onto a designated reference view using the recovered camera parameters, achieving pixel-wise alignment and generating an initial stitched image. Finally, to further address potential artifacts such as holes or noise in the initial stitching, we propose a point-conditioned image diffusion module to obtain the refined result. Compared with existing methods, our solution tolerates very large parallax and also provides results that fully preserve the geometric integrity of all pixels in the 3D photogrammetric context, enabling direct applicability to downstream 3D vision tasks such as SfM. Experimental results demonstrate that the proposed algorithm provides accurate stitching results for images with very large parallax, and outperforms the existing methods qualitatively and quantitatively.
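
The reprojection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ideal pinhole camera without lens distortion, a world-to-camera pose convention (x_cam = R x_world + t), and nearest-pixel splatting with a z-buffer. The function name `reproject_to_reference` and all parameter names are hypothetical. Pixels that receive no point stay `None`, which corresponds to the holes the abstract says remain in the initial stitched image.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[int, int, int]


def reproject_to_reference(
    points: Sequence[Vec3],
    colors: Sequence[Color],
    intrinsics: Tuple[float, float, float, float],  # (fx, fy, cx, cy), assumed pinhole model
    rotation: Sequence[Sequence[float]],            # 3x3 world-to-camera rotation (assumed convention)
    translation: Vec3,                              # world-to-camera translation
    width: int,
    height: int,
) -> Tuple[List[List[Optional[Color]]], List[List[float]]]:
    """Splat coloured 3D points into the reference view, keeping the nearest point per pixel."""
    fx, fy, cx, cy = intrinsics
    image: List[List[Optional[Color]]] = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]

    for (x, y, z), color in zip(points, colors):
        # Transform the point from world coordinates into the reference camera frame.
        xc, yc, zc = (
            row[0] * x + row[1] * y + row[2] * z + t
            for row, t in zip(rotation, translation)
        )
        if zc <= 0.0:
            continue  # the point lies behind the camera and cannot be seen

        # Perspective projection onto the image plane, rounded to the nearest pixel.
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))

        # Z-buffer test: a point overwrites a pixel only if it is closer than what is stored.
        if 0 <= u < width and 0 <= v < height and zc < depth[v][u]:
            depth[v][u] = zc
            image[v][u] = color

    return image, depth


# Toy example: four points seen by a 3x3 reference camera placed at the world origin.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
colors = [(255, 0, 0), (0, 0, 255), (0, 255, 0), (9, 9, 9)]
image, depth = reproject_to_reference(
    points, colors, (2.0, 2.0, 1.0, 1.0), identity, (0.0, 0.0, 0.0), 3, 3
)
holes = sum(pixel is None for row in image for pixel in row)
```

In the toy example the first two points fall on the same pixel and the nearer one wins, while the last point is behind the camera and is skipped. The `holes` count shows how sparse coverage leaves empty pixels, which is the kind of artifact the paper's refinement stage is meant to address.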

Related papers

NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction [99.52487968452198]
NOVA3R is an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner. It produces physically plausible geometry with fewer duplicated structures in overlapping regions. It outperforms state-of-the-art methods in terms of reconstruction accuracy and completeness.
arXiv Detail & Related papers (2026-03-04T15:36:25Z)
CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation [0.9558392439655014]
Self-supervised surround-view depth estimation enables dense, low-cost 3D perception with a 360° field of view from multiple minimally overlapping images. Yet, most existing methods suffer from depth estimates that are inconsistent between overlapping images. We propose a novel geometry-guided method for calibrated, time-synchronized multi-camera rigs that predicts dense, metric, and cross-view-consistent depth.
arXiv Detail & Related papers (2025-11-20T14:55:28Z)
HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision. We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects. Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z)
DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections. We show that this formulation smoothly unifies the monocular and binocular reconstruction cases. Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it, pixel matches, relative and absolute camera.
arXiv Detail & Related papers (2023-12-21T18:52:14Z)
Fine Dense Alignment of Image Bursts through Camera Pose and Depth Estimation [45.11207941777178]
This paper introduces a novel approach to the fine alignment of images in a burst captured by a handheld camera. The proposed algorithm establishes dense correspondences by optimizing both the camera motion and surface depth and orientation at every pixel.
arXiv Detail & Related papers (2023-12-08T17:22:04Z)
FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction. Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning [70.75369367311897]
3D-aware global correspondences are reliable flows that jointly encode global semantic correlations, local deformations, and geometric priors of 3D human bodies. An adversarial generator takes the garment warped by the 3D-aware flow, and the image of the target person as inputs, to synthesize the photo-realistic try-on result.
arXiv Detail & Related papers (2022-11-25T12:16:21Z)
GeoFill: Reference-Based Image Inpainting of Scenes with Complex Geometry [40.68659515139644]
Reference-guided image inpainting restores image pixels by leveraging the content from another reference image. We leverage a monocular depth estimate and predict relative pose between cameras, then align the reference image to the target by a differentiable 3D reprojection. Our approach achieves state-of-the-art performance on both the RealEstate10K and MannequinChallenge datasets with large baselines, complex geometry and extreme camera motions.
arXiv Detail & Related papers (2022-01-20T12:17:13Z)
Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation [11.999630902627864]
Current monocular-based 6D object pose estimation methods generally achieve less competitive results than RGBD-based methods. This paper proposes a 3D geometric volume based pose estimation method with a short baseline two-view setting. Experiments show that our method outperforms state-of-the-art monocular-based methods, and is robust in different objects and scenes.
arXiv Detail & Related papers (2021-09-25T02:55:05Z)
Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras. We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera viewpoints. Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object. We first estimate per-view depth maps using a deep multi-view stereo network. These depth maps are used to coarsely align the different views. We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.