Related papers: XYZCylinder: Feedforward Reconstruction for Driving Scenes Based on A Unified Cylinder Lifting Method

URL: http://arxiv.org/abs/2510.07856v1
Date: Thu, 09 Oct 2025 06:58:03 GMT
Title: XYZCylinder: Feedforward Reconstruction for Driving Scenes Based on A Unified Cylinder Lifting Method
Authors: Haochen Yu, Qiankun Liu, Hongyuan Liu, Jianfei Jiang, Juntao Lyu, Jiansheng Chen, Huimin Ma
Abstract summary: We propose XYZCylinder, a feedforward model based on a unified cylinder lifting method. Specifically, we design a Unified Cylinder Camera Modeling (UCCM) strategy, which avoids the learning of viewpoint-dependent spatial correspondence. To improve the reconstruction accuracy, we propose a hybrid representation with several dedicated modules based on newly designed Cylinder Plane Feature Group.
Score: 27.213339282749885
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, more attention has been paid to feedforward reconstruction paradigms, which mainly learn a fixed view transformation implicitly and reconstruct the scene with a single representation. However, their generalization capability and reconstruction accuracy are still limited while reconstructing driving scenes, which results from two aspects: (1) The fixed view transformation fails when the camera configuration changes, limiting the generalization capability across different driving scenes equipped with different camera configurations. (2) The small overlapping regions between sparse views of the 360° panorama and the complexity of driving scenes increase the learning difficulty, reducing the reconstruction accuracy. To handle these difficulties, we propose XYZCylinder, a feedforward model based on a unified cylinder lifting method which involves camera modeling and feature lifting. Specifically, to improve the generalization capability, we design a Unified Cylinder Camera Modeling (UCCM) strategy, which avoids the learning of viewpoint-dependent spatial correspondence and unifies different camera configurations with adjustable parameters. To improve the reconstruction accuracy, we propose a hybrid representation with several dedicated modules based on newly designed Cylinder Plane Feature Group (CPFG) to lift 2D image features to 3D space. Experimental results show that XYZCylinder achieves state-of-the-art performance under different evaluation settings, and can be generalized to other driving scenes in a zero-shot manner. Project page: https://yuyuyu223.github.io/XYZCYlinder-projectpage/

Related papers

FlexMap: Generalized HD Map Construction from Flexible Camera Configurations [29.3161377210518]
High-definition (HD) maps provide essential semantic information of road structures for autonomous driving systems. Current HD map construction methods require calibrated multi-camera setups and implicit or explicit 2D-to-BEV transformations. We introduce FlexMap. Unlike prior methods that are fixed to a specific N-camera rig, our approach adapts to variable camera configurations without any architectural changes.
arXiv Detail & Related papers (2026-01-29T22:41:11Z)
ViewMorpher3D: A 3D-aware Diffusion Framework for Multi-Camera Novel View Synthesis in Autonomous Driving [20.935790354765604]
We introduce ViewMorpher3D, a multi-view image enhancement framework based on image diffusion models. Unlike single-view approaches, ViewMorpher3D jointly processes a set of rendered views conditioned on camera poses, 3D geometric priors, and temporally adjacent or spatially overlapping reference views. Our framework accommodates variable numbers of cameras and flexible reference/target view configurations, making it adaptable to diverse sensor setups.
arXiv Detail & Related papers (2026-01-12T13:44:14Z)
ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation [38.23100905961028]
ReCamDriving is a vision-based, camera-controlled novel-trajectory video generation framework. We present a 3DGS-based cross-trajectory data curation strategy to eliminate the train-test gap in camera transformation patterns.
arXiv Detail & Related papers (2025-12-03T09:55:25Z)
MapAnything: Universal Feed-Forward Metric 3D Reconstruction [63.79151976126576]
MapAnything ingests one or more images along with optional geometric inputs such as camera intrinsics, poses, depth, or partial reconstructions. It then directly regresses the metric 3D scene geometry and cameras. MapAnything addresses a broad range of 3D vision tasks in a single feed-forward pass.
arXiv Detail & Related papers (2025-09-16T18:00:14Z)
GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering [54.489285024494855]
Video stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent. Existing approaches, depending on the domain they operate in, suffer from several issues that degrade the user experience. We introduce GaVS, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent 'local reconstruction and rendering' paradigm.
arXiv Detail & Related papers (2025-06-30T15:24:27Z)
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations [44.53106180688135]
This work takes on the challenge of reconstructing 3D scenes from sparse or single-view inputs. We introduce SpatialCrafter, a framework that leverages the rich knowledge in video diffusion models to generate plausible additional observations. Through a trainable camera encoder and an epipolar attention mechanism for explicit geometric constraints, we achieve precise camera control and 3D consistency.
arXiv Detail & Related papers (2025-05-17T13:05:13Z)
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange. We develop two specialized variants, for object-centric and scene-level reconstruction, trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z)
DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections. We show that this formulation smoothly unifies the monocular and binocular reconstruction cases. Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it, pixel matches, relative and absolute camera.
arXiv Detail & Related papers (2023-12-21T18:52:14Z)
Enhanced Stable View Synthesis [86.69338893753886]
We introduce an approach to enhance the novel view synthesis from images taken from a freely moving camera. The introduced approach focuses on outdoor scenes where recovering accurate geometric scaffold and camera pose is challenging.
arXiv Detail & Related papers (2023-03-30T01:53:14Z)
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view. Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video [90.93141123721713]
Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world. It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion. We propose the first approach that simultaneously estimates camera motion and reconstructs the geometry of complex 3D thin structures in high quality from a color video captured by a handheld camera.
arXiv Detail & Related papers (2020-05-07T10:39:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.