Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications
- URL: http://arxiv.org/abs/2512.04303v1
- Date: Wed, 03 Dec 2025 22:37:38 GMT
- Title: Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications
- Authors: Gasser Elazab, Maximilian Jansen, Michael Unterreiner, Olaf Hellwich
- Abstract summary: We introduce Gamma-from-Mono (GfM), a lightweight monocular geometry estimation method. GfM predicts a dominant road surface plane together with residual variations expressed by gamma. With only the camera's height above ground, GfM deterministically recovers metric depth via a closed form.
- Score: 2.9457242478147503
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Accurate perception of the vehicle's 3D surroundings, including fine-scale road geometry, such as bumps, slopes, and surface irregularities, is essential for safe and comfortable vehicle control. However, conventional monocular depth estimation often oversmooths these features, losing critical information for motion planning and stability. To address this, we introduce Gamma-from-Mono (GfM), a lightweight monocular geometry estimation method that resolves the projective ambiguity in single-camera reconstruction by decoupling global and local structure. GfM predicts a dominant road surface plane together with residual variations expressed by gamma, a dimensionless measure of vertical deviation from the plane, defined as the ratio of a point's height above it to its depth from the camera, and grounded in established planar parallax geometry. With only the camera's height above ground, this representation deterministically recovers metric depth via a closed form, avoiding full extrinsic calibration and naturally prioritizing near-road detail. Its physically interpretable formulation makes it well suited for self-supervised learning, eliminating the need for large annotated datasets. Evaluated on KITTI and the Road Surface Reconstruction Dataset (RSRD), GfM achieves state-of-the-art near-field accuracy in both depth and gamma estimation while maintaining competitive global depth performance. Our lightweight 8.88M-parameter model adapts robustly across diverse camera setups and, to our knowledge, is the first self-supervised monocular approach evaluated on RSRD.
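The closed-form metric depth recovery described in the abstract can be sketched from standard planar-parallax geometry. The exact parameterization below is an assumption for illustration (the paper may differ): with a unit road-plane normal `n` in camera coordinates pointing from the camera toward the plane, camera height `H` above the road, and gamma defined as the ratio of a point's height above the plane to its depth, depth follows in closed form from a single division per pixel.

```python
import numpy as np

def depth_from_gamma(u, v, gamma, K, n, H):
    """Recover metric depth Z from gamma at pixel (u, v).

    Assumed convention (illustrative, not necessarily the paper's exact
    formulation): n is the unit road-plane normal in camera coordinates,
    pointing from the camera toward the plane; H is the camera height
    above the road; gamma = h / Z, with h the point's height above the
    plane and Z its depth from the camera.
    """
    # Back-project the pixel to a viewing ray with unit z-depth.
    r = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # A point at depth Z is P = Z * r; its height above the plane is
    # h = H - n . P. Substituting gamma = h / Z and solving for Z:
    #   gamma * Z = H - Z * (n . r)  =>  Z = H / (gamma + n . r)
    return H / (gamma + n @ r)
```

For a pure road pixel (gamma = 0) this reduces to the classical ground-plane depth Z = H / (n·r), which is consistent with the abstract's claim that the representation naturally prioritizes near-road detail and needs only the camera height rather than full extrinsic calibration.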
Related papers
- MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping [52.99503784067417]
We present MCGS-SLAM, the first purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting (3DGS). A multi-camera bundle adjustment (MCBA) jointly refines poses and depths via dense photometric and geometric residuals, while a scale consistency module enforces metric alignment across views. Experiments on synthetic and real-world datasets show that MCGS-SLAM consistently yields accurate trajectories and photorealistic reconstructions.
arXiv Detail & Related papers (2025-09-17T17:27:53Z) - Gaussian Alignment for Relative Camera Pose Estimation via Single-View Reconstruction [18.936573991468926]
GARPS is a training-free framework that casts this problem as the direct alignment of two independently reconstructed 3D scenes. It refines an initial pose from a feed-forward two-view pose estimator by optimising a differentiable GMM alignment objective. Experiments on the RealEstate10K dataset demonstrate that GARPS outperforms both classical and state-of-the-art learning-based methods.
arXiv Detail & Related papers (2025-09-17T02:57:34Z) - Loc$^2$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching [80.57282092735991]
We propose an accurate and interpretable fine-grained cross-view localization method. It estimates the 3-Degrees-of-Freedom (DoF) pose of a ground-level image by matching its local features with a reference aerial image. Experiments show state-of-the-art accuracy in challenging scenarios such as cross-area testing and unknown orientation.
arXiv Detail & Related papers (2025-09-11T18:52:16Z) - Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation [6.996435353737172]
This paper presents a methodology to predict metric depth from monocular RGB images and an inertial measurement unit (IMU). We propose several lightweight zero-shot rescaling strategies to obtain metric depth from relative depth estimates via a sparse 3D feature map. We obtain on-board metric depth estimates at 15 Hz and demonstrate successful collision avoidance after integrating the proposed method with a motion-primitives-based planner.
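A common form of the rescaling step this entry describes (a generic sketch, not necessarily the paper's exact algorithm) is to take a robust per-image scale as the median ratio of metric depth to relative depth at the sparse feature locations:

```python
import numpy as np

def rescale_relative_depth(rel_depth, pix, metric_z):
    """Rescale an up-to-scale depth map to metric units.

    Generic illustration: rel_depth is an HxW relative depth map,
    pix is an Nx2 array of (row, col) pixel coordinates of sparse
    features with known metric depths metric_z (e.g. triangulated
    visual-inertial landmarks).
    """
    # Relative depths at the sparse anchor pixels.
    rel = rel_depth[pix[:, 0], pix[:, 1]]
    # Median ratio is robust to outlier anchors.
    scale = np.median(metric_z / rel)
    return scale * rel_depth
```

The median keeps a few badly triangulated features from corrupting the global scale, which matters when only a sparse 3D map is available per frame.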
arXiv Detail & Related papers (2025-09-09T21:39:13Z) - DashCam Video: A complementary low-cost data stream for on-demand forest-infrastructure system monitoring [1.6064410860203764]
This study introduces a novel, low-cost, and reproducible framework for real-time, object-level structural assessment and geolocation of roadside vegetation and infrastructure. We developed an end-to-end pipeline that combines monocular depth estimation, depth error correction, and geometric triangulation to generate accurate spatial and structural data from vehicle-mounted dashcams. Our approach complements conventional RS methods, such as LiDAR and imagery, by offering a fast, real-time, and cost-effective solution for object-level monitoring of vegetation risks and infrastructure exposure.
arXiv Detail & Related papers (2025-08-15T16:55:12Z) - TS-SatMVSNet: Slope Aware Height Estimation for Large-Scale Earth Terrain Multi-view Stereo [19.509863059288037]
3D terrain reconstruction with remote sensing imagery achieves cost-effective and large-scale earth observation. We propose an end-to-end slope-aware height estimation network named TS-SatMVSNet for large-scale remote sensing terrain reconstruction. To fully integrate slope information into the MVS pipeline, we design two slope-guided modules to enhance reconstruction outcomes at both micro and macro levels.
arXiv Detail & Related papers (2025-01-02T04:18:40Z) - GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion [0.0]
Generalizing metric monocular depth estimation presents a significant challenge due to its ill-posed nature. We propose a novel canonical representation that maintains consistency across varied camera setups. We also propose a novel architecture that adaptively and probabilistically fuses depths estimated via object size and vertical image position cues.
arXiv Detail & Related papers (2024-12-08T22:04:34Z) - MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction [86.87464903285208]
We introduce MonoGSDF, a novel method that couples primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. Experiments on real-world datasets show that it outperforms prior methods while maintaining efficiency.
arXiv Detail & Related papers (2024-11-25T20:07:07Z) - W-HMR: Monocular Human Mesh Recovery in World Space with Weak-Supervised Calibration [57.37135310143126]
Previous methods for 3D motion recovery from monocular images often fall short due to reliance on camera coordinates.
We introduce W-HMR, a weak-supervised calibration method that predicts "reasonable" focal lengths based on body distortion information.
We also present the OrientCorrect module, which corrects body orientation for plausible reconstructions in world space.
arXiv Detail & Related papers (2023-11-29T09:02:07Z) - RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry [57.5549733585324]
D3VO is a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation.
We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision.
We model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy.
arXiv Detail & Related papers (2020-03-02T17:47:13Z) - Road Curb Detection and Localization with Monocular Forward-view Vehicle Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens.
Our approach is able to estimate the vehicle to curb distance in real time with mean accuracy of more than 90%.
arXiv Detail & Related papers (2020-02-28T00:24:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.