Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
- URL: http://arxiv.org/abs/2503.20211v1
- Date: Wed, 26 Mar 2025 04:12:54 GMT
- Title: Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
- Authors: Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, Robby T. Tan,
- Abstract summary: We present the first synthetic-to-real robust depth estimation framework, incorporating motion and structure priors to capture real-world knowledge effectively.<n>We achieve improvements of 7.5% and 4.3% in AbsRel and RMSE on average for nuScenes and Robotcar datasets (daytime, nighttime, rain)<n>In zero-shot evaluation of DrivingStereo (rain, fog), our method generalizes better than the previous ones.
- Score: 22.831281986234988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised depth estimation from monocular cameras in diverse outdoor conditions, such as daytime, rain, and nighttime, is challenging due to the difficulty of learning universal representations and the severe lack of labeled real-world adverse data. Previous methods either rely on synthetic inputs and pseudo-depth labels or directly apply daytime strategies to adverse conditions, resulting in suboptimal results. In this paper, we present the first synthetic-to-real robust depth estimation framework, incorporating motion and structure priors to capture real-world knowledge effectively. In the synthetic adaptation, we transfer motion-structure knowledge inside cost volumes for better robust representation, using a frozen daytime model to train a depth estimator in synthetic adverse conditions. In the innovative real adaptation, which targets to fix synthetic-real gaps, models trained earlier identify the weather-insensitive regions with a designed consistency-reweighting strategy to emphasize valid pseudo-labels. We introduce a new regularization by gathering explicit depth distributions to constrain the model when facing real-world data. Experiments show that our method outperforms the state-of-the-art across diverse conditions in multi-frame and single-frame evaluations. We achieve improvements of 7.5% and 4.3% in AbsRel and RMSE on average for nuScenes and Robotcar datasets (daytime, nighttime, rain). In zero-shot evaluation of DrivingStereo (rain, fog), our method generalizes better than the previous ones.
Related papers
- PhysE-Inv: A Physics-Encoded Inverse Modeling approach for Arctic Snow Depth Prediction [0.16206783799607727]
We introduce PhysE-Inv, a novel framework that integrates a sophisticated sequential architecture, an LSTM-Decoder with Multi-head Attention and physics-guided contrastive learning, with physics-guided inference.<n> PhysE-Inv significantly improves prediction performance, reducing error by 20% while demonstrating superior physical consistency and resilience to data sparsity compared to empirical methods.<n>This approach pioneers a path for noise-tolerant, interpretable inverse modeling, with wide applicability in geospatial and cryospheric domains.
arXiv Detail & Related papers (2026-01-23T00:43:51Z) - BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining [2.400704807305413]
Zero-shot 3D object classification is crucial for real-world applications like autonomous driving.<n>It is often hindered by a significant domain gap between the synthetic data used for training and the sparse, noisy LiDAR scans encountered in the real-world.<n>We introduce BlendCLIP, a multimodal pretraining framework that bridges this synthetic-to-real gap by strategically combining the strengths of both domains.
arXiv Detail & Related papers (2025-10-21T03:08:27Z) - RoSe: Robust Self-supervised Stereo Matching under Adverse Weather Conditions [58.37558408672509]
We propose a robust self-supervised training paradigm, consisting of two key steps: robust self-supervised scene correspondence learning and adverse weather distillation.<n>Experiments demonstrate the effectiveness and versatility of our proposed solution, which outperforms existing state-of-the-art self-supervised methods.
arXiv Detail & Related papers (2025-09-23T15:41:40Z) - ER-LoRA: Effective-Rank Guided Adaptation for Weather-Generalized Depth Estimation [33.632587382356824]
We propose weather-generalized depth estimation using only a small amount of high-visibility (normal) data.<n>We introduce the Selecting-Tuning-Maintaining (STM) strategy, which structurally decomposes the pretrained weights of Vision Foundation Models.<n>In the tuning phase, we adaptively select the proper rank number as well as the task-aware singular directions.<n>While in the maintaining stage, we enforce a principal direction regularization based on the stable-rank.
arXiv Detail & Related papers (2025-08-31T02:24:00Z) - RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support.<n>We introduce the first benchmark for HOI detection, evaluating model resilience under diverse challenges.<n>Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z) - SSSUMO: Real-Time Semi-Supervised Submovement Decomposition [0.6499759302108926]
Submovement analysis offers valuable insights into motor control.<n>Existing methods struggle with reconstruction accuracy, computational cost, and validation.<n>We address these challenges using a semi-supervised learning framework.
arXiv Detail & Related papers (2025-07-08T21:26:25Z) - Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining [73.5575992346396]
We propose a dual-branch-temporal state-space model to enhance rain streak removal in video sequences.<n>To improve multi-frame feature fusion, we derive a dynamic filter stacking, which adaptively approximates statistical filters for pixel-wise feature refinement.<n>To further explore the capacity of deraining models in supporting other vision-based tasks in rainy environments, we introduce a novel real-world benchmark.
arXiv Detail & Related papers (2025-05-22T15:50:00Z) - DepthFM: Fast Monocular Depth Estimation with Flow Matching [22.206355073676082]
Current discriminative depth estimation methods often produce blurry artifacts, while generative approaches suffer from slow sampling due to curvatures in the noise-to-depth transport.<n>Our method addresses these challenges by framing depth estimation as a direct transport between image and depth distributions.<n>Our approach achieves competitive zero-shot performance on standard benchmarks of complex natural scenes while improving sampling efficiency and only requiring minimal synthetic data for training.
arXiv Detail & Related papers (2024-03-20T17:51:53Z) - Learning Robust Precipitation Forecaster by Temporal Frame Interpolation [65.5045412005064]
We develop a robust precipitation forecasting model that demonstrates resilience against spatial-temporal discrepancies.
Our approach has led to significant improvements in forecasting precision, culminating in our model securing textit1st place in the transfer learning leaderboard of the textitWeather4cast'23 competition.
arXiv Detail & Related papers (2023-11-30T08:22:08Z) - A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z) - WeatherDepth: Curriculum Contrastive Learning for Self-Supervised Depth Estimation under Adverse Weather Conditions [42.99525455786019]
We propose WeatherDepth, a self-supervised robust depth estimation model with curriculum contrastive learning.
The proposed solution is proven to be easily incorporated into various architectures and demonstrates state-of-the-art (SoTA) performance on both synthetic and real weather datasets.
arXiv Detail & Related papers (2023-10-09T09:26:27Z) - Robust Monocular Depth Estimation under Challenging Conditions [81.57697198031975]
State-of-the-art monocular depth estimation approaches are highly unreliable under challenging illumination and weather conditions.
We tackle these safety-critical issues with md4all: a simple and effective solution that works reliably under both adverse and ideal conditions.
arXiv Detail & Related papers (2023-08-18T17:59:01Z) - PointNorm-Net: Self-Supervised Normal Prediction of 3D Point Clouds via Multi-Modal Distribution Estimation [29.582507073730913]
PointNorm-Net is the first self-supervised deep learning framework to tackle this challenge.
Our method achieves superior generalization and outperforms state-of-the-art conventional and deep learning approaches across three real-world datasets.
arXiv Detail & Related papers (2023-04-10T22:11:13Z) - Vision in adverse weather: Augmentation using CycleGANs with various
object detectors for robust perception in autonomous racing [70.16043883381677]
In autonomous racing, the weather can change abruptly, causing significant degradation in perception, resulting in ineffective manoeuvres.
In order to improve detection in adverse weather, deep-learning-based models typically require extensive datasets captured in such conditions.
We introduce an approach of using synthesised adverse condition datasets in autonomous racing (generated using CycleGAN) to improve the performance of four out of five state-of-the-art detectors.
arXiv Detail & Related papers (2022-01-10T10:02:40Z) - Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of
Adverse Weather Conditions for 3D Object Detection [60.89616629421904]
Lidar-based object detectors are critical parts of the 3D perception pipeline in autonomous navigation systems such as self-driving cars.
They are sensitive to adverse weather conditions such as rain, snow and fog due to reduced signal-to-noise ratio (SNR) and signal-to-background ratio (SBR)
arXiv Detail & Related papers (2021-07-14T21:10:47Z) - Semi-Supervised Video Deraining with Dynamic Rain Generator [59.71640025072209]
This paper proposes a new semi-supervised video deraining method, in which a dynamic rain generator is employed to fit the rain layer.
Specifically, such dynamic generator consists of one emission model and one transition model to simultaneously encode the spatially physical structure and temporally continuous changes of rain streaks.
Various prior formats are designed for the labeled synthetic and unlabeled real data, so as to fully exploit the common knowledge underlying them.
arXiv Detail & Related papers (2021-03-14T14:28:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.