Related papers: From Bands to Depth: Understanding Bathymetry Decisions on Sentinel-2


URL: http://arxiv.org/abs/2601.12636v1
Date: Mon, 19 Jan 2026 00:52:22 GMT
Title: From Bands to Depth: Understanding Bathymetry Decisions on Sentinel-2
Authors: Satyaki Roy Chowdhury, Aswathnarayan Radhakrishnan, Hsiao Jou Hsu, Hari Subramoni, Joachim Moortgat,
Abstract summary: We analyze a Swin-Transformer-based U-Net model (Swin-BathyUNet) to understand how it infers depth and when its predictions are trustworthy.
Score: 0.23488056916440855
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deploying Sentinel-2 satellite-derived bathymetry (SDB) robustly across sites remains challenging. We analyze a Swin-Transformer-based U-Net model (Swin-BathyUNet) to understand how it infers depth and when its predictions are trustworthy. A leave-one-band-out study ranks the spectral importance of the different bands in a way consistent with shallow-water optics. We adapt ablation-based CAM to regression (A-CAM-R) and validate its reliability via a performance retention test: keeping only the top-p% salient pixels while neutralizing the rest causes a large, monotonic increase in RMSE, indicating that the explanations localize on evidence the model relies on. Attention ablations show that decoder-conditioned cross-attention on skip connections is an effective upgrade, improving robustness to glint and foam. Cross-region inference (training on one site, testing on another) reveals depth-dependent degradation: MAE rises nearly linearly with depth, and bimodal depth distributions exacerbate mid- and deep-water errors. Practical guidance follows: maintain wide receptive fields, preserve radiometric fidelity in the green and blue channels, pre-filter bright, high-variance near-shore areas, and pair light target-site fine-tuning with depth-aware calibration to transfer across regions.
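Two of the evaluation procedures in the abstract, the performance retention test and the depth-dependent error analysis, can be sketched in a few lines. The Python below is a minimal, hypothetical illustration and not the authors' code. It assumes that an image is a flat list of per-pixel band vectors, that a model is any callable returning one depth per pixel, that a saliency map (for example from A-CAM-R) is already available, that "neutralizing" a pixel means replacing it with the per-band image mean, and that RMSE is taken over all pixels. The names `retention_curve`, `keep_top_salient`, `mae_by_depth_bin` and `toy_model` are illustrative only.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical data layout: an image is a flat list of pixels, each pixel a list of
# band values; a model maps an image to one predicted depth per pixel.
Image = List[List[float]]
Model = Callable[[Image], List[float]]


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error over equal-length, non-empty sequences."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def keep_top_salient(image: Image, saliency: Sequence[float], p: float,
                     baseline: Sequence[float]) -> Image:
    """Keep the top-p fraction of pixels by saliency; replace the rest with `baseline`."""
    n_keep = int(round(p * len(image)))
    ranked = sorted(range(len(image)), key=lambda i: saliency[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [list(px) if i in kept else list(baseline) for i, px in enumerate(image)]


def retention_curve(model: Model, image: Image, target: Sequence[float],
                    saliency: Sequence[float], fractions: Sequence[float]) -> List[float]:
    """RMSE against `target` after keeping only the top-p salient pixels, for each p."""
    # Assumption: "neutralizing" a pixel means setting it to the per-band image mean.
    n_bands = len(image[0])
    baseline = [sum(px[b] for px in image) / len(image) for b in range(n_bands)]
    return [rmse(model(keep_top_salient(image, saliency, p, baseline)), target)
            for p in fractions]


def mae_by_depth_bin(pred: Sequence[float], target: Sequence[float],
                     edges: Sequence[float]) -> List[float]:
    """MAE grouped by true depth into bins [edges[k], edges[k+1]); NaN for empty bins."""
    n_bins = len(edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for p, t in zip(pred, target):
        for k in range(n_bins):
            if edges[k] <= t < edges[k + 1]:
                sums[k] += abs(p - t)
                counts[k] += 1
                break  # each pixel falls into at most one bin
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def toy_model(img: Image) -> List[float]:
    """Purely illustrative per-pixel linear 'depth model' over two bands."""
    return [2.0 * px[0] + 1.0 * px[1] for px in img]


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
    saliency = [0.1, 0.4, 0.9, 0.2]  # stand-in for an A-CAM-R saliency map
    target = toy_model(image)        # perfect reference, so RMSE is 0 at p = 1
    print(retention_curve(toy_model, image, target, saliency, [1.0, 0.5, 0.25, 0.0]))
    print(mae_by_depth_bin([1.5, 2.5, 7.0, 9.0], [1.0, 2.0, 6.0, 11.0], [0.0, 5.0, 15.0]))
```

With the toy inputs the retention curve starts at 0 for p = 1 and rises monotonically as more pixels are neutralized. With a real network, plotting this curve over p, and the per-bin MAE against bin depth, is one way to check the trends the abstract reports.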

Related papers

Systematic Evaluation of Depth Backbones and Semantic Cues for Monocular Pseudo-LiDAR 3D Detection [0.0]
We evaluate how depth backbones and feature engineering affect a monocular Pseudo-LiDAR pipeline on the KITTI validation split. Under an off-the-shelf LiDAR detector, depth-backbone choice and geometric fidelity dominate performance, outweighing secondary feature injection.
arXiv Detail & Related papers (2026-01-07T05:57:19Z)
Data-Driven Reconstruction of Significant Wave Heights from Sparse Observations [3.356199201143573]
We introduce AUWave, a hybrid deep learning framework that fuses a station-wise sequence encoder (MLP) with a multi-scale U-Net. We show that AUWave consistently outperforms a representative baseline in data-richer configurations. The architecture's multi-scale and attention components translate into accuracy gains when minimal but non-trivial spatial anchoring is available.
arXiv Detail & Related papers (2025-09-21T14:12:28Z)
VistaDepth: Frequency Modulation with Bias Reweighting for Enhanced Far-range Depth Estimation [13.13321690410482]
VistaDepth is a novel framework named for its ability to accurately reconstruct far-range vistas. We introduce BiasMap, a mechanism that applies adaptive weights directly to the diffusion loss in the latent space. Experiments show that VistaDepth achieves state-of-the-art performance for diffusion-based MDE.
arXiv Detail & Related papers (2025-04-21T13:30:51Z)
NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning. We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision. The resulting algorithm, NeRF-Det++, has exhibited appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z)
Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework. We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas. With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
Learning Heavily-Degraded Prior for Underwater Object Detection [59.5084433933765]
This paper seeks transferable prior knowledge from detector-friendly images. It is based on the statistical observation that the heavily degraded regions of detector-friendly (DFUI) and underwater images have evident feature distribution gaps. Despite higher speed and fewer parameters, our method still performs better than transformer-based detectors.
arXiv Detail & Related papers (2023-08-24T12:32:46Z)
URCDC-Depth: Uncertainty Rectified Cross-Distillation with CutFlip for Monocular Depth Estimation [24.03121823263355]
We introduce an uncertainty-rectified cross-distillation between a Transformer and a convolutional neural network (CNN) to learn a unified depth estimator. Specifically, we use the depth estimates from the Transformer branch and the CNN branch as pseudo labels to teach each other. We propose a surprisingly simple yet highly effective data augmentation technique, CutFlip, which forces the model to exploit valuable cues other than the vertical image position for depth inference.
arXiv Detail & Related papers (2023-02-16T08:53:08Z)
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation [42.19770683222846]
Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. In this paper we propose to learn to detect the location of depth edges from densely-supervised synthetic data. We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets.
arXiv Detail & Related papers (2022-12-10T14:49:24Z)
On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper. We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
Boundary-induced and scene-aggregated network for monocular depth prediction [20.358133522462513]
We propose the Boundary-induced and Scene-aggregated network (BS-Net) to predict the dense depth of a single RGB image. Experimental results on the NYUD v2 and iBims-1 datasets demonstrate the state-of-the-art performance of the proposed approach.
arXiv Detail & Related papers (2021-02-26T01:43:17Z)
Uncertainty-Aware Deep Calibrated Salient Object Detection [74.58153220370527]
Existing deep-neural-network-based salient object detection (SOD) methods mainly focus on pursuing high network accuracy. These methods overlook the gap between network accuracy and prediction confidence, known as the confidence uncalibration problem. We introduce an uncertainty-aware deep SOD network and propose two strategies to prevent deep SOD networks from being overconfident.
arXiv Detail & Related papers (2020-12-10T23:28:36Z)
Direct Depth Learning Network for Stereo Matching [79.3665881702387]
A novel Direct Depth Learning Network (DDL-Net) is designed for stereo matching. DDL-Net consists of two stages: the Coarse Depth Estimation stage and the Adaptive-Grained Depth Refinement stage. We show that DDL-Net achieves an average improvement of 25% on the SceneFlow dataset and 12% on the DrivingStereo dataset.
arXiv Detail & Related papers (2020-12-10T10:33:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.