DepthSeg: Depth prompting in remote sensing semantic segmentation
- URL: http://arxiv.org/abs/2506.14382v1
- Date: Tue, 17 Jun 2025 10:27:59 GMT
- Title: DepthSeg: Depth prompting in remote sensing semantic segmentation
- Authors: Ning Zhou, Shanxiong Chen, Mingting Zhou, Haigang Sui, Lieyun Hu, Han Li, Li Hua, Qiming Zhou
- Abstract summary: In this paper, we introduce a depth prompting two-dimensional (2D) remote sensing semantic segmentation framework (DepthSeg). It automatically models depth/height information from 2D remote sensing images and integrates it into the semantic segmentation framework. Experiments on the LiuZhou dataset validate the advantages of the DepthSeg framework in land cover mapping tasks.
- Score: 16.93010831616395
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Remote sensing semantic segmentation is crucial for extracting detailed land surface information, enabling applications such as environmental monitoring, land use planning, and resource assessment. In recent years, advances in artificial intelligence have spurred the development of automatic remote sensing semantic segmentation methods. However, existing methods focus on distinguishing the spectral characteristics of different objects while ignoring differences in their elevation, which leads to land cover misclassification in complex scenarios involving shadow occlusion and spectral confusion. In this paper, we introduce DepthSeg, a depth prompting two-dimensional (2D) remote sensing semantic segmentation framework. It automatically models depth/height information from 2D remote sensing images and integrates it into the semantic segmentation pipeline to mitigate the effects of spectral confusion and shadow occlusion. In the feature extraction phase of DepthSeg, we introduce a lightweight adapter to enable cost-effective fine-tuning of a large-parameter vision transformer encoder pre-trained on natural images. In the depth prompting phase, we propose a depth prompter to model depth/height features explicitly. In the semantic prediction phase, we introduce a semantic classification decoder that couples the depth prompts with high-dimensional land-cover features, enabling accurate extraction of land-cover types. Experiments on the LiuZhou dataset validate the advantages of the DepthSeg framework in land cover mapping tasks, and detailed ablation studies further highlight the significance of the depth prompts in remote sensing semantic segmentation.
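Neither the abstract nor this page includes code, so the following is a minimal PyTorch sketch of the three phases as described: a bottleneck adapter on frozen ViT tokens, a depth prompter, and a classifier that couples the depth prompt with the land-cover features. All module designs, names, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight bottleneck adapter: only these parameters would be
    trained while the pre-trained ViT encoder stays frozen (assumed design)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual adaptation

class DepthPrompter(nn.Module):
    """Maps encoder features to an explicit depth/height prompt."""
    def __init__(self, dim, prompt_dim=64):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, prompt_dim), nn.GELU(),
                                  nn.Linear(prompt_dim, prompt_dim))

    def forward(self, feats):
        return self.proj(feats)

class DepthSegSketch(nn.Module):
    """Couples depth prompts with land-cover features for classification."""
    def __init__(self, dim=768, prompt_dim=64, num_classes=8):
        super().__init__()
        self.adapter = Adapter(dim)
        self.prompter = DepthPrompter(dim, prompt_dim)
        self.classifier = nn.Linear(dim + prompt_dim, num_classes)

    def forward(self, vit_tokens):                    # (B, N, dim) ViT tokens
        feats = self.adapter(vit_tokens)              # phase 1: adapted features
        prompt = self.prompter(feats)                 # phase 2: depth prompt
        fused = torch.cat([feats, prompt], dim=-1)    # phase 3: coupling
        return self.classifier(fused)                 # per-token class logits

logits = DepthSegSketch()(torch.randn(2, 196, 768))
print(logits.shape)  # torch.Size([2, 196, 8])
```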
Related papers
- Metric-Solver: Sliding Anchored Metric Depth Estimation from a Single Image [51.689871870692194]
Metric-Solver is a novel sliding anchor-based metric depth estimation method. Our design enables a unified and adaptive depth representation across diverse environments.
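The summary above is terse, so the snippet below only illustrates the general idea of anchor-relative depth as one hedged reading of "sliding anchored" estimation; the median-based anchor choice and the normalization rule are my assumptions, not the paper's method.

```python
import torch

def anchor_normalize(depth, anchor):
    """Express metric depth relative to a scene-specific anchor so a
    network sees a unified, scale-free target (illustrative only)."""
    return depth / anchor.clamp(min=1e-6)

def anchor_denormalize(rel_depth, anchor):
    """Recover metric depth from the anchor-relative representation."""
    return rel_depth * anchor

depth = torch.rand(1, 1, 4, 4) * 50.0            # metric depth, up to 50 m
anchor = depth.flatten(1).median(dim=1).values   # one plausible anchor choice
rel = anchor_normalize(depth, anchor.view(-1, 1, 1, 1))
rec = anchor_denormalize(rel, anchor.view(-1, 1, 1, 1))
assert torch.allclose(depth, rec)                # round trip is lossless
```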
arXiv Detail & Related papers (2025-04-16T14:12:25Z) - Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that effectively tackles this challenge.
Our method extracts low-level features from edges and textures to create a texture image, which is then diffused into the depth map to enrich its structural detail.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
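As a rough illustration of that pipeline, here is a hedged PyTorch sketch: Sobel edges stand in for the paper's texture cues, and the multiplicative enrichment rule and joint projection are assumptions for demonstration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sobel_edges(gray):
    """Low-level edge map standing in for the paper's texture image."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

class JointEmbedding(nn.Module):
    """Fuses the texture-enriched depth map with the RGB image."""
    def __init__(self, dim=32):
        super().__init__()
        self.proj = nn.Conv2d(3 + 1, dim, 3, padding=1)  # RGB + enriched depth

    def forward(self, rgb, depth, texture):
        enriched = depth * (1.0 + texture)   # assumed enrichment rule
        return self.proj(torch.cat([rgb, enriched], dim=1))

rgb = torch.rand(1, 3, 64, 64)
depth = torch.rand(1, 1, 64, 64)
texture = sobel_edges(rgb.mean(dim=1, keepdim=True))
emb = JointEmbedding()(rgb, depth, texture)
print(emb.shape)  # torch.Size([1, 32, 64, 64])
```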
arXiv Detail & Related papers (2024-08-17T04:55:03Z) - Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
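A hedged sketch of the depth-volume idea follows: ground-truth depth is discretized into bins to supervise a volume, whose softmax weights then modulate cost-volume aggregation. Bin count, maximum depth, and the weighting scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def depth_volume_target(gt_depth, num_bins=16, max_depth=80.0):
    """Discretize the ground-truth depth map into bins to supervise a
    depth volume encoding the relative hierarchy of the scene."""
    idx = (gt_depth / max_depth * num_bins).long().clamp(0, num_bins - 1)
    return F.one_hot(idx, num_bins).permute(0, 3, 1, 2).float()  # (B, bins, H, W)

def attend_cost_volume(cost, depth_logits):
    """Weight cost-volume slices by the softmaxed depth volume so that
    aggregation emphasizes plausible depth hypotheses (illustrative)."""
    weights = torch.softmax(depth_logits, dim=1)
    return cost * weights

gt = torch.rand(2, 32, 32) * 80.0
target = depth_volume_target(gt)
pred_logits = torch.randn(2, 16, 32, 32)
loss = F.cross_entropy(pred_logits, target.argmax(dim=1))  # volume supervision
agg = attend_cost_volume(torch.rand(2, 16, 32, 32), pred_logits)
print(target.shape, agg.shape, loss.item())
```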
arXiv Detail & Related papers (2024-02-14T04:07:44Z) - Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation [34.786268652516355]
Scene segmentation via unsupervised domain adaptation (UDA) enables the transfer of knowledge acquired from source synthetic data to real-world target data.
We propose a depth-aware framework to explicitly leverage depth estimation to mix the categories and facilitate the two complementary tasks, i.e., segmentation and depth learning.
In particular, the framework contains a Depth-guided Contextual Filter (DCF) for data augmentation and a cross-task encoder for contextual learning.
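One plausible reading of the depth-guided filter is sketched below: source classes are copy-pasted onto the target image only where the source pixel would plausibly occlude the target scene, i.e., where it is nearer. The class list and the occlusion rule are assumptions, not the paper's exact DCF.

```python
import torch

def depth_guided_mix(src_img, src_lbl, src_depth, tgt_img, tgt_depth, classes):
    """Copy-paste selected source classes onto the target image, keeping a
    source pixel only where it is nearer than the target scene
    (a hypothetical reading of depth-guided contextual filtering)."""
    paste = torch.zeros_like(src_lbl, dtype=torch.bool)
    for c in classes:
        paste |= src_lbl == c                  # select pixels of mixed classes
    paste &= src_depth < tgt_depth             # depth-based contextual filter
    mixed = torch.where(paste.unsqueeze(0), src_img, tgt_img)
    return mixed, paste

src_img, tgt_img = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
src_lbl = torch.randint(0, 19, (64, 64))
src_depth, tgt_depth = torch.rand(64, 64), torch.rand(64, 64)
mixed, mask = depth_guided_mix(src_img, src_lbl, src_depth,
                               tgt_img, tgt_depth, classes=[11, 12])
print(mixed.shape, mask.float().mean().item())
```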
arXiv Detail & Related papers (2023-11-21T15:39:21Z) - SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion [18.19171031755595]
We propose a new end-to-end model for performing semantic segmentation and depth completion jointly.
Our approach relies on RGB and sparse depth as inputs to our model and produces a dense depth map and the corresponding semantic segmentation image.
Experiments on the Virtual KITTI 2 dataset provide further evidence that combining semantic segmentation and depth completion in a multi-task network can effectively improve the performance of each task.
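A minimal sketch of such a joint model follows, assuming a shared encoder over RGB plus a sparse-depth channel with one head per task; the layer sizes are placeholders rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class SemSegDepthSketch(nn.Module):
    """Shared encoder with two heads: dense depth completion and semantic
    segmentation, fed RGB plus a sparse depth channel (sketch only)."""
    def __init__(self, num_classes=15, width=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + 1, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        self.depth_head = nn.Conv2d(width, 1, 1)          # dense depth map
        self.seg_head = nn.Conv2d(width, num_classes, 1)  # class logits

    def forward(self, rgb, sparse_depth):
        feats = self.encoder(torch.cat([rgb, sparse_depth], dim=1))
        return self.depth_head(feats), self.seg_head(feats)

rgb = torch.rand(1, 3, 64, 64)
sparse = torch.rand(1, 1, 64, 64) * (torch.rand(1, 1, 64, 64) > 0.95)  # ~5% valid
depth, seg = SemSegDepthSketch()(rgb, sparse)
print(depth.shape, seg.shape)
```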
arXiv Detail & Related papers (2022-09-01T11:52:11Z) - Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
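The sketch below stands in for the MMFT with a plain shared trunk and three task heads; the actual paper uses a multi-modal filtered transformer, which this deliberately simplifies.

```python
import torch
import torch.nn as nn

class MultiTaskSketch(nn.Module):
    """One shared trunk with three task heads (depth, saliency, contour),
    a simplified stand-in for the multi-modal filtered transformer."""
    def __init__(self, width=32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(4, width, 3, padding=1), nn.ReLU())  # RGB-D input
        self.depth_head = nn.Conv2d(width, 1, 1)
        self.saliency_head = nn.Conv2d(width, 1, 1)
        self.contour_head = nn.Conv2d(width, 1, 1)

    def forward(self, rgbd):
        f = self.trunk(rgbd)
        return self.depth_head(f), self.saliency_head(f), self.contour_head(f)

rgbd = torch.rand(1, 4, 64, 64)
depth, sal, contour = MultiTaskSketch()(rgbd)
# during training, auxiliary-task losses would be summed with the SOD loss
print(depth.shape, sal.shape, contour.shape)
```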
arXiv Detail & Related papers (2022-03-09T17:20:18Z) - Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation [84.34227665232281]
Domain adaptation for semantic segmentation aims to improve the model performance in the presence of a distribution shift between source and target domain.
We leverage the guidance from self-supervised depth estimation, which is available on both domains, to bridge the domain gap.
We demonstrate the effectiveness of our proposed approach on the benchmark tasks SYNTHIA-to-Cityscapes and GTA-to-Cityscapes.
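In loss terms, the idea can be sketched as supervised segmentation on the source domain plus a self-supervised depth term applied to both domains; the L1 depth loss and the weighting factor below are my assumptions.

```python
import torch
import torch.nn.functional as F

def uda_depth_loss(seg_logits_src, lbl_src,
                   depth_pred_src, depth_ss_src,
                   depth_pred_tgt, depth_ss_tgt, lam=0.1):
    """Source segmentation loss plus a depth regression term on BOTH
    domains; depth_ss_* are self-supervised depth targets, the quantity
    available without labels on the target domain."""
    seg = F.cross_entropy(seg_logits_src, lbl_src)
    depth = F.l1_loss(depth_pred_src, depth_ss_src) + \
            F.l1_loss(depth_pred_tgt, depth_ss_tgt)
    return seg + lam * depth

loss = uda_depth_loss(
    torch.randn(2, 19, 32, 32), torch.randint(0, 19, (2, 32, 32)),
    torch.rand(2, 1, 32, 32), torch.rand(2, 1, 32, 32),
    torch.rand(2, 1, 32, 32), torch.rand(2, 1, 32, 32))
print(loss.item())
```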
arXiv Detail & Related papers (2021-04-28T07:47:36Z) - S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet generalizes well to unseen real-world data even though it is trained only on synthetic data.
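A hedged sketch of the structure/style split: an extractor maps images to a compact structure map, and depth is decoded from that map alone, so appearance never reaches the depth path. Both module designs are illustrative stand-ins for S2R-DepthNet's components.

```python
import torch
import torch.nn as nn

class StructureExtractor(nn.Module):
    """Maps an image to a low-dimensional 'structure' map meant to keep
    geometry and drop style/texture (a stand-in for the paper's module)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.Sigmoid())  # 1-channel structure map

    def forward(self, img):
        return self.net(img)

class DepthFromStructure(nn.Module):
    """Depth is decoded from the structure map alone, so this path never
    sees domain-specific appearance."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1))

    def forward(self, structure):
        return self.net(structure)

img = torch.rand(1, 3, 64, 64)                  # e.g., a synthetic training image
depth = DepthFromStructure()(StructureExtractor()(img))
print(depth.shape)
```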
arXiv Detail & Related papers (2021-04-02T03:55:41Z) - Learning Depth via Leveraging Semantics: Self-supervised Monocular Depth Estimation with Both Implicit and Explicit Semantic Guidance [34.62415122883441]
We propose a Semantic-aware Spatial Feature Alignment scheme to align implicit semantic features with depth features for scene-aware depth estimation.
We also propose a semantic-guided ranking loss to explicitly constrain the estimated depth maps to be consistent with real scene contextual properties.
Our method produces high-quality depth maps that are consistently superior on both complex scenes and diverse semantic categories.
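The explicit constraint can be illustrated with a margin ranking loss over pixel pairs whose near/far ordering comes from semantics; how those pairs are selected is assumed to happen upstream and is not specified here.

```python
import torch

def semantic_ranking_loss(depth, near_idx, far_idx, margin=0.05):
    """Penalize pairs where a pixel that semantics says should be nearer
    is not predicted nearer; pair selection (e.g., road pixel vs. sky
    pixel) is assumed to come from the semantic map upstream."""
    d = depth.flatten(1)                 # (B, H*W)
    near = d.gather(1, near_idx)         # depths that should be smaller
    far = d.gather(1, far_idx)           # depths that should be larger
    return torch.clamp(near - far + margin, min=0).mean()

depth = torch.rand(2, 1, 16, 16)
near_idx = torch.randint(0, 256, (2, 32))
far_idx = torch.randint(0, 256, (2, 32))
print(semantic_ranking_loss(depth, near_idx, far_idx).item())
```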
arXiv Detail & Related papers (2021-02-11T14:29:51Z) - Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation [39.845944724079814]
Self-supervised depth estimation has shown great effectiveness in producing high-quality depth maps given only image sequences as input.
However, its performance usually drops on border areas or objects with thin structures due to limited depth representation ability.
We propose a semantic-guided depth representation enhancement method, which promotes both local and global depth feature representations.
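One plausible reading of the enhancement is sketched below: semantic features gate depth features through a residual 1x1 gate, sharpening the representation near borders and thin structures. This fusion rule is my assumption, not the paper's module.

```python
import torch
import torch.nn as nn

class SemanticGuidedEnhancement(nn.Module):
    """Gates depth features with semantic features so the representation
    is refined where semantics is informative (one plausible reading)."""
    def __init__(self, depth_dim=32, sem_dim=32):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(sem_dim, depth_dim, 1), nn.Sigmoid())

    def forward(self, depth_feat, sem_feat):
        return depth_feat + depth_feat * self.gate(sem_feat)  # residual gating

depth_feat = torch.rand(1, 32, 32, 32)
sem_feat = torch.rand(1, 32, 32, 32)
out = SemanticGuidedEnhancement()(depth_feat, sem_feat)
print(out.shape)
```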
arXiv Detail & Related papers (2020-12-15T02:24:57Z)