Aerial View River Landform Video segmentation: A Weakly Supervised Context-aware Temporal Consistency Distillation Approach
- URL: http://arxiv.org/abs/2511.16343v1
- Date: Thu, 20 Nov 2025 13:26:38 GMT
- Title: Aerial View River Landform Video segmentation: A Weakly Supervised Context-aware Temporal Consistency Distillation Approach
- Authors: Chi-Han Chen, Chieh-Ming Chen, Wen-Huang Cheng, Ching-Chun Huang,
- Abstract summary: The study of terrain and landform classification through UAV remote sensing diverges significantly from ground vehicle patrol tasks.<n>This research substantiates that, in aerial positioning tasks, both the mean Intersection over Union (mIoU) and temporal consistency (TC) metrics are of paramount importance.<n>It is demonstrated that fully labeled data is not the optimal choice, as selecting only key data lacks the enhancement in TC, leading to failures.
- Score: 20.833194584822532
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The study of terrain and landform classification through UAV remote sensing diverges significantly from ground vehicle patrol tasks. Besides grappling with the complexity of data annotation and ensuring temporal consistency, it also confronts the scarcity of relevant data and the limitations imposed by the effective range of many technologies. This research substantiates that, in aerial positioning tasks, both the mean Intersection over Union (mIoU) and temporal consistency (TC) metrics are of paramount importance. It is demonstrated that fully labeled data is not the optimal choice, as selecting only key data lacks the enhancement in TC, leading to failures. Hence, a teacher-student architecture, coupled with key frame selection and key frame updating algorithms, is proposed. This framework successfully performs weakly supervised learning and TC knowledge distillation, overcoming the deficiencies of traditional TC training in aerial tasks. The experimental results reveal that our method utilizing merely 30\% of labeled data, concurrently elevates mIoU and temporal consistency ensuring stable localization of terrain objects. Result demo : https://gitlab.com/prophet.ai.inc/drone-based-riverbed-inspection
Related papers
- iCD: A Implicit Clustering Distillation Mathod for Structural Information Mining [1.3573542141741506]
implicit Clustering Distillation (iCD) is a simple and effective method that mines and transfers interpretable structural knowledge from logits.<n>Experiments on benchmark datasets demonstrate the effectiveness of iCD across diverse teacher-student architectures.
arXiv Detail & Related papers (2025-09-16T01:16:13Z) - PAUL: Uncertainty-Guided Partition and Augmentation for Robust Cross-View Geo-Localization under Noisy Correspondence [16.322806051944227]
Cross-view geo-localization is a critical task for UAV navigation, event detection, and aerial surveying.<n>Most existing approaches embed multi-modal data into a joint feature space to maximize the similarity of paired images.<n>These methods typically assume perfect alignment of image pairs during training, which rarely holds true in real-world scenarios.
arXiv Detail & Related papers (2025-08-27T17:21:22Z) - Pseudo-Label Guided Real-World Image De-weathering: A Learning Framework with Imperfect Supervision [57.5699142476311]
We propose a unified solution for real-world image de-weathering with non-ideal supervision.<n>Our method exhibits significant advantages when trained on imperfectly aligned de-weathering datasets.
arXiv Detail & Related papers (2025-04-14T07:24:03Z) - Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark [15.405137983083875]
Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations.<n>This paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions.
arXiv Detail & Related papers (2025-03-10T07:00:07Z) - Without Paired Labeled Data: End-to-End Self-Supervised Learning for Drone-view Geo-Localization [20.603433987118837]
Drone-view Geo-Localization (DVGL) aims to achieve accurate localization of drones by retrieving the most relevant GPS-tagged satellite images.<n>Existing methods heavily rely on strictly pre-paired drone-satellite images for supervised learning.<n>We propose a novel end-to-end self-supervised learning method with a shallow backbone network.
arXiv Detail & Related papers (2025-02-17T02:53:08Z) - Weakly Supervised Framework Considering Multi-temporal Information for Large-scale Cropland Mapping with Satellite Imagery [11.157693752084214]
This study presents a weakly supervised framework considering multi-temporal information for large-scale cropland mapping.<n>We extract high-quality labels according to their consistency among global land cover (GLC) products to construct the supervised learning signal.<n>The proposed framework has been experimentally validated for strong adaptability across three study areas in large-scale cropland mapping.
arXiv Detail & Related papers (2024-11-27T16:11:52Z) - Federated Adversarial Learning for Robust Autonomous Landing Runway Detection [6.029462194041386]
In this paper, we propose a federated adversarial learning-based framework to detect landing runways.
To the best of our knowledge, this marks the first instance of federated learning work that address the adversarial sample problem in landing runway detection.
arXiv Detail & Related papers (2024-06-22T19:31:52Z) - Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z) - Imbalanced Aircraft Data Anomaly Detection [103.01418862972564]
Anomaly detection in temporal data from sensors under aviation scenarios is a practical but challenging task.
We propose a Graphical Temporal Data Analysis framework.
It consists three modules, named Series-to-Image (S2I), Cluster-based Resampling Approach using Euclidean Distance (CRD) and Variance-Based Loss (VBL)
arXiv Detail & Related papers (2023-05-17T09:37:07Z) - Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer [53.413305467674434]
We introduce open-source RGB data to support spike depth estimation, leveraging its annotations and spatial information.
We propose a cross-modality cross-domain (BiCross) framework to realize unsupervised spike depth estimation.
Our method achieves state-of-the-art (SOTA) performances, compared with RGB-oriented unsupervised depth estimation methods.
arXiv Detail & Related papers (2022-08-26T09:35:20Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the
Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote
Sensing Data [64.40187171234838]
Seasonal Contrast (SeCo) is an effective pipeline to leverage unlabeled data for in-domain pre-training of re-mote sensing representations.
SeCo will be made public to facilitate transfer learning and enable rapid progress in re-mote sensing applications.
arXiv Detail & Related papers (2021-03-30T18:26:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.