Teaching in adverse scenes: a statistically feedback-driven threshold and mask adjustment teacher-student framework for object detection in UAV images under adverse scenes
- URL: http://arxiv.org/abs/2506.11175v1
- Date: Thu, 12 Jun 2025 09:21:58 GMT
- Title: Teaching in adverse scenes: a statistically feedback-driven threshold and mask adjustment teacher-student framework for object detection in UAV images under adverse scenes
- Authors: Hongyu Chen, Jiping Liu, Yong Wang, Jun Zhu, Dejun Feng, Yakun Xie,
- Abstract summary: Unsupervised Domain Adaptation (UDA) has shown promise in effectively alleviating the performance degradation caused by domain gaps between source and target domains.<n>However, existing UDA studies are based on natural images or clear UAV imagery.<n>We propose the first benchmark for UAV object detection in adverse scenes, the Statistical Feedback-Driven Threshold and Mask Adjustment Teacher-Student Framework.
- Score: 22.941247028439477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised Domain Adaptation (UDA) has shown promise in effectively alleviating the performance degradation caused by domain gaps between source and target domains, and it can potentially be generalized to UAV object detection in adverse scenes. However, existing UDA studies are based on natural images or clear UAV imagery, and research focused on UAV imagery in adverse conditions is still in its infancy. Moreover, due to the unique perspective of UAVs and the interference from adverse conditions, these methods often fail to accurately align features and are influenced by limited or noisy pseudo-labels. To address this, we propose the first benchmark for UAV object detection in adverse scenes, the Statistical Feedback-Driven Threshold and Mask Adjustment Teacher-Student Framework (SF-TMAT). Specifically, SF-TMAT introduces a design called Dynamic Step Feedback Mask Adjustment Autoencoder (DSFMA), which dynamically adjusts the mask ratio and reconstructs feature maps by integrating training progress and loss feedback. This approach dynamically adjusts the learning focus at different training stages to meet the model's needs for learning features at varying levels of granularity. Additionally, we propose a unique Variance Feedback Smoothing Threshold (VFST) strategy, which statistically computes the mean confidence of each class and dynamically adjusts the selection threshold by incorporating a variance penalty term. This strategy improves the quality of pseudo-labels and uncovers potentially valid labels, thus mitigating domain bias. Extensive experiments demonstrate the superiority and generalization capability of the proposed SF-TMAT in UAV object detection under adverse scene conditions. The Code is released at https://github.com/ChenHuyoo .
Related papers
- CLDA-YOLO: Visual Contrastive Learning Based Domain Adaptive YOLO Detector [10.419327930845922]
Unsupervised domain adaptive (UDA) algorithms can markedly enhance the performance of object detectors under conditions of domain shifts.<n>We present an unsupervised domain adaptive YOLO detector based on visual contrastive learning (CLDA-YOLO)
arXiv Detail & Related papers (2024-12-16T14:25:52Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.<n>In this paper, we investigate how detection performance varies across model backbones, types, and datasets.<n>We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased prediction under the open vocabulary detection context.
Our observations lead to a simple yet effective paradigm, coded MarvelOVD, that generates significantly better training targets.
Our method outperforms the other state-of-the-arts by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs)
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - Mean Teacher DETR with Masked Feature Alignment: A Robust Domain
Adaptive Detection Transformer Framework [41.998727427261734]
Two-stage feature alignment method based on mean teacher comprises a pretraining stage followed by a self-training stage.
In the pretraining stage, we utilize labeled target-like images produced by image style transfer to avoid performance fluctuation.
In the self-training stage, we leverage unlabeled target images by pseudo labels based on mean teacher.
arXiv Detail & Related papers (2023-10-24T09:07:47Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - UDA-COPE: Unsupervised Domain Adaptation for Category-level Object Pose
Estimation [84.16372642822495]
We propose an unsupervised domain adaptation (UDA) for category-level object pose estimation, called textbfUDA-COPE.
Inspired by the recent multi-modal UDA techniques, the proposed method exploits a teacher-student self-supervised learning scheme to train a pose estimation network without using target domain labels.
arXiv Detail & Related papers (2021-11-24T16:00:48Z) - Generative Self-training for Cross-domain Unsupervised Tagged-to-Cine
MRI Synthesis [10.636015177721635]
We propose a novel generative self-training framework with continuous value prediction and regression objective for cross-domain image synthesis.
Specifically, we propose to filter the pseudo-label with an uncertainty mask, and quantify the predictive confidence of generated images with practical variational Bayes learning.
arXiv Detail & Related papers (2021-06-23T16:19:00Z) - Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing
Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework at KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z) - Regularization with Latent Space Virtual Adversarial Training [4.874780144224057]
Virtual Adversarial Training (VAT) has shown impressive results among recently developed regularization methods.
We propose LVAT, which injects perturbation in the latent space instead of the input space.
LVAT can generate adversarial samples flexibly, resulting in more adverse effects and thus more effective regularization.
arXiv Detail & Related papers (2020-11-26T08:51:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.