AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction
- URL: http://arxiv.org/abs/2412.18255v1
- Date: Tue, 24 Dec 2024 08:12:31 GMT
- Title: AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction
- Authors: Pufan Zou, Shijia Zhao, Weijie Huang, Qiming Xia, Chenglu Wen, Wei Li, Cheng Wang
- Abstract summary: We propose a novel label-free learning method, Adaptive Label Correction (AdaCo), for 3D semantic segmentation. AdaCo introduces the Cross-modal Label Generation Module (CLGM) to provide cross-modal supervision, and incorporates the Adaptive Noise Corrector (ANC), which iteratively updates and adjusts the noisy samples within this supervision during training. Our proposed AdaCo can effectively mitigate the performance limitations of label-free learning networks in 3D semantic segmentation tasks.
- Score: 14.51758173099208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Visual Foundation Models (VFMs) have shown a remarkable generalization performance in 3D perception tasks. However, their effectiveness in large-scale outdoor datasets remains constrained by the scarcity of accurate supervision signals, the extensive noise caused by variable outdoor conditions, and the abundance of unknown objects. In this work, we propose a novel label-free learning method, Adaptive Label Correction (AdaCo), for 3D semantic segmentation. AdaCo first introduces the Cross-modal Label Generation Module (CLGM), providing cross-modal supervision with the formidable interpretive capabilities of the VFMs. Subsequently, AdaCo incorporates the Adaptive Noise Corrector (ANC), updating and adjusting the noisy samples within this supervision iteratively during training. Moreover, we develop an Adaptive Robust Loss (ARL) function to modulate each sample's sensitivity to noisy supervision, preventing potential underfitting issues associated with robust loss. Our proposed AdaCo can effectively mitigate the performance limitations of label-free learning networks in 3D semantic segmentation tasks. Extensive experiments on two outdoor benchmark datasets highlight the superior performance of our method.
Related papers
- Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection [52.5174167737992]
Video anomaly detection (VAD) aims to identify abnormal events in videos. We propose SteerVAD, which advances MLLM-based VAD by shifting from passively reading to actively steering and rectifying internal representations. Our method achieves state-of-the-art performance among tuning-free approaches requiring only 1% of training data.
arXiv Detail & Related papers (2026-02-27T13:48:50Z) - Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations [4.581671524490035]
We propose an end-to-end Staged Voxel-Level Deep Reinforcement Learning framework for robust medical image segmentation under noisy annotations. This framework employs a dynamic iterative update strategy to automatically mitigate the impact of erroneous labels without requiring manual intervention.
arXiv Detail & Related papers (2026-01-07T12:39:54Z) - Noise-Robust Tiny Object Localization with Flows [63.60972031108944]
We propose a noise-robust localization framework leveraging normalizing flows for flexible error modeling and uncertainty-guided optimization. Our method captures complex, non-Gaussian prediction distributions through flow-based error modeling, enabling robust learning under noisy supervision. An uncertainty-aware gradient modulation mechanism further suppresses learning from high-uncertainty, noise-prone samples, mitigating overfitting while stabilizing training.
arXiv Detail & Related papers (2026-01-02T09:16:55Z) - DANS-KGC: Diffusion Based Adaptive Negative Sampling for Knowledge Graph Completion [10.190273470704112]
We propose DANS-KGC (Diffusion-based Adaptive Negative Sampling for Knowledge Graph Completion) to overcome the limitations of existing negative sampling strategies. DANS-KGC comprises three key components: the Difficulty Assessment Module (DAM), the Adaptive Negative Sampling Module (ANS), and the Dynamic Training Mechanism (DTM). DTM enhances learning by dynamically adjusting the hardness distribution of negative samples throughout training.
arXiv Detail & Related papers (2025-11-11T06:56:57Z) - TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments [3.3107717550009865]
TOAST (Task-Oriented Adaptive Semantic Transmission) is a unified framework designed to address the core challenge of multi-task optimization in wireless environments. We formulate adaptive task balancing as a Markov decision process, employing deep reinforcement learning to dynamically adjust the trade-off between image reconstruction fidelity and semantic classification accuracy. We integrate module-specific Low-Rank Adaptation (LoRA) mechanisms throughout our Swin Transformer-based joint source-channel coding architecture.
arXiv Detail & Related papers (2025-06-27T04:36:30Z) - Robust Duality Learning for Unsupervised Visible-Infrared Person Re-Identification [24.24793934981947]
We introduce a new learning paradigm that considers Pseudo-Label Noise (PLN). PLN is characterized by three key challenges: noise overfitting, error accumulation, and noisy cluster correspondence. We propose a novel Robust Duality Learning framework (RoDE) for UVI-ReID to mitigate the effects of noisy pseudo-labels.
arXiv Detail & Related papers (2025-05-05T10:36:52Z) - A Language Anchor-Guided Method for Robust Noisy Domain Generalization [20.83580289888522]
We introduce Anchor Alignment and Adaptive Weighting (A3W).
A3W uses sample reweighting guided by natural language processing (NLP) anchors to extract more representative features.
It consistently outperforms state-of-the-art domain generalization methods.
arXiv Detail & Related papers (2025-03-21T15:20:28Z) - A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z) - Robust Tiny Object Detection in Aerial Images amidst Label Noise [50.257696872021164]
This study addresses the issue of tiny object detection under noisy label supervision.
We propose a DeNoising Tiny Object Detector (DN-TOD), which incorporates a Class-aware Label Correction scheme.
Our method can be seamlessly integrated into both one-stage and two-stage object detection pipelines.
arXiv Detail & Related papers (2024-01-16T02:14:33Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z) - Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective 'Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements of over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z) - Learning to Adapt to Unseen Abnormal Activities under Weak Supervision [43.40900198498228]
We present a meta-learning framework for weakly supervised anomaly detection in videos.
Our framework learns to adapt to unseen types of abnormal activities effectively when only video-level annotations of binary labels are available.
arXiv Detail & Related papers (2022-03-25T12:15:44Z) - Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation [87.85851771425325]
We consider a new problem of adapting a human mesh reconstruction model to out-of-domain streaming videos.
We tackle this problem through online adaptation, gradually correcting the model bias during testing.
We propose the Dynamic Bilevel Online Adaptation algorithm (DynaBOA).
arXiv Detail & Related papers (2021-11-07T07:23:24Z) - Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation [90.2445084743881]
We present a method for semi-supervised point cloud semantic segmentation to adopt unlabeled point clouds in training to boost the model performance.
Inspired by the recent contrastive loss in self-supervised tasks, we propose the guided point contrastive loss to enhance the feature representation and model generalization ability.
arXiv Detail & Related papers (2021-10-15T16:38:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.