RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo
- URL: http://arxiv.org/abs/2511.10107v1
- Date: Fri, 14 Nov 2025 01:33:08 GMT
- Title: RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo
- Authors: Jueun Ko, Hyewon Park, Hyesong Choi, Dongbo Min,
- Abstract summary: RobIA is a novel Robust, Instance-Aware framework for Continual Test-Time Adaptation in stereo depth estimation. RobIA integrates two key components: (1) Attend-and-Excite Mixture-of-Experts (AttEx-MoE), a parameter-efficient module that dynamically routes input to frozen experts via a lightweight self-attention mechanism tailored to epipolar geometry, and (2) Robust AdaptBN Teacher, a PEFT-based teacher model that provides dense pseudo-supervision by complementing sparse handcrafted labels.
- Score: 18.836469118006594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stereo Depth Estimation in real-world environments poses significant challenges due to dynamic domain shifts, sparse or unreliable supervision, and the high cost of acquiring dense ground-truth labels. While recent Test-Time Adaptation (TTA) methods offer promising solutions, most rely on static target domain assumptions and input-invariant adaptation strategies, limiting their effectiveness under continual shifts. In this paper, we propose RobIA, a novel Robust, Instance-Aware framework for Continual Test-Time Adaptation (CTTA) in stereo depth estimation. RobIA integrates two key components: (1) Attend-and-Excite Mixture-of-Experts (AttEx-MoE), a parameter-efficient module that dynamically routes input to frozen experts via a lightweight self-attention mechanism tailored to epipolar geometry, and (2) Robust AdaptBN Teacher, a PEFT-based teacher model that provides dense pseudo-supervision by complementing sparse handcrafted labels. This strategy enables input-specific flexibility and broad supervision coverage, improving generalization under domain shift. Extensive experiments demonstrate that RobIA achieves superior adaptation performance across dynamic target domains while maintaining computational efficiency.
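The core AttEx-MoE idea, routing each input instance to a mixture of frozen experts through a small trainable gate, can be illustrated with a minimal sketch. Everything below is hypothetical (function names, toy callable experts, a fixed-logit gate standing in for the paper's self-attention router) and is not the paper's actual implementation:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_instance(x, experts, gate):
    """Mix frozen expert outputs with input-dependent weights.

    x       : list[float], feature vector for one instance
    experts : list of callables (frozen; never updated at test time)
    gate    : callable mapping x to one logit per expert -- the only
              trainable part, keeping adaptation parameter-efficient
    """
    weights = softmax(gate(x))
    outputs = [expert(x) for expert in experts]
    # Weighted sum over experts, per feature dimension.
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))]

# Toy usage: two frozen experts, gate strongly prefers the first.
experts = [lambda x: [2 * v for v in x], lambda x: [-v for v in x]]
gate = lambda x: [10.0, 0.0]  # hypothetical fixed logits for illustration
print(route_instance([1.0, 2.0], experts, gate))  # ~[2.0, 4.0]
```

Because only the gate carries trainable parameters, each test instance gets its own expert mixture while the adaptation cost stays small, which is the property the abstract attributes to AttEx-MoE.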
Related papers
- OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. We propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z) - Steering Vision-Language Pre-trained Models for Incremental Face Presentation Attack Detection [62.89126207012712]
Face Presentation Attack Detection (PAD) demands incremental learning to combat spoofing tactics and domains. Privacy regulations forbid retaining past data, necessitating rehearsal-free learning (RF-IL).
arXiv Detail & Related papers (2025-12-22T04:30:11Z) - Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training [30.589225478300023]
DTop-p is a sparsity-controllable dynamic Top-p routing mechanism. We show that DTop-p consistently outperforms both Top-k and fixed-threshold Top-p baselines. DTop-p exhibits strong scaling properties with respect to expert granularity, expert capacity, model size, and dataset size.
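Dynamic Top-p routing, as summarized above, activates a variable number of experts per input: the smallest set whose cumulative routing probability reaches a threshold p. A minimal sketch under that description (hypothetical names; not the DTop-p implementation):

```python
import math

def top_p_experts(logits, p):
    """Select the smallest set of experts whose cumulative softmax
    probability reaches p, then renormalize their weights.

    Unlike fixed Top-k, the number of active experts varies per input:
    a confident router activates few experts, an uncertain one more.
    """
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Greedily take experts in descending probability order.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += probs[i]
        if cum >= p:
            break
    z = sum(probs[i] for i in chosen)
    return chosen, [probs[i] / z for i in chosen]
```

With logits [3.0, 1.0, 0.0], a low threshold such as p=0.8 is met by the single dominant expert, while raising p to 0.95 pulls in a second expert, which is how per-input sparsity becomes controllable through p.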
arXiv Detail & Related papers (2025-12-16T01:28:57Z) - Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments. We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z) - ALFred: An Active Learning Framework for Real-world Semi-supervised Anomaly Detection with Adaptive Thresholds [2.1374208474242815]
Video Anomaly Detection (VAD) can play a key role in spotting unusual activities in video footage. VAD is difficult to use in real-world settings due to the dynamic nature of human actions, environmental variations, and domain shifts. We introduce an active learning framework tailored for VAD, designed to adapt to ever-changing real-world conditions.
arXiv Detail & Related papers (2025-08-12T16:18:54Z) - Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts [29.52183168979229]
We propose SMoEStereo, a novel framework that adapts VFMs for stereo matching through a tailored, scene-specific fusion of Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) modules. Our method exhibits state-of-the-art cross-domain and joint generalization across multiple benchmarks without dataset-specific adaptation.
arXiv Detail & Related papers (2025-07-07T03:19:04Z) - Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization [9.03028904066824]
Open-set test-time adaptation (OSTTA) aims to adapt a source pre-trained model online to an unlabeled target domain that contains unknown classes. We present Adaptive Entropy-aware Optimization (AEO), a novel framework specifically designed to tackle Multimodal Open-set Test-time Adaptation.
arXiv Detail & Related papers (2025-01-23T18:59:30Z) - Enhancing Test Time Adaptation with Few-shot Guidance [62.49199492255226]
Deep neural networks often encounter significant performance drops when facing domain shifts between training (source) and test (target) data. Test Time Adaptation (TTA) methods have been proposed to adapt a pre-trained source model to handle out-of-distribution streaming target data. We develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA.
arXiv Detail & Related papers (2024-09-02T15:50:48Z) - Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments [20.307151769610087]
Continual Test-Time Adaptation (CTTA) has emerged as a promising technique to gradually adapt a source-trained model to continually changing target domains. We present AMROD, featuring three core components, to tackle these challenges for detection models in CTTA scenarios. We demonstrate the effectiveness of AMROD on four CTTA object detection tasks, where AMROD outperforms existing methods.
arXiv Detail & Related papers (2024-06-24T08:30:03Z) - PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation [54.734201944510026]
We propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix. In a nutshell, our auxiliary network learns to fix local variants intensively by effectively back-propagating local information through the meta-gradient. This network is model-agnostic, so it can be used in any kind of architecture in a plug-and-play manner.
arXiv Detail & Related papers (2022-07-27T07:48:29Z) - AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z) - Robust Object Detection via Instance-Level Temporal Cycle Confusion [89.1027433760578]
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors.
Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf).
For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision.
arXiv Detail & Related papers (2021-04-16T21:35:08Z)