Related papers: Mask to Adapt: Simple Random Masking Enables Robust Continual Test-Time Learning

Mask to Adapt: Simple Random Masking Enables Robust Continual Test-Time Learning

URL: http://arxiv.org/abs/2512.08048v1
Date: Mon, 08 Dec 2025 21:16:44 GMT
Title: Mask to Adapt: Simple Random Masking Enables Robust Continual Test-Time Learning
Authors: Chandler Timm C. Doloriel,
Abstract summary: Recent continual test-time adaptation (CTTA) methods use masking to regulate learning, but often depend on calibrated uncertainty or stable attention scores.<n>We introduce Mask to Adapt (M2A), a simple CTTA approach that generates a short sequence of masked views.<n>We show that M2A attains 8.3%/19.8%/39.2% mean error, outperforming or matching strong CTTA baselines.
Score: 1.1458853556386797
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Distribution shifts at test time degrade image classifiers. Recent continual test-time adaptation (CTTA) methods use masking to regulate learning, but often depend on calibrated uncertainty or stable attention scores and introduce added complexity. We ask: do we need custom-made masking designs, or can a simple random masking schedule suffice under strong corruption? We introduce Mask to Adapt (M2A), a simple CTTA approach that generates a short sequence of masked views (spatial or frequency) and adapts with two objectives: a mask consistency loss that aligns predictions across different views and an entropy minimization loss that encourages confident outputs. Motivated by masked image modeling, we study two common masking families -- spatial masking and frequency masking -- and further compare subtypes within each (spatial: patch vs.\ pixel; frequency: all vs.\ low vs.\ high). On CIFAR10C/CIFAR100C/ImageNetC (severity~5), M2A (Spatial) attains 8.3\%/19.8\%/39.2\% mean error, outperforming or matching strong CTTA baselines, while M2A (Frequency) lags behind. Ablations further show that simple random masking is effective and robust. These results indicate that a simple random masking schedule, coupled with consistency and entropy objectives, is sufficient to drive effective test-time adaptation without relying on uncertainty or attention signals.

Related papers

High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution [87.56382172827526]
High-frequency regions are most critical for reconstruction.<n>We propose a training-free adaptive masking module for acceleration.<n>Our method reduces FLOPs by 24--43% for state-of-the-art models.
arXiv Detail & Related papers (2025-05-11T13:18:03Z)
Halton Scheduler For Masked Generative Image Transformer [51.82285573627426]
Masked Generative Image Transformers (MaskGIT) have emerged as a scalable and efficient image generation framework.<n>We analyze the sampling objective in MaskGIT, based on the mutual information between tokens.<n>We propose a new sampling strategy based on our Halton scheduler instead of the original Confidence scheduler.
arXiv Detail & Related papers (2025-03-21T12:00:59Z)
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework. We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise. We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [55.12082817901671]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT)<n>MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.<n>Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
Agnostic Multi-Robust Learning Using ERM [19.313739782029185]
A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially-many perturbations that an adversary might make to a test-time natural example. In contrast, the attacker only needs to find one successful perturbation. We introduce a novel multi-group setting and introduce a novel multi-robust learning problem.
arXiv Detail & Related papers (2023-03-15T21:30:14Z)
Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy [28.62712217754428]
We present dynamic alignment Mask CTC. We introduce two methods: (1) Aligned Cross Entropy (AXE), finding the monotonic alignment that minimizes the cross-entropy loss through dynamic programming, (2) Dynamic Rectification, creating new training samples by replacing some masks with model predicted tokens. Our experiments on WSJ dataset demonstrated that not only AXE loss but also the rectification method could improve the WER performance of Mask CTC.
arXiv Detail & Related papers (2023-03-14T08:01:21Z)
MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers. We propose a mask-piloted training approach, which feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones.
arXiv Detail & Related papers (2023-03-13T17:57:59Z)
Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data. We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process. In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
Efficient Masked Autoencoders with Self-Consistency [34.7076436760695]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision. We propose efficient masked autoencoders with self-consistency (EMAE) to improve the pre-training efficiency. EMAE consistently obtains state-of-the-art transfer ability on a variety of downstream tasks, such as image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-02-28T09:21:12Z)
Masked Frequency Modeling for Self-Supervised Visual Pre-Training [102.89756957704138]
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models. MFM first masks out a portion of frequency components of the input image and then predicts the missing frequencies on the frequency spectrum. For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even using none of the following: (i) extra data, (ii) extra model, (iii) mask token.
arXiv Detail & Related papers (2022-06-15T17:58:30Z)
Application of Yolo on Mask Detection Task [1.941730292017383]
Strict mask-wearing policies have been met with not only public sensation but also practical difficulty. Existing technology to help automate mask checking uses deep learning models on real-time surveillance camera footages. Our research proposes a new approach to mask detection by replacing Mask-R-CNN with a more efficient model "YOLO"
arXiv Detail & Related papers (2021-02-10T12:34:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.