Related papers: In Search of Lost Online Test-time Adaptation: A Survey

URL: http://arxiv.org/abs/2310.20199v3
Date: Thu, 18 Jul 2024 07:58:02 GMT
Title: In Search of Lost Online Test-time Adaptation: A Survey
Authors: Zixin Wang, Yadan Luo, Liang Zheng, Zhuoxiao Chen, Sen Wang, Zi Huang
Abstract summary: This article presents a comprehensive survey of online test-time adaptation (OTTA). We classify OTTA techniques into three primary categories and benchmark them using a modern backbone, the Vision Transformer (ViT). Our findings diverge from existing literature, revealing that transformers demonstrate heightened resilience to diverse domain shifts.
Score: 40.68806005826287
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This article presents a comprehensive survey of online test-time adaptation (OTTA), focusing on effectively adapting machine learning models to distributionally different target data upon batch arrival. Despite the recent proliferation of OTTA methods, conclusions from previous studies are inconsistent due to ambiguous settings, outdated backbones, and inconsistent hyperparameter tuning, which obscure core challenges and hinder reproducibility. To enhance clarity and enable rigorous comparison, we classify OTTA techniques into three primary categories and benchmark them using a modern backbone, the Vision Transformer (ViT). Our benchmarks cover conventional corrupted datasets such as CIFAR-10/100-C and ImageNet-C, as well as real-world shifts represented by CIFAR-10.1, OfficeHome, and CIFAR-10-Warehouse. The CIFAR-10-Warehouse dataset includes a variety of variations from different search engines and synthesized data generated through diffusion models. To measure efficiency in online scenarios, we introduce novel evaluation metrics, including GFLOPs, wall clock time, and GPU memory usage, providing a clearer picture of the trade-offs between adaptation accuracy and computational overhead. Our findings diverge from existing literature, revealing that (1) transformers demonstrate heightened resilience to diverse domain shifts, (2) the efficacy of many OTTA methods relies on large batch sizes, and (3) stability in optimization and resistance to perturbations are crucial during adaptation, particularly when the batch size is 1. Based on these insights, we highlight promising directions for future research. Our benchmarking toolkit and source code are available at https://github.com/Jo-wang/OTTA_ViT_survey.

Related papers

Online Gaussian Test-Time Adaptation of Vision-Language Models [13.90714913643503]
Online Gaussian Adaptation (OGA) is a novel method that models the likelihoods of visual features using Gaussian distributions. We demonstrate that OGA outperforms state-of-the-art methods on most datasets and runs. Our experimental study reveals that common OTTA evaluation protocols, which average performance over at most three runs per dataset, are inadequate due to the substantial variability observed across runs for all OTTA methods.
arXiv Detail & Related papers (2025-01-08T08:49:52Z)
Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models. Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information. Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs [6.456189487006878]
We present CLAMP-ViT, a data-free post-training quantization method for vision transformers (ViTs). We identify the limitations of recent techniques, notably their inability to leverage meaningful inter-patch relationships. CLAMP-ViT employs a two-stage approach, cyclically adapting between data generation and model quantization.
arXiv Detail & Related papers (2024-07-07T05:39:25Z)
REP: Resource-Efficient Prompting for On-device Continual Learning [23.92661395403251]
On-device continual learning (CL) requires the co-optimization of model accuracy and resource efficiency to be practical. It is commonly believed that CNN-based CL excels in resource efficiency, whereas ViT-based CL is superior in model performance. We introduce REP, which improves resource efficiency specifically targeting prompt-based rehearsal-free methods.
arXiv Detail & Related papers (2024-06-07T09:17:33Z)
MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection [54.545054873239295]
Deepfakes have recently raised significant trust issues and security concerns among the public. ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance. This work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach.
arXiv Detail & Related papers (2024-04-12T13:02:08Z)
Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation-based stereo matching has achieved outstanding performance. Current methods with a fixed model do not work uniformly well across various datasets. This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification [77.0114672086012]
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. We present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets.
arXiv Detail & Related papers (2023-07-06T16:59:53Z)
Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files. We show that our URL transformer model requires a different training approach to reach high performance levels. We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
arXiv Detail & Related papers (2022-12-05T23:50:46Z)
Semantic Perturbations with Normalizing Flows for Improved Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations. We find that our latent adversarial perturbations, which adapt to the classifier throughout its training, are most effective.
arXiv Detail & Related papers (2021-08-18T03:20:00Z)
On the Generalization Effects of Linear Transformations in Data Augmentation [32.01435459892255]
Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. We consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. We propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data.
arXiv Detail & Related papers (2020-05-02T04:10:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
