FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation
- URL: http://arxiv.org/abs/2603.04733v1
- Date: Thu, 05 Mar 2026 02:12:48 GMT
- Title: FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation
- Authors: Xingyu Wang, Tao Wang
- Abstract summary: Test-Time Adaptation is essential for enabling deep learning models to handle real-world data distribution shifts. Backpropagation-based methods are not suitable for low-end deployment devices. We propose Forward-Only Zeroth-Order Optimization (FOZO), a novel and practical backpropagation-free paradigm for TTA.
- Score: 9.28697795097814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-Time Adaptation (TTA) is essential for enabling deep learning models to handle real-world data distribution shifts. However, current approaches face significant limitations: backpropagation-based methods are unsuitable for low-end deployment devices due to their high computation and memory requirements and their tendency to modify model weights during adaptation, while traditional backpropagation-free techniques exhibit constrained adaptation capabilities. In this work, we propose Forward-Only Zeroth-Order Optimization (FOZO), a novel and practical backpropagation-free paradigm for TTA. FOZO leverages memory-efficient zeroth-order prompt optimization, guided by objectives that optimize both intermediate feature statistics and prediction entropy. To ensure efficient and stable adaptation over the out-of-distribution data stream, we introduce a dynamically decaying perturbation scale during zeroth-order gradient estimation and theoretically prove its convergence under the TTA data stream assumption. Extensive continual adaptation experiments on ImageNet-C, ImageNet-R, and ImageNet-Sketch demonstrate FOZO's superior performance, achieving 59.52% Top-1 accuracy on ImageNet-C (5K, level 5) and outperforming mainstream gradient-based methods and the SOTA forward-only method FOA (58.13%). Furthermore, FOZO exhibits strong generalization on quantized (INT8) models. These findings demonstrate that FOZO is a highly competitive solution for TTA deployment in resource-limited scenarios.
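To make the adaptation loop concrete, below is a minimal sketch of SPSA-style two-point zeroth-order prompt optimization with a decaying perturbation scale, in the spirit of the abstract. The `model(x, prompt)` interface, the entropy-only objective (FOZO also aligns intermediate feature statistics, omitted here), and all hyperparameter values are assumptions for illustration, not FOZO's published implementation.

```python
import torch

def entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Mean prediction entropy over the batch (one of FOZO's stated objectives)."""
    probs = logits.softmax(dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()

@torch.no_grad()  # forward passes only: no autograd graph is ever built
def zo_step(model, prompt: torch.Tensor, x: torch.Tensor, step: int,
            mu0: float = 1e-3, decay: float = 1e-2, lr: float = 1e-2) -> torch.Tensor:
    # Dynamically decaying perturbation scale (the schedule is an assumption).
    mu = mu0 / (1.0 + decay * step)
    u = torch.randn_like(prompt)  # random perturbation direction
    loss_pos = entropy_loss(model(x, prompt + mu * u))
    loss_neg = entropy_loss(model(x, prompt - mu * u))
    # Two-point zeroth-order gradient estimate along direction u.
    g_hat = (loss_pos - loss_neg) / (2.0 * mu) * u
    return prompt - lr * g_hat  # plain SGD update on the prompt only
```

A streaming loop would call `prompt = zo_step(model, prompt, batch, step)` once per incoming test batch, so memory stays at inference level: no activations are stored for backpropagation and model weights are never touched.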
Related papers
- Energy-based Preference Optimization for Test-time Adaptation [4.379304291229695]
Test-Time Adaptation (TTA) approaches focus on adjusting the conditional distribution. These methods often depend on uncertain predictions in the absence of label information, leading to unreliable performance. Energy-based frameworks suggest a promising alternative to address distribution shifts without relying on uncertain predictions, instead computing the marginal distribution of target data.
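For context, the energy score commonly used by energy-based frameworks rates a target sample by its unnormalized marginal log-density under the classifier; a minimal sketch, with `model` and the temperature as assumptions:

```python
import torch

# Standard energy score E(x) = -T * logsumexp(f(x)/T); lower energy means
# the sample is more typical under the model's (unnormalized) marginal density.
@torch.no_grad()
def energy_score(model, x: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    logits = model(x)  # shape: (batch, num_classes)
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)
```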
arXiv Detail & Related papers (2025-05-26T07:21:32Z)
- DFF: Decision-Focused Fine-tuning for Smarter Predict-then-Optimize with Limited Data [7.70699448711673]
Decision-focused learning (DFL) offers an end-to-end approach to the predict-then-optimize (PO) framework by training predictive models directly on decision loss (DL). Some predictive models are non-differentiable or black-box and cannot be adjusted using gradient-based methods. We propose a novel framework, Decision-Focused Fine-tuning (DFF), which embeds the DFL module into the PO pipeline via a novel bias correction module.
arXiv Detail & Related papers (2025-01-03T15:46:25Z)
- COME: Test-time adaption by Conservatively Minimizing Entropy [45.689829178140634]
Conservatively Minimizing the Entropy (COME) is a drop-in replacement for traditional entropy minimization (EM).
COME explicitly models the uncertainty by characterizing a Dirichlet prior distribution over model predictions.
We show that COME achieves state-of-the-art performance on commonly used benchmarks.
arXiv Detail & Related papers (2024-10-12T09:20:06Z)
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, achieving up to a 9.6% improvement over conventional baseline methods.
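As a point of reference, the freeze-then-tune setup this family of methods relies on looks roughly like the sketch below; the "prompt"/"adapter" naming convention and the optimizer choice are assumptions, not Forecast-PEFT's actual module layout.

```python
import torch

def build_peft_optimizer(model: torch.nn.Module, lr: float = 1e-4):
    """Freeze the pretrained backbone; train only new prompt/adapter params."""
    trainable = []
    for name, param in model.named_parameters():
        if "prompt" in name or "adapter" in name:  # assumed naming convention
            param.requires_grad_(True)
            trainable.append(param)
        else:
            param.requires_grad_(False)  # keep pretrained weights frozen
    return torch.optim.AdamW(trainable, lr=lr)
```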
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the zero-shot generalization stability of VLMs; the combined method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Only Adaptation (FOA) method.
FOA runs on a quantized 8-bit ViT, outperforms gradient-based TENT on a full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
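A hedged sketch of the general idea, keeping only the top-k largest-magnitude gradient entries per tensor before the optimizer step; the selection rule and ratio here are illustrative, not SIFT's exact algorithm:

```python
import torch

def sparsify_gradients(model: torch.nn.Module, keep_ratio: float = 0.05):
    """Zero out all but the largest-magnitude gradient entries in each tensor."""
    for param in model.parameters():
        if param.grad is None:
            continue
        flat = param.grad.abs().flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        threshold = flat.topk(k).values.min()  # k-th largest magnitude
        param.grad.mul_((param.grad.abs() >= threshold).to(param.grad.dtype))
```

Calling `sparsify_gradients(model)` between `loss.backward()` and `optimizer.step()` confines each update to a small fraction of the weights.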
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- REALM: Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Time Adaptation [5.749155230209001]
Fully-test-time adaptation (F-TTA) can mitigate performance loss due to distribution shifts between train and test data.
We present a general framework for improving robustness of F-TTA to noisy samples, inspired by self-paced learning and robust loss functions.
arXiv Detail & Related papers (2023-09-07T18:44:58Z)
- Tent: Fully Test-time Adaptation by Entropy Minimization [77.85911673550851]
A model must adapt itself to generalize to new and different data during testing.
In this setting of fully test-time adaptation the model has only the test data and its own parameters.
We propose to adapt by test entropy minimization (tent): we optimize the model for confidence as measured by the entropy of its predictions.
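The core update is compact; a minimal sketch of one test-entropy-minimization step follows (the optimizer setup, which the tent paper restricts to normalization-layer affine parameters, is left to the caller):

```python
import torch

def tent_step(model: torch.nn.Module, x: torch.Tensor,
              optimizer: torch.optim.Optimizer) -> float:
    """One adaptation step: minimize mean Shannon entropy of predictions."""
    probs = model(x).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()
```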
arXiv Detail & Related papers (2020-06-18T17:55:28Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)