Related papers: Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers

Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers

URL: http://arxiv.org/abs/2412.16901v1
Date: Sun, 22 Dec 2024 07:24:09 GMT
Title: Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers
Authors: Qi Deng, Shuaicheng Niu, Ronghao Zhang, Yaofo Chen, Runhao Zeng, Jian Chen, Xiping Hu,
Abstract summary: Test-time adaptation aims to fine-tune a trained model online using unlabeled testing data.<n>In this optimization process, unsupervised learning objectives like entropy frequently encounter noisy learning signals.<n>We employ a learning-to-optimize approach to automatically learn an entropy generator called Meta Gradient Generator.
Score: 18.921532965557475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Test-time adaptation (TTA) aims to fine-tune a trained model online using unlabeled testing data to adapt to new environments or out-of-distribution data, demonstrating broad application potential in real-world scenarios. However, in this optimization process, unsupervised learning objectives like entropy minimization frequently encounter noisy learning signals. These signals produce unreliable gradients, which hinder the model ability to converge to an optimal solution quickly and introduce significant instability into the optimization process. In this paper, we seek to resolve these issues from the perspective of optimizer design. Unlike prior TTA using manually designed optimizers like SGD, we employ a learning-to-optimize approach to automatically learn an optimizer, called Meta Gradient Generator (MGG). Specifically, we aim for MGG to effectively utilize historical gradient information during the online optimization process to optimize the current model. To this end, in MGG, we design a lightweight and efficient sequence modeling layer -- gradient memory layer. It exploits a self-supervised reconstruction loss to compress historical gradient information into network parameters, thereby enabling better memorization ability over a long-term adaptation process. We only need a small number of unlabeled samples to pre-train MGG, and then the trained MGG can be deployed to process unseen samples. Promising results on ImageNet-C, R, Sketch, and A indicate that our method surpasses current state-of-the-art methods with fewer updates, less data, and significantly shorter adaptation iterations. Compared with a previous SOTA method SAR, we achieve 7.4% accuracy improvement and 4.2 times faster adaptation speed on ImageNet-C.

Related papers

Optimizing ML Training with Metagradient Descent [69.89631748402377]
We introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients.
arXiv Detail & Related papers (2025-03-17T22:18:24Z)
Improving Instance Optimization in Deformable Image Registration with Gradient Projection [7.6061804149819885]
Deformable image registration is inherently a multi-objective optimization problem. These conflicting objectives often lead to poor optimization outcomes. Deep learning methods have recently gained popularity in this domain due to their efficiency in processing large datasets.
arXiv Detail & Related papers (2024-10-21T08:27:13Z)
Incorporating Test-Time Optimization into Training with Dual Networks for Human Mesh Recovery [35.138312681232264]
We propose a dual-network architecture that unifies the training-time and test-time objectives. Our method, armed with meta-learning and the dual networks, outperforms state-of-the-art regression-based and optimization-based HMR approaches.
arXiv Detail & Related papers (2024-01-25T12:04:53Z)
AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models. AdaLomo results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training. We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields. Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatiles. We train an ingest for deep learning which is itself a small neural network that ingests and outputs parameter updates. We open source our learned, meta-training code, the associated train test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
Faster Optimization-Based Meta-Learning Adaptation Phase [0.0]
We introduce Lambda patterns by which we restrict which weight are updated in the network during the adaptation phase. The experiments conducted have shown that it is possible to significantly improve the MAML method in the following areas.
arXiv Detail & Related papers (2022-06-13T06:57:17Z)
Joint inference and input optimization in equilibrium networks [68.63726855991052]
deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer. We show that there is a natural synergy between these two settings. We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection. Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling. It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.