Related papers: The Entropy Enigma: Success and Failure of Entropy Minimization

The Entropy Enigma: Success and Failure of Entropy Minimization

URL: http://arxiv.org/abs/2405.05012v2
Date: Sun, 12 May 2024 22:21:27 GMT
Title: The Entropy Enigma: Success and Failure of Entropy Minimization
Authors: Ori Press, Ravid Shwartz-Ziv, Yann LeCun, Matthias Bethge,
Abstract summary: Entropy minimization (EM) is frequently used to increase the accuracy of classification models when they're faced with new data at test time. We analyze why EM works when adapting a model for a few steps and why it eventually fails after adapting for many steps. We present a method for solving a practical problem: estimating a model's accuracy on a given arbitrary dataset without having access to its labels.
Score: 30.083332640328642
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Entropy minimization (EM) is frequently used to increase the accuracy of classification models when they're faced with new data at test time. EM is a self-supervised learning method that optimizes classifiers to assign even higher probabilities to their top predicted classes. In this paper, we analyze why EM works when adapting a model for a few steps and why it eventually fails after adapting for many steps. We show that, at first, EM causes the model to embed test images close to training images, thereby increasing model accuracy. After many steps of optimization, EM makes the model embed test images far away from the embeddings of training images, which results in a degradation of accuracy. Building upon our insights, we present a method for solving a practical problem: estimating a model's accuracy on a given arbitrary dataset without having access to its labels. Our method estimates accuracy by looking at how the embeddings of input images change as the model is optimized to minimize entropy. Experiments on 23 challenging datasets show that our method sets the SoTA with a mean absolute error of $5.75\%$, an improvement of $29.62\%$ over the previous SoTA on this task. Our code is available at https://github.com/oripress/EntropyEnigma

Related papers

Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting [15.251425165987987]
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities. We propose a sample weighting scheme for the fine-tuning data based on the pre-trained model's losses. We empirically demonstrate the efficacy of our method on both language and vision tasks.
arXiv Detail & Related papers (2025-02-05T00:49:59Z)
Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers [18.921532965557475]
Test-time adaptation aims to fine-tune a trained model online using unlabeled testing data. In this optimization process, unsupervised learning objectives like entropy frequently encounter noisy learning signals. We employ a learning-to-optimize approach to automatically learn an entropy generator called Meta Gradient Generator.
arXiv Detail & Related papers (2024-12-22T07:24:09Z)
Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" training data on a pre-divertrained machine learning model -- has recently attracted interest. Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z)
Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints. We empirically find that this training paradigm limits the one-step generation performance of consistency models. We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
More precise edge detections [0.0]
Edge detection (ED) is a base task in computer vision. Current models still suffer from unsatisfactory precision rates. Model architecture for more precise predictions still needs an investigation.
arXiv Detail & Related papers (2024-07-29T13:24:55Z)
On-the-Fly Test-time Adaptation for Medical Image Segmentation [63.476899335138164]
Adapting the source model to target data distribution at test-time is an efficient solution for the data-shift problem. We propose a new framework called Adaptive UNet where each convolutional block is equipped with an adaptive batch normalization layer. During test-time, the model takes in just the new test image and generates a domain code to adapt the features of source model according to the test data.
arXiv Detail & Related papers (2022-03-10T18:51:29Z)
MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption [69.76837484008033]
An unresolved problem in Deep Learning is the ability of neural networks to cope with domain shifts during test-time. We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions. Our approach significantly improves the state-of-the-art results on the CIFAR-10-Corrupted image classification benchmark.
arXiv Detail & Related papers (2021-03-30T09:33:38Z)
Few-Shot Lifelong Learning [35.05196800623617]
Few-Shot Lifelong Learning enables deep learning models to perform lifelong/continual learning on few-shot data. Our method selects very few parameters from the model for training every new set of classes instead of training the full model. We experimentally show that our method significantly outperforms existing methods on the miniImageNet, CIFAR-100, and CUB-200 datasets.
arXiv Detail & Related papers (2021-03-01T13:26:57Z)
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness. The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
Tent: Fully Test-time Adaptation by Entropy Minimization [77.85911673550851]
A model must adapt itself to generalize to new and different data during testing. In this setting of fully test-time adaptation the model has only the test data and its own parameters. We propose to adapt by test entropy minimization (tent): we optimize the model for confidence as measured by the entropy of its predictions.
arXiv Detail & Related papers (2020-06-18T17:55:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.