Related papers: Guess-and-Learn (G&L): Measuring the Cumulative Error Cost of Cold-Start Adaptation

URL: http://arxiv.org/abs/2508.21270v1
Date: Fri, 29 Aug 2025 00:13:02 GMT
Title: Guess-and-Learn (G&L): Measuring the Cumulative Error Cost of Cold-Start Adaptation
Authors: Roland Arnold
Abstract summary: Evaluation of machine learning models typically emphasizes final accuracy, overlooking the cost of adaptation: the cumulative errors incurred while learning from scratch. Guess-and-Learn (G&L) v1.0 addresses this gap by measuring cold-start adaptability, the total number of mistakes a model makes while sequentially labeling an unlabeled dataset.
Score: 0.11102988539107704
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Evaluation of machine learning models typically emphasizes final accuracy, overlooking the cost of adaptation: the cumulative errors incurred while learning from scratch. Guess-and-Learn (G&L) v1.0 addresses this gap by measuring cold-start adaptability, defined as the total number of mistakes a model makes while sequentially labeling an unlabeled dataset. At each step, the learner selects an instance, predicts its label, receives the ground truth, and updates its parameters in either online (per-sample) or batch (delayed) mode. The resulting error trajectory exposes adaptation speed, selection quality, and bias, dynamics that are invisible to endpoint metrics. G&L defines four tracks (Scratch/Pretrained × Online/Batch) to disentangle the effects of initialization and update frequency. We formalize the protocol, relate it to classical mistake-bound theory, and estimate a heuristic "oracle reference band" for MNIST as a plausibility reference. Baseline experiments on MNIST and AG News, spanning classical methods (Perceptron, k-NN), convolutional architectures (CNN, ResNet-50), and pretrained transformers (ViT-B/16, BERT-base), reveal systematic differences in early-phase efficiency: smaller models can adapt with fewer initial errors, while the benefits of pretraining vary by domain. Across settings, current models remain well above the oracle band, highlighting an adaptability gap. By quantifying the mistake cost of early learning, G&L complements conventional benchmarks and provides a reproducible framework for developing learners that are not only accurate in the limit but also reliable from the first examples.
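The sequential protocol described above (select, predict, reveal the label, update) can be sketched as follows. This is a minimal illustration under stated assumptions, not the G&L reference implementation: the learner is a multiclass Perceptron trained from scratch, instances are selected uniformly at random, and in batch mode the Perceptron update is applied to the buffered examples once `batch_size` labels have accumulated. The function `guess_and_learn` and its parameters are hypothetical.

```python
import numpy as np

def guess_and_learn(X, y, n_classes, mode="online", batch_size=32, seed=0):
    """Hypothetical G&L-style loop with a multiclass Perceptron learner.

    Returns the cumulative-mistake trajectory: traj[t] is the number of
    wrong guesses after t + 1 labeling steps.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))       # "Scratch" track: no pretraining
    order = rng.permutation(n)         # assumed selection rule: uniform random
    pending = []                       # revealed labels awaiting an update
    mistakes, traj = 0, []
    for i in order:
        x, label = X[i], y[i]
        guess = int(np.argmax(W @ x))  # predict before the label is revealed
        if guess != label:             # ground truth revealed: score the guess
            mistakes += 1
        traj.append(mistakes)
        pending.append((x, label))
        # Online mode updates after every sample; batch mode waits for a full buffer.
        # (Any partial buffer left at the end does not affect the trajectory.)
        if mode == "online" or len(pending) == batch_size:
            for xp, yp in pending:     # Perceptron update on each buffered example
                p = int(np.argmax(W @ xp))
                if p != yp:
                    W[yp] += xp
                    W[p] -= xp
            pending.clear()
    return traj
```

The final value of the returned trajectory is the total mistake count, and the shape of the curve shows how quickly errors stop accumulating. For the binary online Perceptron on linearly separable data with ‖x‖ ≤ R and margin γ, Novikoff's theorem bounds the total number of mistakes by (R/γ)², one example of the classical mistake bounds that the protocol is related to.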

Related papers

Online Bayesian Imbalanced Learning with Bregman-Calibrated Deep Networks [0.7106986689736825]
We present Online Bayesian Imbalanced Learning (OBIL), a principled framework that decouples likelihood-ratio estimation from class-prior assumptions. Our approach builds on the established connection between Bregman divergences and proper scoring rules to show that deep networks trained with such losses produce posterior probability estimates. We prove that these likelihood-ratio estimates remain valid under arbitrary changes in class priors and cost structures, requiring only a threshold adjustment for optimal Bayes decisions.
arXiv (2026-02-08T21:23:00Z)
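Why a change in class priors or costs needs only a threshold adjustment follows from the standard Bayes decision rule: posterior odds equal the likelihood ratio times the prior odds. The sketch below is that textbook rule, not OBIL's method. It assumes a binary problem and a calibrated posterior from a model trained under a known class prior; the function names are hypothetical.

```python
def likelihood_ratio_from_posterior(p_pos, train_prior_pos):
    """Recover p(x | pos) / p(x | neg) from a calibrated posterior.

    Uses posterior odds = likelihood ratio * prior odds, where the prior
    is the class balance the model was trained under.
    """
    posterior_odds = p_pos / (1.0 - p_pos)
    prior_odds = train_prior_pos / (1.0 - train_prior_pos)
    return posterior_odds / prior_odds


def bayes_decision(likelihood_ratio, prior_pos, cost_fp=1.0, cost_fn=1.0):
    """Predict positive iff that choice has lower expected cost under the new prior."""
    # Positive iff cost_fn * P(pos | x) > cost_fp * P(neg | x), which is
    # equivalent to LR > ((1 - prior_pos) * cost_fp) / (prior_pos * cost_fn).
    threshold = ((1.0 - prior_pos) * cost_fp) / (prior_pos * cost_fn)
    return likelihood_ratio > threshold
```

Only `threshold` depends on the deployment prior and costs, so the same likelihood-ratio estimate can be reused when either changes.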
Beyond the Loss Curve: Scaling Laws, Active Learning, and the Limits of Learning from Exact Posteriors [8.410613979416203]
We use class-conditional normalizing flows as oracles that make exact posteriors tractable on realistic images. Our framework reveals that standard metrics hide ongoing learning, mask architectural differences, and cannot diagnose the nature of distribution shift.
arXiv (2026-01-30T21:08:55Z)
STABLE: Gated Continual Learning for Large Language Models [0.0]
STABLE is a gated continual self-editing framework that constrains forgetting during sequential updates. Each candidate edit is evaluated against a stability budget using one of three metrics. Experiments on the Qwen-2.5-7B model show that gating effectively mitigates forgetting while preserving adaptability.
arXiv (2025-10-17T16:14:05Z)
ASAP: Unsupervised Post-training with Label Distribution Shift Adaptive Learning Rate [3.187381965457262]
ASAP adjusts the learning rate by computing the cosine distance between current and previous unlabeled outputs and mapping it into a bounded range. Experiments show that ASAP consistently improves accuracy and efficiency, making it practical for unsupervised model adaptation.
arXiv (2025-08-19T01:59:24Z)
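The summary does not specify how the cosine distance is mapped to a learning rate. The sketch below assumes non-negative output vectors (e.g. softmax probabilities), so the distance lies in [0, 1], and a linear map onto a hypothetical range [`lr_min`, `lr_max`] in which larger changes in the outputs yield larger steps. Both the direction and the form of this mapping are illustrative assumptions, not ASAP's published rule.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity between two flattened output arrays."""
    a, b = np.ravel(a), np.ravel(b)
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def adaptive_lr(curr_outputs, prev_outputs, lr_min=1e-5, lr_max=1e-3):
    """Hypothetical bounded mapping from output drift to a learning rate."""
    # For non-negative outputs the distance lies in [0, 1]; clip for safety.
    d = min(max(cosine_distance(curr_outputs, prev_outputs), 0.0), 1.0)
    return lr_min + d * (lr_max - lr_min)
```

Clipping keeps the rate inside the bounded range even when numerical noise pushes the distance slightly outside [0, 1].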
Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We present a novel variational formulation of the calibration-refinement decomposition. We provide theoretical and empirical evidence that calibration and refinement errors are not minimized simultaneously during training.
arXiv (2025-01-31T15:03:54Z)
Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning. We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks. The approach is particularly compelling in industrial settings, such as supply chain management.
arXiv (2024-02-06T14:04:31Z)
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of the predictions. Our results show that this train-time loss surpasses strong calibration baselines in reducing calibration error in both in-domain and out-of-domain scenarios.
arXiv (2023-03-25T08:56:21Z)
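Several summaries in this list report reductions in calibration error without naming a metric. A common choice is the expected calibration error (ECE): predictions are grouped into equal-width confidence bins, and the absolute gap between each bin's accuracy and mean confidence is averaged, weighted by the fraction of samples in the bin. The sketch below is a generic ECE under assumed defaults (10 bins, each closed on the right); it is not claimed to be the exact metric used in this paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over (0, 1]; returns a value in [0, 1]."""
    conf = np.asarray(confidences, dtype=float)
    hits = np.asarray(correct, dtype=float)    # 1.0 if the prediction was right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lower) & (conf <= upper)
        if in_bin.any():
            gap = abs(hits[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap          # weight by fraction of samples in bin
    return float(ece)
```

A model whose 75%-confidence predictions are right three times out of four scores 0 in that bin, while a model that is 95% confident and always wrong scores 0.95.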
Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection [27.15951068292889]
This paper proposes one-class classification for anomaly detection. It realizes contamination-tolerant, anomaly-informed learning of data normality. Our model achieves substantial improvement over sixteen state-of-the-art contenders.
arXiv (2022-07-25T13:43:13Z)
CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift between training and test data by adapting a model to unlabeled data at test time. We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner.
arXiv (2022-06-01T03:02:07Z)
Learning to Learn to Demodulate with Uncertainty Quantification via Bayesian Meta-Learning [59.014197664747165]
We introduce the use of Bayesian meta-learning via variational inference for the purpose of obtaining well-calibrated few-pilot demodulators. The resulting Bayesian ensembles offer better calibrated soft decisions, at the computational cost of running multiple instances of the neural network for demodulation.
arXiv (2021-08-02T11:07:46Z)
Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction [49.25830718574892]
We present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction. It builds on the observation that most tokens are correct and can be conveyed directly from source to target, while the error positions can be estimated and corrected. Experimental results on standard datasets, especially the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level accuracy, precision, recall, and F1 score.
arXiv (2021-06-03T05:56:57Z)
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We examine a technique called prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness. The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv (2020-06-19T05:08:43Z)
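Prediction-time batch normalization normalizes activations with statistics computed from the current test batch rather than the running averages stored during training. The minimal NumPy sketch below covers a single fully connected feature layer; the function names and the `(batch, features)` layout are assumptions for illustration.

```python
import numpy as np

def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Standard batch-norm transform for x of shape (batch, features)."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def prediction_time_bn(x_test, gamma, beta, eps=1e-5):
    """Normalize with the test batch's own per-feature mean and variance."""
    mean = x_test.mean(axis=0)
    var = x_test.var(axis=0)
    return batch_norm(x_test, gamma, beta, mean, var, eps)
```

With stale training statistics, a covariate-shifted batch keeps its offset after normalization; with test-batch statistics it is re-centered. This relies on test batches being reasonably large and drawn from a single shifted distribution.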

This list is automatically generated from the titles and abstracts of the papers on this site.