Related papers: LCDB 1.1: A Database Illustrating Learning Curves Are More Ill-Behaved Than Previously Thought

URL: http://arxiv.org/abs/2505.15657v1
Date: Wed, 21 May 2025 15:32:42 GMT
Title: LCDB 1.1: A Database Illustrating Learning Curves Are More Ill-Behaved Than Previously Thought
Authors: Cheng Yan, Felix Mohr, Tom Viering
Abstract summary: We show that learning curves are less often well-behaved than previously thought. Using statistically rigorous methods, we observe significant ill-behavior in approximately 14% of the learning curves. We identify which learners are to blame and show that specific learners are more ill-behaved than others.
Score: 11.282804463462165
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sample-wise learning curves plot performance versus training set size. They are useful for studying scaling laws and speeding up hyperparameter tuning and model selection. Learning curves are often assumed to be well-behaved: monotone (i.e. improving with more data) and convex. By constructing the Learning Curves Database 1.1 (LCDB 1.1), a large-scale database with high-resolution learning curves, we show that learning curves are less often well-behaved than previously thought. Using statistically rigorous methods, we observe significant ill-behavior in approximately 14% of the learning curves, almost twice as much as in previous estimates. We also identify which learners are to blame and show that specific learners are more ill-behaved than others. Additionally, we demonstrate that different feature scalings rarely resolve ill-behavior. We evaluate the impact of ill-behavior on downstream tasks, such as learning curve fitting and model selection, and find it poses significant challenges, underscoring the relevance and potential of LCDB 1.1 as a challenging benchmark for future research.
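
To make the two well-behavedness properties named in the abstract concrete, here is a minimal sketch that checks a single error-rate curve for monotonicity and convexity. It is a naive, noise-free check for illustration only and is not the statistically rigorous procedure the paper describes. The function names, the `tol` parameter, and the sample numbers are all hypothetical.

```python
def is_monotone(errors, tol=0.0):
    """True if the error never rises by more than tol between consecutive anchors."""
    return all(b - a <= tol for a, b in zip(errors, errors[1:]))


def is_convex(sizes, errors, tol=0.0):
    """True if the slopes between consecutive anchors never decrease by more than tol.

    Slopes are used instead of plain second differences because
    training set sizes are usually not evenly spaced.
    """
    points = list(zip(sizes, errors))
    slopes = [
        (e2 - e1) / (s2 - s1)
        for (s1, e1), (s2, e2) in zip(points, points[1:])
    ]
    return all(m2 - m1 >= -tol for m1, m2 in zip(slopes, slopes[1:]))


# Hypothetical curves: error rate measured at four training set sizes.
sizes = [16, 32, 64, 128]
well_behaved = [0.40, 0.30, 0.25, 0.23]   # error keeps falling, gains shrink
ill_behaved = [0.40, 0.30, 0.35, 0.23]    # error rises at 64 samples

print(is_monotone(well_behaved), is_convex(sizes, well_behaved))
print(is_monotone(ill_behaved), is_convex(sizes, ill_behaved))
```

On measured curves the values are noisy averages over repeated runs, so a fixed tolerance like `tol` is a crude stand-in for a proper significance test.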

Related papers

FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning [24.734154431191538]
FaLW is a plug-and-play, instance-wise dynamic loss reweighting method. It assesses the unlearning state of each sample by comparing its predictive probability to the distribution of unseen data from the same class. Experiments demonstrate that FaLW achieves superior performance.
arXiv Detail & Related papers (2026-01-26T16:21:01Z)
Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection [38.35524024887503]
We propose PRioritized cOncept learninG via Relative Error-driven Sample Selection (PROGRESS). PROGRESS is a data- and compute-efficient framework that enables vision-language models to dynamically select what to learn next. We show that PROGRESS consistently outperforms state-of-the-art baselines with much less data and supervision.
arXiv Detail & Related papers (2025-06-01T17:05:35Z)
Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning [69.64809103333839]
We investigate how explicitly modeling a problem's difficulty prior information shapes the effectiveness of reinforcement-learning-based fine-tuning for multimodal reasoning. Our approach demonstrates strong performance across various multimodal mathematical reasoning benchmarks with only 2K+0.6K two-stage training data.
arXiv Detail & Related papers (2025-05-19T15:43:10Z)
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
Learning from Neighbors: Category Extrapolation for Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance. We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes. To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning [9.998859702421417]
Machine unlearning (MU) aims to eliminate the influence of chosen data points on model performance. Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting. We propose identifying the data subset that presents the most significant challenge for influence erasure, pinpointing the worst-case forget set.
arXiv Detail & Related papers (2024-03-12T06:50:32Z)
Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [93.90047628101155]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks. To address this, some methods propose replaying data from previous tasks during new task learning. However, this is often not feasible in practice due to memory constraints and data privacy issues.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales. We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
Towards Causal Deep Learning for Vulnerability Detection [31.59558109518435]
We introduce do-calculus-based causal learning to software engineering models. Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance.
arXiv Detail & Related papers (2023-10-12T00:51:06Z)
Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation [86.8475564814154]
We show that it is both possible and beneficial to undertake the constrained optimization problem directly. We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer. We show that dual variables indicate the sensitivity of the optimal value of the continual learning problem with respect to constraint perturbations.
arXiv Detail & Related papers (2023-09-29T21:23:27Z)
Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features. Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
Time Series Contrastive Learning with Information-Aware Augmentations [57.45139904366001]
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples. How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question. We propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning.
arXiv Detail & Related papers (2023-03-21T15:02:50Z)
Estimation of Predictive Performance in High-Dimensional Data Settings using Learning Curves [0.0]
Learn2Evaluate is based on learning curves by fitting a smooth monotone curve depicting test performance as a function of the sample size. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
arXiv Detail & Related papers (2022-06-08T11:48:01Z)
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering [71.15403434929915]
We show that across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection. We identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn. We show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases.
arXiv Detail & Related papers (2021-07-06T00:52:11Z)
The Shape of Learning Curves: a Review [14.764764847928259]
This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation. We discuss empirical and theoretical evidence that supports well-behaved curves that often have the shape of a power law or an exponential. We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data.
arXiv Detail & Related papers (2021-03-19T17:56:33Z)
Bayesian Meta-Prior Learning Using Empirical Bayes [3.666114237131823]
We propose a hierarchical Empirical Bayes approach that addresses the absence of informative priors, and the inability to control parameter learning rates. Our method learns empirical meta-priors from the data itself and uses them to decouple the learning rates of first-order and second-order features. Our findings are promising, as optimizing over sparse data is often a challenge.
arXiv Detail & Related papers (2020-02-04T05:08:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.