Does the Definition of Difficulty Matter? Scoring Functions and their Role for Curriculum Learning
- URL: http://arxiv.org/abs/2411.00973v1
- Date: Fri, 01 Nov 2024 18:55:31 GMT
- Title: Does the Definition of Difficulty Matter? Scoring Functions and their Role for Curriculum Learning
- Authors: Simon Rampp, Manuel Milling, Andreas Triantafyllopoulos, Björn W. Schuller
- Abstract summary: Curriculum learning (CL) describes a machine learning training strategy in which samples are gradually introduced into the training process based on their difficulty.
We study the robustness and similarity of the most common scoring functions for sample difficulty estimation.
We find that the robustness of scoring functions across random seeds positively correlates with CL performance.
- Abstract: Curriculum learning (CL) describes a machine learning training strategy in which samples are gradually introduced into the training process based on their difficulty. Despite a partially contradictory body of evidence in the literature, CL finds popularity in deep learning research due to its promise of leveraging human-inspired curricula to achieve higher model performance. Yet, the subjectivity and biases that follow any necessary definition of difficulty, especially for those found in orderings derived from models or training statistics, have rarely been investigated. To shed more light on the underlying unanswered questions, we conduct an extensive study on the robustness and similarity of the most common scoring functions for sample difficulty estimation, as well as their potential benefits in CL, using the popular benchmark dataset CIFAR-10 and the acoustic scene classification task from the DCASE2020 challenge as representatives of computer vision and computer audition, respectively. We report a strong dependence of scoring functions on the training setting, including randomness, which can partly be mitigated through ensemble scoring. While we do not find a general advantage of CL over uniform sampling, we observe that the ordering in which data is presented for CL-based training plays an important role in model performance. Furthermore, we find that the robustness of scoring functions across random seeds positively correlates with CL performance. Finally, we uncover that models trained with different CL strategies complement each other by boosting predictive power through late fusion, likely due to differences in the learnt concepts. Alongside our findings, we release the aucurriculum toolkit (https://github.com/autrainer/aucurriculum), implementing sample difficulty and CL-based training in a modular fashion.
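To make the core mechanism studied here concrete, the following minimal Python sketch illustrates difficulty scoring, ensembling of scores across random seeds, and easy-to-hard pacing. It is an illustrative assumption, not the aucurriculum API: the linear pacing schedule, the `start_frac` parameter, and the placeholder random scores are invented for the example.

```python
import numpy as np

def ensemble_difficulty(scores_per_seed: np.ndarray) -> np.ndarray:
    """Average per-sample difficulty scores over random seeds.

    scores_per_seed has shape (n_seeds, n_samples); higher means harder.
    Averaging over seeds mitigates the seed dependence reported above.
    """
    return scores_per_seed.mean(axis=0)

def curriculum_order(difficulty: np.ndarray) -> np.ndarray:
    """Sample indices sorted from easiest to hardest."""
    return np.argsort(difficulty)

def paced_subset(order: np.ndarray, step: int, total_steps: int,
                 start_frac: float = 0.2) -> np.ndarray:
    """Linear pacing: gradually grow the visible training subset."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    return order[: max(1, int(frac * len(order)))]

# Toy usage: 3 seeds, 10 samples with random placeholder scores.
rng = np.random.default_rng(0)
scores = rng.random((3, 10))
order = curriculum_order(ensemble_difficulty(scores))
for step in range(4):
    print(step, paced_subset(order, step, total_steps=3))
```

A uniform-sampling baseline, as compared against in the abstract, corresponds to skipping the ordering and pacing steps entirely.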
Related papers
- A Psychology-based Unified Dynamic Framework for Curriculum Learning [5.410910735259908]
This paper presents a Psychology-based Unified Dynamic Framework for Curriculum Learning (PUDF).
We quantify the difficulty of training data by applying Item Response Theory (IRT) to responses from Artificial Crowds (AC).
We propose a Dynamic Data Selection via Model Ability Estimation (DDS-MAE) strategy to schedule the appropriate amount of data during model training.
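For illustration, a minimal sketch of the IRT idea is given below, assuming a one-parameter Rasch model fit by gradient ascent to binary responses from an artificial crowd of models. This is not PUDF's actual estimation procedure; all names and values are illustrative.

```python
import numpy as np

def fit_rasch(responses: np.ndarray, lr: float = 0.1, steps: int = 500):
    """Fit a 1PL (Rasch) model: P(correct) = sigmoid(ability - difficulty).

    responses is a binary (n_models, n_items) matrix from an artificial
    crowd of models. Returns per-model abilities and per-item difficulties.
    """
    n_models, n_items = responses.shape
    ability = np.zeros(n_models)
    difficulty = np.zeros(n_items)
    for _ in range(steps):
        logits = ability[:, None] - difficulty[None, :]
        p = 1.0 / (1.0 + np.exp(-logits))
        err = responses - p               # gradient of the log-likelihood
        ability += lr * err.mean(axis=1)  # gradient ascent on abilities
        difficulty -= lr * err.mean(axis=0)
    return ability, difficulty

# Toy crowd: 5 models, 4 items; later items are answered correctly less often.
resp = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0],
                 [1, 1, 1, 0],
                 [1, 0, 0, 0],
                 [1, 1, 1, 1]])
_, diff = fit_rasch(resp)
print(np.argsort(diff))  # item indices ordered easy -> hard
```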
arXiv Detail & Related papers (2024-08-09T20:30:37Z)
- Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning [99.05401042153214]
In-context learning (ICL) is potentially attributable to two major abilities: task recognition (TR) and task learning (TL).
We take the first step by examining the pre-training dynamics of the emergence of ICL.
We propose a simple yet effective method to better integrate these two abilities for ICL at inference time.
arXiv Detail & Related papers (2024-06-20T06:37:47Z)
- What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
arXiv Detail & Related papers (2024-05-31T17:57:24Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
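As a hedged sketch of the generic idea, a KL-regularized distributionally robust objective yields closed-form instance weights proportional to exp(loss/tau), which upweight hard samples. The snippet below shows this generic form, not the paper's exact IR-DRO algorithm.

```python
import numpy as np

def instance_weights(losses: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """KL-regularized DRO weights: w_i proportional to exp(loss_i / tau).

    Smaller tau focuses training on the hardest samples; large tau
    approaches uniform weighting. Generic form, not the paper's exact one.
    """
    z = losses / tau
    z -= z.max()              # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def reweighted_loss(losses: np.ndarray, tau: float = 1.0) -> float:
    """Scalar objective: weighted mean of per-sample losses."""
    return float((instance_weights(losses, tau) * losses).sum())

losses = np.array([0.1, 0.5, 2.3, 0.2])   # toy per-sample losses
print(instance_weights(losses, tau=0.5))  # the hard sample dominates
print(reweighted_loss(losses, tau=0.5))
```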
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning [9.104068727716294]
Continual learning (CL) is one of the most promising trends in machine learning research.
We introduce two novel CL benchmarks that involve multiple heterogeneous tasks from six image datasets.
We additionally structure our benchmarks so that tasks are presented in increasing and decreasing order of complexity.
arXiv Detail & Related papers (2023-03-16T18:11:19Z)
- A Mathematical Model for Curriculum Learning for Parities [8.522887729678637]
We introduce a CL model for learning the class of k-parities on d bits of a binary string with a neural network trained by gradient descent.
We show that a wise choice of training examples, involving two or more product distributions, can significantly reduce the computational cost of learning this class of functions.
We also show that for another class of functions, namely Hamming mixtures, CL strategies involving a bounded number of product distributions are not beneficial.
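To make the setup concrete, the sketch below implements a k-parity target and sampling from product distributions, the ingredients of such a curriculum. The specific support and bias values are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def k_parity(x: np.ndarray, support: np.ndarray) -> np.ndarray:
    """Label = parity (XOR) of the k bits of x indexed by `support`."""
    return x[:, support].sum(axis=1) % 2

def sample_product(n: int, d: int, p_one: float, rng) -> np.ndarray:
    """Draw n binary strings of length d from a product distribution in
    which each bit is 1 independently with probability p_one."""
    return (rng.random((n, d)) < p_one).astype(int)

rng = np.random.default_rng(0)
support = np.array([0, 3, 5])                      # a 3-parity on d = 8 bits
easy = sample_product(4, 8, p_one=0.1, rng=rng)    # biased "easy" phase
hard = sample_product(4, 8, p_one=0.5, rng=rng)    # uniform "hard" phase
print(k_parity(easy, support), k_parity(hard, support))
```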
arXiv Detail & Related papers (2023-01-31T18:25:36Z)
- Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU [19.42920238320109]
Curriculum Learning (CL) is a technique for training models by ranking examples, typically in order of increasing difficulty.
In this work, we employ CL for Natural Language Understanding (NLU) tasks by taking advantage of training dynamics as difficulty metrics.
Experiments indicate that training dynamics can lead to better performing models with smoother training compared to other difficulty metrics.
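A minimal sketch of one common training-dynamics difficulty metric, in the spirit of dataset cartography, is shown below: the mean probability assigned to the gold label across epochs (low means hard) and its variability. The paper's exact metrics may differ; all values here are toy placeholders.

```python
import numpy as np

def training_dynamics_stats(gold_probs: np.ndarray) -> dict:
    """Per-example statistics from the probability assigned to the gold
    label at each epoch (shape: n_epochs x n_examples). Low confidence
    and high variability both signal harder examples."""
    return {
        "confidence": gold_probs.mean(axis=0),   # mean over epochs
        "variability": gold_probs.std(axis=0),   # std over epochs
    }

# Toy log over 4 epochs for 3 examples.
probs = np.array([[0.3, 0.90, 0.5],
                  [0.4, 0.95, 0.4],
                  [0.5, 0.97, 0.6],
                  [0.6, 0.99, 0.5]])
stats = training_dynamics_stats(probs)
print(np.argsort(stats["confidence"]))  # hardest (least confident) first
```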
arXiv Detail & Related papers (2022-10-22T17:10:04Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
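A minimal scikit-learn sketch of this measurement, with illustrative hyperparameters, is given below: cluster the representations with K-means, then score how well a KNN trained on half the data predicts the cluster labels on the other half.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def cluster_learnability(reps: np.ndarray, k: int = 10, n_neighbors: int = 5,
                         seed: int = 0) -> float:
    """Cluster representations with K-means, then measure how well a KNN
    trained on half the points predicts cluster labels on the other half."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(reps)
    X_tr, X_te, y_tr, y_te = train_test_split(
        reps, labels, test_size=0.5, random_state=seed, stratify=labels)
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_tr, y_tr)
    return knn.score(X_te, y_te)

# Toy representations: two well-separated Gaussian blobs are highly "learnable".
rng = np.random.default_rng(0)
reps = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(5, 1, (50, 8))])
print(cluster_learnability(reps, k=2))
```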
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- Contrastive Learning with Adversarial Examples [79.39156814887133]
Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations.
This paper introduces a new family of adversarial examples for contrastive learning and uses them to define a new adversarial training algorithm for SSL, denoted CLAE.
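As a hedged PyTorch sketch of the general idea (not CLAE's exact attack or objective), one can perturb an input with a single FGSM step so that its embedding disagrees with that of its positive view, yielding a harder example for contrastive training; the encoder and all parameters are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adversarial_view(encoder: nn.Module, x: torch.Tensor,
                     x_pos: torch.Tensor, eps: float = 0.03) -> torch.Tensor:
    """One-step FGSM sketch: perturb x to reduce the cosine agreement of
    its embedding with that of its positive view x_pos. Generic
    illustration, not CLAE's exact attack."""
    x_adv = x.clone().detach().requires_grad_(True)
    z = F.normalize(encoder(x_adv), dim=1)
    z_pos = F.normalize(encoder(x_pos), dim=1)
    loss = -(z * z_pos).sum(dim=1).mean()   # negative cosine agreement
    loss.backward()
    # Ascend the loss: the perturbed input disagrees more with its positive.
    return (x + eps * x_adv.grad.sign()).clamp(0, 1).detach()

# Toy usage with a linear "encoder" on 8-dimensional inputs.
enc = nn.Linear(8, 4)
x = torch.rand(16, 8)
x_pos = (x + 0.01 * torch.randn_like(x)).clamp(0, 1)
x_adv = adversarial_view(enc, x, x_pos)
print((x_adv - x).abs().max())  # perturbation is bounded by eps
```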
arXiv Detail & Related papers (2020-10-22T20:45:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.