Four Quadrants of Difficulty: A Simple Categorisation and its Limits
- URL: http://arxiv.org/abs/2601.01488v1
- Date: Sun, 04 Jan 2026 11:31:51 GMT
- Title: Four Quadrants of Difficulty: A Simple Categorisation and its Limits
- Authors: Vanessa Toborek, Sebastian Müller, Christian Bauckhage,
- Abstract summary: We propose a four-quadrant categorisation of difficulty signals -- human vs. model and task-agnostic vs. task-dependent. We find that task-agnostic features behave largely independently and that only task-dependent features align. These findings challenge common Curriculum Learning intuitions and highlight the need for lightweight, task-dependent difficulty estimators.
- Score: 4.304007567113229
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Curriculum Learning (CL) aims to improve the outcome of model training by estimating the difficulty of samples and scheduling them accordingly. In NLP, difficulty is commonly approximated using task-agnostic linguistic heuristics or human intuition, implicitly assuming that these signals correlate with what neural models find difficult to learn. We propose a four-quadrant categorisation of difficulty signals -- human vs. model and task-agnostic vs. task-dependent -- and systematically analyse their interactions on a natural language understanding dataset. We find that task-agnostic features behave largely independently and that only task-dependent features align. These findings challenge common CL intuitions and highlight the need for lightweight, task-dependent difficulty estimators that better reflect model learning behaviour.
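The core analysis reduces to measuring rank agreement between difficulty signals from different quadrants. The sketch below is a minimal illustration of that idea, not the paper's pipeline: it assumes a toy corpus, token count as the human/task-agnostic signal, and a simulated per-example loss standing in for the model/task-dependent signal.

```python
# Minimal sketch: rank agreement between two difficulty quadrants.
# Assumptions: a toy corpus and a simulated "per-example loss"; in the
# paper's setting the loss would come from a model trained on the task.
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus standing in for an NLU training set.
texts = [f"training sentence number {i} " + "word " * (i % 17)
         for i in range(200)]

# Human / task-agnostic quadrant: a linguistic heuristic (token count).
agnostic = np.array([len(t.split()) for t in texts], dtype=float)

# Model / task-dependent quadrant: per-example training loss (simulated).
dependent = rng.normal(size=len(texts))

def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman correlation computed as Pearson correlation of ranks."""
    rank = lambda x: x.argsort().argsort().astype(float)
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

# A near-zero value mirrors the paper's finding that task-agnostic
# signals need not track what the model actually finds hard.
print(f"Spearman(agnostic, dependent) = {spearman(agnostic, dependent):+.3f}")
```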
Related papers
- LLMs Encode How Difficult Problems Are [4.990590622073335]
We investigate whether large language models encode problem difficulty in a way that aligns with human judgment. We train linear probes across layers and token positions on 60 models, evaluating on mathematical and coding subsets of Easy2HardBench (see the probe sketch below).
arXiv Detail & Related papers (2025-10-20T22:48:23Z)
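As a hedged illustration of the probing recipe above, the sketch below fits a closed-form ridge probe from hidden states to difficulty ratings; the activations and ratings are synthetic stand-ins, and the layer/position sweep over 60 models is omitted.

```python
# Minimal linear-probe sketch: map frozen hidden states to a difficulty
# score. The activations and ratings are synthetic stand-ins; in practice
# they would come from a chosen layer/token position of the probed model.
import numpy as np

rng = np.random.default_rng(1)
n_items, d_model = 500, 64

X = rng.normal(size=(n_items, d_model))           # frozen activations
w_true = rng.normal(size=d_model)
y = X @ w_true + 0.5 * rng.normal(size=n_items)   # difficulty ratings

# Closed-form ridge probe, fit on a train split:
# w = (X^T X + lam * I)^{-1} X^T y
lam, tr, te = 1.0, slice(0, 400), slice(400, None)
w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(d_model), X[tr].T @ y[tr])

# Held-out correlation indicates whether difficulty is linearly decodable.
r = np.corrcoef(X[te] @ w, y[te])[0, 1]
print(f"held-out correlation: {r:+.3f}")
```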
- CHUCKLE -- When Humans Teach AI To Learn Emotions The Easy Way [11.645594774207511]
We propose CHUCKLE (Crowdsourced Human Understanding Curriculum for Knowledge Led Emotion Recognition), a perception-driven CL framework for emotion recognition. We show that CHUCKLE increases the relative mean accuracy by 6.56% for LSTMs and 1.61% for Transformers over non-curriculum baselines, while reducing the number of gradient updates.
arXiv Detail & Related papers (2025-10-10T13:38:06Z)
- Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding [53.63482987410292]
We present a self-adaptive curriculum learning paradigm that prioritizes fine-tuning examples based on difficulty scores predicted by pre-trained language models. We evaluate our method on four natural language understanding (NLU) datasets covering both binary and multi-class classification tasks (see the scheduling sketch below).
arXiv Detail & Related papers (2025-07-13T19:36:17Z)
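A minimal sketch of the scheduling idea above, under assumptions: `score_fn` stands in for a difficulty score read off a pretrained model (e.g., its per-example loss or confidence), and training simply consumes examples in ascending-difficulty order.

```python
# Minimal curriculum-ordering sketch: rank fine-tuning examples by a
# difficulty score from a pretrained model, then schedule easy-to-hard.
# `score_fn` is a stand-in; in the paper the score comes from the
# pretrained language model itself.
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)

def curriculum_order(data: List[Example],
                     score_fn: Callable[[Example], float]) -> List[Example]:
    """Return examples sorted from lowest (easiest) to highest difficulty."""
    return sorted(data, key=score_fn)

# Stand-in scorer: longer inputs count as harder (illustrative only).
toy_data = [("a b c", 0), ("a", 1), ("a b", 0)]
for text, label in curriculum_order(toy_data, lambda ex: len(ex[0].split())):
    print(text, label)  # feed into fine-tuning in this order
```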
- Climbing the Ladder of Reasoning: What LLMs Can -- and Still Can't -- Solve after SFT? [59.418994222096885]
We conduct a detailed analysis of model performance on the AIME24 dataset. We categorize questions into four tiers (Easy, Medium, Hard, and Extremely Hard). We find that progression from the Easy to the Medium tier requires adopting an R1 reasoning style with minimal SFT (around 1K instances). Extremely Hard questions present a fundamentally different challenge; they require unconventional problem-solving skills.
arXiv Detail & Related papers (2025-04-16T03:39:38Z)
- DAST: Difficulty-Aware Self-Training on Large Language Models [68.30467836807362]
Large Language Model (LLM) self-training methods consistently under-sample challenging queries. This work proposes a difficulty-aware self-training framework that focuses on improving the quantity and quality of self-generated responses (see the sampling sketch below).
arXiv Detail & Related papers (2025-03-12T03:36:45Z)
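A hedged sketch of difficulty-aware sampling: allocate a larger generation budget to harder queries rather than sampling uniformly. The scores and the linear budget rule are illustrative assumptions, not DAST's exact allocation scheme.

```python
# Minimal difficulty-aware sampling sketch: spend more self-training
# generations on harder queries instead of sampling uniformly.
import math

# Illustrative difficulty scores in [0, 1]; DAST derives these from the
# model's own behaviour on each query.
queries = {"easy question": 0.1, "medium question": 0.5, "hard question": 0.9}

def generation_budget(difficulty: float, base: int = 4, scale: int = 12) -> int:
    """More samples for harder queries; at least `base` for every query."""
    return base + math.ceil(scale * difficulty)

for q, d in queries.items():
    print(f"{q}: sample {generation_budget(d)} candidate responses")
```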
- Exploring the Potential of Large Language Models for Estimating the Reading Comprehension Question Difficulty [2.335292678914151]
This study investigates the effectiveness of Large Language Models (LLMs) in estimating the difficulty of reading comprehension questions. We use OpenAI's GPT-4o and o1 to estimate question difficulty on the Study Aid and Reading Assessment (SARA) dataset. The results indicate that while the models yield difficulty estimates that align meaningfully with derived IRT parameters, there are notable differences in their sensitivity to extreme item characteristics (see the 2PL sketch below).
arXiv Detail & Related papers (2025-02-25T02:28:48Z)
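For reference, the sketch below shows the two-parameter logistic (2PL) item response function that such IRT difficulty parameters come from; the parameter values are illustrative, not taken from SARA.

```python
# Minimal 2PL item-response sketch: the IRT model whose difficulty
# parameters LLM estimates are compared against. Values are illustrative.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL model: P(correct | ability theta, discrimination a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A harder item (larger b) is answered correctly less often at fixed ability.
for b in (-1.0, 0.0, 1.5):
    print(f"difficulty b={b:+.1f}: P(correct | theta=0) = "
          f"{p_correct(0.0, a=1.2, b=b):.2f}")
```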
- What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task (see the sketch below).
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
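A minimal sketch of gradient supervision on a counterfactual pair, under simplifying assumptions (a linear model and a cosine-alignment penalty, which are illustrative choices rather than the paper's exact objective): the input gradient at an example is encouraged to point toward its minimally-different counterfactual.

```python
# Hedged sketch: auxiliary gradient supervision from a counterfactual pair.
# The tiny linear model and cosine penalty are simplifying assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)

x = torch.randn(8, requires_grad=True)      # an example
x_cf = x.detach() + 0.1 * torch.randn(8)    # its counterfactual (label flips)
y, y_cf = torch.tensor(0), torch.tensor(1)

task_loss = (F.cross_entropy(model(x).unsqueeze(0), y.unsqueeze(0))
             + F.cross_entropy(model(x_cf).unsqueeze(0), y_cf.unsqueeze(0)))

# Input gradient at x, kept in the graph so the penalty is trainable.
(g,) = torch.autograd.grad(task_loss, x, create_graph=True)

# Auxiliary objective: the gradient should align with (x_cf - x), i.e.
# moving toward the counterfactual should flip the prediction.
align = 1.0 - F.cosine_similarity(g, x_cf - x.detach(), dim=0)
(task_loss + 0.5 * align).backward()        # joint update signal
print(f"task loss {task_loss.item():.3f}, alignment penalty {align.item():.3f}")
```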