DAST: Difficulty-Aware Self-Training on Large Language Models
- URL: http://arxiv.org/abs/2503.09029v1
- Date: Wed, 12 Mar 2025 03:36:45 GMT
- Title: DAST: Difficulty-Aware Self-Training on Large Language Models
- Authors: Boyang Xue, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Hongling Xu, Fei Mi, Yasheng Wang, Lifeng Shang, Qun Liu, Kam-Fai Wong
- Abstract summary: Large Language Models (LLM) self-training methods always under-sample on challenging queries. This work proposes a difficulty-aware self-training framework that focuses on improving the quantity and quality of self-generated responses.
- Score: 68.30467836807362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Present self-training methods for Large Language Models (LLMs) always under-sample challenging queries, leading to inadequate learning on difficult problems and limiting LLMs' abilities. Therefore, this work proposes a difficulty-aware self-training (DAST) framework that focuses on improving both the quantity and the quality of self-generated responses on challenging queries during self-training. DAST consists of three components: 1) sampling-based difficulty level estimation, 2) difficulty-aware data augmentation, and 3) self-training algorithms using SFT and DPO, respectively. Experiments on mathematical tasks demonstrate the effectiveness and generalization of DAST, highlighting the critical role of difficulty-aware strategies in advancing LLM self-training.
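The first two DAST components can be pictured with a small Python sketch. It assumes difficulty is measured as the pass rate of k sampled responses per query and that harder queries receive a larger augmentation budget (quantity only; the quality side of the augmentation is not modeled). The thresholds, the linear budget rule, and the names `estimate_difficulty`, `augmentation_budget`, and `fake_sampler` are hypothetical illustrations, not the paper's exact procedure, and the LLM plus answer checker are replaced by a deterministic stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: given a query and k, sample k responses from the
# model and return whether each final answer is correct.
Sampler = Callable[[str, int], List[bool]]


@dataclass
class QueryStats:
    query: str
    pass_rate: float  # fraction of the k sampled responses judged correct
    level: int        # 0 = easiest; higher = harder


def estimate_difficulty(
    queries: List[str],
    sample_and_check: Sampler,
    k: int = 8,
    thresholds: Tuple[float, ...] = (0.75, 0.5, 0.25),  # hypothetical cut-offs
) -> List[QueryStats]:
    """Sampling-based difficulty estimation: a lower pass rate means a higher level."""
    stats = []
    for q in queries:
        results = sample_and_check(q, k)
        rate = sum(results) / k
        # The level is the number of thresholds the pass rate falls below.
        level = sum(rate < t for t in thresholds)
        stats.append(QueryStats(q, rate, level))
    return stats


def augmentation_budget(
    stats: List[QueryStats], base: int = 2, per_level: int = 2
) -> Dict[str, int]:
    """Hypothetical linear rule: harder queries get more extra self-generated samples."""
    return {s.query: base + per_level * s.level for s in stats}


def fake_sampler(query: str, k: int) -> List[bool]:
    # Stand-in for the LLM: a fixed number of correct answers per toy query.
    n_correct = {"easy": 8, "medium": 4, "hard": 1}[query]
    return [i < n_correct for i in range(k)]


if __name__ == "__main__":
    stats = estimate_difficulty(["easy", "medium", "hard"], fake_sampler)
    print([(s.query, s.pass_rate, s.level) for s in stats])
    # [('easy', 1.0, 0), ('medium', 0.5, 1), ('hard', 0.125, 3)]
    print(augmentation_budget(stats))
    # {'easy': 2, 'medium': 4, 'hard': 8}
```

The linear rule is only one option; any mapping that grows with the difficulty level expresses the same idea of spending more sampling effort on the queries that uniform self-training under-samples.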
Related papers
- Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding [53.63482987410292]
We present a self-adaptive curriculum learning paradigm that prioritizes fine-tuning examples based on difficulty scores predicted by pre-trained language models.
We evaluate our method on four natural language understanding (NLU) datasets covering both binary and multi-class classification tasks.
arXiv Detail & Related papers (2025-07-13T19:36:17Z)
- Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs [45.83245433138508]
Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad spectrum of tasks.
They apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones.
This survey presents a comprehensive review of efficient test-time compute strategies, which aim to improve the computational efficiency of LLM reasoning.
arXiv Detail & Related papers (2025-07-02T18:27:42Z)
- e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs [49.01449646799905]
We show that most existing reasoning models do not extrapolate well.
Our recipe e3 produces the best known 1.7B model according to AIME'25 and HMMT'25 scores.
Our e3-1.7B model not only attains high pass@1 scores, but also improves pass@k over the base model.
arXiv Detail & Related papers (2025-06-10T17:52:42Z)
- Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning [52.32193550674408]
We aim to improve the reasoning capabilities of language models via reinforcement learning (RL).
We propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually.
E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B).
arXiv Detail & Related papers (2025-06-07T02:41:54Z)
- Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective [27.94738910330893]
Reinforcement learning exhibits potential in enhancing the reasoning abilities of large language models.
Existing methods attempt to improve efficiency by scheduling problems based on problem difficulties.
This paper introduces Competence-Difficulty, which enables accurate and stable estimation of problem difficulties.
arXiv Detail & Related papers (2025-05-23T09:15:26Z)
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.
Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.
Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
- TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning [26.680686158061192]
Reasoning is a fundamental capability of large language models (LLMs).
This paper introduces TextGames, a benchmark specifically crafted to assess LLMs through demanding text-based games.
Our findings reveal that although LLMs exhibit proficiency in addressing most easy and medium-level problems, they face significant challenges with more difficult tasks.
arXiv Detail & Related papers (2025-02-25T18:26:48Z)
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z)
- Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling [38.7578639980701]
Self-improvement methods enable large language models to generate solutions themselves.
We find that models tend to over-sample on easy queries and under-sample on queries they have yet to master.
We introduce Guided Self-Improvement (GSI), a strategy aimed at improving the efficiency of sampling challenging heavy-tailed data.
arXiv Detail & Related papers (2024-11-01T17:18:45Z)
- Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models [8.255272009912417]
We propose a novel prompting approach for automatic reasoning named LBS3, inspired by curriculum learning.
LBS3 steers LLMs to recall easy-to-hard proxy queries that are pertinent to the target query.
It invokes a progressive strategy that uses exemplar prompts derived from easy-proxy queries to guide LLMs in solving hard-proxy queries.
arXiv Detail & Related papers (2024-10-29T04:28:49Z)
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
- Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning [12.651588927599441]
Instruction tuning aims to align large language models with open-domain instructions and human-preferred responses.
We introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR) to select instructions that are difficult for a student LLM to follow.
To balance the student's capabilities, task distributions in training sets are adjusted with responses automatically refined according to their corresponding tasks.
arXiv Detail & Related papers (2024-05-22T08:38:26Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)