Related papers: Weak-to-Strong Reasoning

Weak-to-Strong Reasoning

URL: http://arxiv.org/abs/2407.13647v2
Date: Tue, 1 Oct 2024 05:28:54 GMT
Title: Weak-to-Strong Reasoning
Authors: Yuqing Yang, Yan Ma, Pengfei Liu,
Abstract summary: We introduce a progressive learning framework that enables the strong model to autonomously refine its training data. Our method significantly enhances the reasoning capabilities of Llama2-70b using three separate weak models. This work paves the way for a more scalable and sophisticated strategy to enhance AI reasoning powers.
Score: 33.20094938292376
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: When large language models (LLMs) exceed human-level capabilities, it becomes increasingly challenging to provide full-scale and accurate supervision for these models. Weak-to-strong learning, which leverages a less capable model to unlock the latent abilities of a stronger model, proves valuable in this context. Yet, the efficacy of this approach for complex reasoning tasks is still untested. Furthermore, tackling reasoning tasks under the weak-to-strong setting currently lacks efficient methods to avoid blindly imitating the weak supervisor including its errors. In this paper, we introduce a progressive learning framework that enables the strong model to autonomously refine its training data, without requiring input from either a more advanced model or human-annotated data. This framework begins with supervised fine-tuning on a selective small but high-quality dataset, followed by preference optimization on contrastive samples identified by the strong model itself. Extensive experiments on the GSM8K and MATH datasets demonstrate that our method significantly enhances the reasoning capabilities of Llama2-70b using three separate weak models. This method is further validated in a forward-looking experimental setup, where Llama3-8b-instruct effectively supervises Llama3-70b on the highly challenging OlympicArena dataset. This work paves the way for a more scalable and sophisticated strategy to enhance AI reasoning powers. All relevant code and resources are available in \url{https://github.com/GAIR-NLP/weak-to-strong-reasoning}.

Related papers

Exploring Weak-to-Strong Generalization for CLIP-based Classification [22.367331851262875]
Current methods rely on human supervision but become impractical as model complexity increases.<n>A novel solution proposed recently is using a weaker model to supervise a stronger model.<n>Previous work has shown the effectiveness of weak-to-strong generalization in the context of language-only models.<n>We propose a method, class prototype learning (CPL), which aims to enhance the classification capabilities of the CLIP model.
arXiv Detail & Related papers (2025-11-23T10:47:25Z)
SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data. We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning [29.65600202138321]
In high-speed data stream environments, data do not pause to accommodate slow models. Model's ignorance: the single-pass nature of OCL challenges models to learn effective features within constrained training time. Model's myopia: the local learning nature of OCL leads the model to adopt overly simplified, task-specific features.
arXiv Detail & Related papers (2024-09-28T05:24:56Z)
Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress. LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset. Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
Towards Robust Federated Learning via Logits Calibration on Non-IID Data [49.286558007937856]
Federated learning (FL) is a privacy-preserving distributed management framework based on collaborative model training of distributed devices in edge networks. Recent studies have shown that FL is vulnerable to adversarial examples, leading to a significant drop in its performance. In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks.
arXiv Detail & Related papers (2024-03-05T09:18:29Z)
SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision. We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z)
Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem. Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking [16.109330335379962]
Dyna-bAbI is a dynamic framework providing fine-grained control over task generation in bAbI. We demonstrate our ideas by constructing three new tasks requiring compositional generalization.
arXiv Detail & Related papers (2021-11-30T20:36:56Z)
Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution. This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes. Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
Voting based ensemble improves robustness of defensive models [82.70303474487105]
We study whether it is possible to create an ensemble to further improve robustness. By ensembling several state-of-the-art pre-trained defense models, our method can achieve a 59.8% robust accuracy.
arXiv Detail & Related papers (2020-11-28T00:08:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.