Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
- URL: http://arxiv.org/abs/2403.09472v1
- Date: Thu, 14 Mar 2024 15:12:38 GMT
- Title: Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
- Authors: Zhiqing Sun, Longhui Yu, Yikang Shen, Weiyang Liu, Yiming Yang, Sean Welleck, Chuang Gan
- Abstract summary: Current AI alignment methodologies rely on human-provided demonstrations or judgments.
This raises a challenging research question: How can we keep improving the systems when their capabilities have surpassed the levels of humans?
- Score: 98.97575836717931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current AI alignment methodologies rely on human-provided demonstrations or judgments, and the learned capabilities of AI systems would be upper-bounded by human capabilities as a result. This raises a challenging research question: How can we keep improving the systems when their capabilities have surpassed the levels of humans? This paper answers this question in the context of tackling hard reasoning tasks (e.g., level 4-5 MATH problems) via learning from human annotations on easier tasks (e.g., level 1-3 MATH problems), which we term easy-to-hard generalization. Our key insight is that an evaluator (reward model) trained on supervision for easier tasks can be effectively used for scoring candidate solutions of harder tasks, hence facilitating easy-to-hard generalization over different levels of tasks. Based on this insight, we propose a novel approach to scalable alignment, which first trains process-supervised reward models on easy problems (e.g., level 1-3), and then uses them to evaluate the performance of policy models on hard problems. We show that such easy-to-hard generalization from evaluators can enable easy-to-hard generalization in generators through either re-ranking or reinforcement learning (RL). Notably, our process-supervised 7B RL model achieves an accuracy of 34.0% on MATH500, despite only using human supervision on easy problems. Our approach suggests a promising path toward AI systems that advance beyond the frontier of human supervision.
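The re-ranking idea in the abstract can be illustrated with a minimal best-of-n sketch: an evaluator scores each step of several candidate solutions, and the highest-scoring candidate is kept. This is not the paper's implementation. The names `rerank_best_of_n`, `min_step_score`, and `toy_step_scorer` are hypothetical, the minimum-over-steps aggregation is an assumption, and the toy scorer merely stands in for a process-supervised reward model trained on easy problems.

```python
from typing import Callable, List, Sequence


def min_step_score(step_scores: Sequence[float]) -> float:
    """Aggregate per-step scores into one solution score.

    Assumption: a solution is only as good as its weakest step, so take the minimum.
    """
    return min(step_scores)


def rerank_best_of_n(
    candidates: List[List[str]],
    score_step: Callable[[str], float],
) -> List[str]:
    """Return the candidate solution with the highest aggregated step score."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(
        candidates,
        key=lambda steps: min_step_score([score_step(s) for s in steps]),
    )


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a learned process reward model."""
    return 0.1 if "2 + 2 = 5" in step else 0.9


# Two candidate solutions, each a list of reasoning steps.
candidates = [
    ["2 + 2 = 5", "so the answer is 5"],
    ["2 + 2 = 4", "so the answer is 4"],
]
best = rerank_best_of_n(candidates, toy_step_scorer)
print(best[-1])
```

In a real setting the scorer would be a trained model and the candidates would be sampled from a policy model on hard problems; other aggregations, such as the product of step scores or the final-step score, are equally plausible choices.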
Related papers
- Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? [74.88417042125985]
We investigate various data-driven strategies that offer supervision data at different quality levels upon tasks of varying complexity.
We find that even when the outcome error rate for hard task supervision is high, training on such data can outperform perfectly correct supervision on easier subtasks.
Our results also reveal that supplementing hard task supervision with the corresponding subtask supervision can yield notable performance improvements.
arXiv Detail & Related papers (2024-10-27T17:55:27Z)
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning algorithm for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
- Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
We guide the curriculum reinforcement learning results towards a preferred performance level that is neither too hard nor too easy via learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
It shows reinforcement learning performance can successfully adjust in sync with the human desired difficulty level.
arXiv Detail & Related papers (2022-08-04T23:53:51Z)
- Learning to Guide Multiple Heterogeneous Actors from a Single Human Demonstration via Automatic Curriculum Learning in StarCraft II [0.5911087507716211]
In this work, we aim to train deep reinforcement learning agents that can command multiple heterogeneous actors.
Our results show that an agent trained via automated curriculum learning can outperform state-of-the-art deep reinforcement learning baselines.
arXiv Detail & Related papers (2022-05-11T21:53:11Z)
- Divide & Conquer Imitation Learning [75.31752559017978]
Imitation Learning can be a powerful approach to bootstrap the learning process.
We present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory.
We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.
arXiv Detail & Related papers (2022-04-15T09:56:50Z)
- A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances [30.32386551923329]
We present a curriculum-driven learning approach that is designed to solve a single hard instance.
We show how the smoothness of the task hardness impacts the final learning results.
Our approach can uncover plans that are far out of reach for any previous state-of-the-art Sokoban solver.
arXiv Detail & Related papers (2021-10-03T00:44:50Z)
- Leveraging Rationales to Improve Human Task Performance [15.785125079811902]
Given that a computational system's performance exceeds that of its human user, can explainable AI capabilities be leveraged to improve the performance of the human?
We introduce the Rationale-Generating Algorithm, an automated technique for generating rationales for utility-based computational methods.
Results show that our approach produces rationales that lead to statistically significant improvement in human task performance.
arXiv Detail & Related papers (2020-02-11T04:51:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.