Measuring Progress on Scalable Oversight for Large Language Models
- URL: http://arxiv.org/abs/2211.03540v1
- Date: Fri, 4 Nov 2022 17:03:49 GMT
- Title: Measuring Progress on Scalable Oversight for Large Language Models
- Authors: Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit,
Scott Heiner, Kamilė Lukošiūtė, Amanda Askell, Andy Jones, Anna Chen, Anna
Goldie, Azalia Mirhoseini, Cameron McKinnon, Christopher Olah, Daniela
Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Jackson
Kernion, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal
Ndousse, Liane Lovitt, Nelson Elhage, Nicholas Schiefer, Nicholas Joseph,
Noemí Mercado, Nova DasSarma, Robin Larson, Sam McCandlish, Sandipan Kundu,
Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Timothy
Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac
Hatfield-Dodds, Ben Mann, Jared Kaplan
- Abstract summary: We present an experimental design centered on choosing tasks for which human specialists succeed but unaided humans and current general AI systems fail.
We find that human participants who interact with an unreliable large-language-model dialog assistant through chat substantially outperform both the model alone and their own unaided performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing safe and useful general-purpose AI systems will require us to make
progress on scalable oversight: the problem of supervising systems that
potentially outperform us on most skills relevant to the task at hand.
Empirical work on this problem is not straightforward, since we do not yet have
systems that broadly exceed our abilities. This paper discusses one of the
major ways we think about this problem, with a focus on how to turn it into one
that can be productively studied empirically. We first present an experimental
design centered on choosing tasks for which human specialists succeed but
unaided humans and current general AI systems fail. We then present a
proof-of-concept experiment meant to demonstrate a key feature of
this experimental design and show its viability with two question-answering
tasks: MMLU and time-limited QuALITY. On these tasks, we find that human
participants who interact with an unreliable large-language-model dialog
assistant through chat -- a trivial baseline strategy for scalable oversight --
substantially outperform both the model alone and their own unaided
performance. These results are an encouraging sign that scalable oversight will
be tractable to study with present models and bolster recent findings that
large language models can productively assist humans with difficult tasks.
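To make the design concrete, here is a minimal sketch (not the authors' code) of the three-condition comparison the experiment runs: the model alone, the human unaided, and the human assisted by the model through chat. The helper callables are hypothetical stand-ins.

```python
# Hedged sketch of the paper's three-condition comparison on a QA task.
# The callables ask_model, ask_human_unaided, and ask_human_with_chat
# are hypothetical stand-ins, not the authors' actual harness.

def accuracy(predictions, gold):
    """Fraction of predictions that match the gold answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def run_oversight_experiment(questions, gold, ask_model,
                             ask_human_unaided, ask_human_with_chat):
    return {
        # Baseline 1: the unreliable dialog assistant answers alone.
        "model_alone": accuracy([ask_model(q) for q in questions], gold),
        # Baseline 2: the human participant answers without help.
        "human_unaided": accuracy(
            [ask_human_unaided(q) for q in questions], gold),
        # Oversight condition: the human chats with the assistant
        # before committing to a final answer.
        "human_with_assistant": accuracy(
            [ask_human_with_chat(q) for q in questions], gold),
    }
```

The paper's headline result corresponds to `human_with_assistant` exceeding both baseline entries on MMLU and time-limited QuALITY.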
Related papers
- BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs).
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
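Read literally, this suggests an escalation loop over Bloom's-taxonomy levels, gated by the model's own self-evaluation. A hedged sketch under that reading follows; the prompts, level names, and stopping rule are assumptions rather than the published BloomWise method.

```python
# Illustrative BloomWise-style loop: try prompts of increasing cognitive
# sophistication and let the model's self-evaluation decide when to stop.
# `llm` is an assumed text-in/text-out callable; prompt wording is invented.

BLOOM_LEVELS = ["remember", "understand", "apply",
                "analyze", "evaluate", "create"]

def bloomwise_solve(llm, problem):
    answer = ""
    for level in BLOOM_LEVELS:
        answer = llm(f"Using the '{level}' skill of Bloom's taxonomy, "
                     f"solve step by step: {problem}")
        # Self-evaluation: the model judges whether this level sufficed.
        verdict = llm(f"Problem: {problem}\nProposed solution: {answer}\n"
                      "Is the solution correct and complete? Answer yes or no.")
        if verdict.strip().lower().startswith("yes"):
            return answer  # no need to escalate further
    return answer  # fall back to the most sophisticated attempt
```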
arXiv Detail & Related papers (2024-10-05T09:27:52Z)
- SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 subproblems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z)
- Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning [0.0]
Large Language Models (LLMs) have demonstrated their capabilities across various tasks.
This paper exploits the reasoning and generative capabilities of LLMs to predict human behavior in two sequential decision-making tasks.
We compare the performance of LLMs with a cognitive instance-based learning model, which imitates human experiential decision-making.
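Since the comparison hinges on what an instance-based learning (IBL) model is, a compact sketch may help: past (timestamp, action, outcome) instances are weighted by recency-based activation and blended into a value per action. The decay and noise parameters below are illustrative defaults, not the paper's settings.

```python
import math
import random

def ibl_choose(memory, actions, t, d=0.5, noise=0.25):
    """Pick an action by blending past outcomes, IBL-style.

    memory: list of (timestamp, action, outcome); timestamps must precede t.
    """
    values = {}
    for a in actions:
        instances = [(ts, out) for ts, act, out in memory if act == a]
        if not instances:
            values[a] = float("inf")  # force exploration of untried actions
            continue
        # Activation: recent instances weigh more (power-law decay + noise).
        acts = [-d * math.log(t - ts) + random.gauss(0, noise)
                for ts, _ in instances]
        weights = [math.exp(x) for x in acts]
        total = sum(weights)
        # Blended value: activation-weighted average of observed outcomes.
        values[a] = sum(w * out
                        for w, (_, out) in zip(weights, instances)) / total
    return max(values, key=values.get)
```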
arXiv Detail & Related papers (2024-07-12T14:13:06Z)
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision [98.97575836717931]
Current AI alignment methodologies rely on human-provided demonstrations or judgments.
This raises a challenging research question: How can we keep improving the systems when their capabilities have surpassed the levels of humans?
arXiv Detail & Related papers (2024-03-14T15:12:38Z)
- On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Solving the Right Problem is Key for Translational NLP: A Case Study in UMLS Vocabulary Insertion [12.855898113768998]
We study the case of UMLS vocabulary insertion, an important real-world task in which hundreds of thousands of new terms are added to the UMLS.
We introduce a new formulation for UMLS vocabulary insertion which mirrors the real-world task.
We also propose an effective rule-enhanced biomedical language model which enables important new model behavior.
arXiv Detail & Related papers (2023-11-25T19:35:53Z)
- Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake? [62.59699229202307]
Despite advances in AI, it remains a significant challenge to develop interactive task guidance systems.
We created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG) based on natural interaction between a human user and a human instructor.
We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance.
arXiv Detail & Related papers (2023-11-01T15:13:49Z)
- Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models [5.975913042883176]
Recent work studies the cognitive capabilities of language models through psychological tests designed for humans.
We formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks.
arXiv Detail & Related papers (2022-12-21T04:43:19Z)
- Human in the loop approaches in multi-modal conversational task guidance system development [6.493148232868973]
Development of task guidance systems for aiding humans in a situated task remains a challenging problem.
We first highlight some of the challenges involved during the development of such systems.
We then provide an overview of existing datasets available and highlight their limitations.
arXiv Detail & Related papers (2022-11-03T14:05:30Z)
- Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration [116.28433607265573]
We introduce Watch-And-Help (WAH), a challenge for testing social intelligence in AI agents.
In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently.
We build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning and learning based baselines.
arXiv Detail & Related papers (2020-10-19T21:48:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.