Predictability and Surprise in Large Generative Models
- URL: http://arxiv.org/abs/2202.07785v2
- Date: Mon, 3 Oct 2022 21:00:42 GMT
- Title: Predictability and Surprise in Large Generative Models
- Authors: Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom
Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Ben Mann, Amanda
Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Nelson Elhage, Sheer
El Showk, Stanislav Fort, Zac Hatfield-Dodds, Scott Johnston, Shauna Kravec,
Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Dario Amodei,
Tom Brown, Jared Kaplan, Sam McCandlish, Chris Olah, Jack Clark
- Abstract summary: Large-scale pre-training has emerged as a technique for creating capable, general purpose, generative models.
In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property.
- Score: 8.055204456718576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale pre-training has recently emerged as a technique for creating
capable, general purpose, generative models such as GPT-3, Megatron-Turing NLG,
Gopher, and many others. In this paper, we highlight a counterintuitive
property of such models and discuss the policy implications of this property.
Namely, these generative models have an unusual combination of predictable loss
on a broad training distribution (as embodied in their "scaling laws"), and
unpredictable specific capabilities, inputs, and outputs. We believe that the
high-level predictability and appearance of useful capabilities drive rapid
development of such models, while the unpredictable qualities make it difficult
to anticipate the consequences of model deployment. We go through examples from
the literature and real-world observations of how this combination can lead to
socially harmful behavior, and we also perform two novel experiments to
illustrate our point about harms from unpredictability.
Furthermore, we analyze how these conflicting properties combine to give model
developers various motivations for deploying these models, and challenges that
can hinder deployment. We conclude with a list of possible interventions the AI
community may take to increase the chance of these models having a beneficial
impact. We intend this paper to be useful to policymakers who want to
understand and regulate AI systems, technologists who care about the potential
policy impact of their work, and academics who want to analyze, critique, and
potentially develop large generative models.
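To make the "scaling laws" point in the abstract concrete, below is a minimal sketch (not from the paper) of how aggregate test loss can be extrapolated as a power law in training compute from smaller runs. The compute/loss numbers, units, and the predicted_loss helper are hypothetical and purely illustrative; only the power-law form L(C) ≈ a * C^(-alpha) reflects the scaling-law behavior the abstract refers to.

```python
# Minimal sketch of the "scaling law" idea: aggregate test loss tends to follow a
# smooth power law L(C) ~ a * C^(-alpha) in training compute C, so it can be
# extrapolated before a larger model is trained. All numbers are hypothetical.
import numpy as np

# Hypothetical (compute, loss) measurements from a series of smaller training runs.
compute = np.array([1e0, 1e1, 1e2, 1e3])   # e.g. compute budget in PF-days
loss = np.array([3.9, 3.3, 2.8, 2.4])      # held-out test loss for each run

# Fit a straight line in log-log space: log L = log a - alpha * log C.
slope, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
alpha = -slope

def predicted_loss(c):
    """Extrapolated loss for a run with compute budget c (same units as `compute`)."""
    return np.exp(log_a) * c ** (-alpha)

print(f"fitted exponent alpha ~ {alpha:.3f}")
print(f"predicted loss at 1e4 (10x beyond observed runs) ~ {predicted_loss(1e4):.2f}")
```

The predictable quantity in such a fit is only the aggregate loss; nothing in it indicates which specific capabilities, inputs, or outputs will emerge at the larger scale, which is exactly the tension the paper highlights.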
Related papers
- Sabotage Evaluations for Frontier Models [48.23262570766321]
Sufficiently capable models could subvert human oversight and decision-making in important contexts.
We develop a set of related threat models and evaluations.
We demonstrate these evaluations on Anthropic's Claude 3 Opus and Claude 3.5 Sonnet models.
arXiv Detail & Related papers (2024-10-28T20:34:51Z)
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models show increasingly strong performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
- On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Conditioning Predictive Models: Risks and Strategies [1.3124513975412255]
We provide a definitive reference on what it would take to safely make use of generative/predictive models.
We believe that large language models can be understood as such predictive models of the world.
We think that conditioning approaches for predictive models represent the safest known way of eliciting human-level capabilities.
arXiv Detail & Related papers (2023-02-02T00:06:36Z)
- ComplAI: Theory of A Unified Framework for Multi-factor Assessment of Black-Box Supervised Machine Learning Models [6.279863832853343]
ComplAI is a unique framework to enable, observe, analyze and quantify explainability, robustness, performance, fairness, and model behavior.
It evaluates different supervised Machine Learning models not only on their ability to make correct predictions but also from an overall responsibility perspective.
arXiv Detail & Related papers (2022-12-30T08:48:19Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- On the Opportunities and Risks of Foundation Models [256.61956234436553]
We call these models foundation models to underscore their critically central yet incomplete character.
This report provides a thorough account of the opportunities and risks of foundation models.
To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration.
arXiv Detail & Related papers (2021-08-16T17:50:08Z)
- When and How to Fool Explainable Models (and Humans) with Adversarial Examples [1.439518478021091]
We explore the possibilities and limits of adversarial attacks for explainable machine learning models.
First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios.
Next, we propose a comprehensive framework to study whether adversarial examples can be generated for explainable models.
arXiv Detail & Related papers (2021-07-05T11:20:55Z)
- Thief, Beware of What Get You There: Towards Understanding Model Extraction Attack [13.28881502612207]
In some scenarios, AI models are trained proprietarily, where neither pre-trained models nor sufficient in-distribution data is publicly available.
We find that the effectiveness of existing techniques is significantly affected by the absence of pre-trained models.
We formulate model extraction attacks into an adaptive framework that captures these factors with deep reinforcement learning.
arXiv Detail & Related papers (2021-04-13T03:46:59Z)
- Robustness of Model Predictions under Extension [3.766702945560518]
A caveat to using models for analysis is that predicted causal effects and conditional independences may not be robust under model extensions.
We show how to use the technique of causal ordering to efficiently assess the robustness of qualitative model predictions.
For dynamical systems at equilibrium, we demonstrate how novel insights help to select appropriate model extensions.
arXiv Detail & Related papers (2020-12-08T20:21:03Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack against the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)