Conditioning Predictive Models: Risks and Strategies
- URL: http://arxiv.org/abs/2302.00805v2
- Date: Mon, 6 Feb 2023 10:18:47 GMT
- Title: Conditioning Predictive Models: Risks and Strategies
- Authors: Evan Hubinger, Adam Jermyn, Johannes Treutlein, Rubi Hudson, Kate
Woolverton
- Abstract summary: We provide a definitive reference on what it would take to safely make use of generative/predictive models.
We believe that large language models can be understood as such predictive models of the world.
We think that conditioning approaches for predictive models represent the safest known way of eliciting human-level capabilities.
- Score: 1.3124513975412255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our intention is to provide a definitive reference on what it would take to
safely make use of generative/predictive models in the absence of a solution to
the Eliciting Latent Knowledge problem. Furthermore, we believe that large
language models can be understood as such predictive models of the world, and
that such a conceptualization raises significant opportunities for their safe
yet powerful use via carefully conditioning them to predict desirable outputs.
Unfortunately, such approaches also raise a variety of potentially fatal safety
problems, particularly surrounding situations where predictive models predict
the output of other AI systems, potentially unbeknownst to us. There are
numerous potential solutions to such problems, however, primarily via carefully
conditioning models to predict the things we want (e.g. humans) rather than the
things we don't (e.g. malign AIs). Furthermore, due to the simplicity of the
prediction objective, we believe that predictive models present the easiest
inner alignment problem that we are aware of. As a result, we think that
conditioning approaches for predictive models represent the safest known way of
eliciting human-level and slightly superhuman capabilities from large language
models and other similar future models.
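To make the notion of "conditioning" concrete, below is a minimal, illustrative sketch (not the paper's method) of what conditioning an autoregressive predictive language model amounts to in practice: choosing the observations we feed the model so that the continuations it predicts come from a distribution we want (e.g. careful humans) rather than one we don't (e.g. another AI). The model name, prompt text, and sampling settings are assumptions chosen purely for illustration.

```python
# Hypothetical sketch: conditioning a predictive language model via its prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any autoregressive predictive model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The conditional: evidence intended to make "a careful human wrote this"
# the most plausible explanation of the text being predicted.
prompt = (
    "Transcript of a research memo, written and independently checked "
    "by a team of human researchers:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,  # sample continuations from the predicted distribution
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```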
Related papers
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- Human Trajectory Forecasting with Explainable Behavioral Uncertainty [63.62824628085961]
Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars.
Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well.
We show that our proposed BNSP-SFM achieves up to a 50% improvement in prediction accuracy compared with 11 state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T16:45:21Z)
- Using Models Based on Cognitive Theory to Predict Human Behavior in Traffic: A Case Study [4.705182901389292]
We investigate the usefulness of a novel cognitively plausible model for predicting human behavior in gap acceptance scenarios.
We show that this model can compete with or even outperform well-established data-driven prediction models.
arXiv Detail & Related papers (2023-05-24T14:27:00Z)
- A roadmap to fair and trustworthy prediction model validation in healthcare [2.476158303361112]
A prediction model is most useful if it generalizes beyond the development data.
We propose a roadmap that facilitates the development and application of reliable, fair, and trustworthy artificial intelligence prediction models.
arXiv Detail & Related papers (2023-04-07T04:24:19Z)
- Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction [63.3021778885906]
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures.
We release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications.
arXiv Detail & Related papers (2022-10-13T23:57:40Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Predictability and Surprise in Large Generative Models [8.055204456718576]
Large-scale pre-training has emerged as a technique for creating capable, general purpose, generative models.
In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property.
arXiv Detail & Related papers (2022-02-15T23:21:23Z)
- Beyond Average Performance -- exploring regions of deviating performance for black box classification models [0.0]
We describe two approaches that can be used to provide interpretable descriptions of the expected performance of any black box classification model.
These approaches are of high practical relevance, as they provide a means to uncover and describe, in an interpretable way, situations where a model's performance is expected to deviate significantly from its average behaviour.
arXiv Detail & Related papers (2021-09-16T20:46:52Z)
- Probabilistic Human Motion Prediction via A Bayesian Neural Network [71.16277790708529]
In this paper, we propose a probabilistic model for human motion prediction.
Our model could generate several future motions when given an observed motion sequence.
We extensively validate our approach on the large-scale benchmark dataset Human3.6M.
arXiv Detail & Related papers (2021-07-14T09:05:33Z)
- Robustness of Model Predictions under Extension [3.766702945560518]
A caveat to using models for analysis is that predicted causal effects and conditional independences may not be robust under model extensions.
We show how to use the technique of causal ordering to efficiently assess the robustness of qualitative model predictions.
For dynamical systems at equilibrium, we demonstrate how novel insights help to select appropriate model extensions.
arXiv Detail & Related papers (2020-12-08T20:21:03Z)
- Forethought and Hindsight in Credit Assignment [62.05690959741223]
We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models.
We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated.
arXiv Detail & Related papers (2020-10-26T16:00:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.