Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
- URL: http://arxiv.org/abs/2406.04391v1
- Date: Thu, 6 Jun 2024 17:46:56 GMT
- Title: Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
- Authors: Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman, Sanmi Koyejo
- Abstract summary: We show that modeling scaling behavior on widely used multiple-choice question-answering benchmarks is challenging.
We show that downstream performance is computed from negative log likelihoods via a sequence of transformations that progressively degrade the statistical relationship between performance and scale.
We empirically study how probability mass on the correct choice co-varies with probability mass on incorrect choices with increasing compute, suggesting that scaling laws for incorrect choices might be achievable.
- Score: 26.04581530766348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predictable behavior from scaling advanced AI systems is an extremely desirable property. Although a well-established literature exists on how pretraining performance scales, the literature on how particular downstream capabilities scale is significantly muddier. In this work, we take a step back and ask: why has predicting specific downstream capabilities with scale remained elusive? While many factors are certainly responsible, we identify a new factor that makes modeling scaling behavior on widely used multiple-choice question-answering benchmarks challenging. Using five model families and twelve well-established multiple-choice benchmarks, we show that downstream performance is computed from negative log likelihoods via a sequence of transformations that progressively degrade the statistical relationship between performance and scale. We then reveal the mechanism causing this degradation: downstream metrics require comparing the correct choice against a small number of specific incorrect choices, meaning accurately predicting downstream capabilities requires predicting not just how probability mass concentrates on the correct choice with scale, but also how probability mass fluctuates on specific incorrect choices with scale. We empirically study how probability mass on the correct choice co-varies with probability mass on incorrect choices with increasing compute, suggesting that scaling laws for incorrect choices might be achievable. Our work also explains why pretraining scaling laws are commonly regarded as more predictable than downstream capabilities and contributes towards establishing scaling-predictable evaluations of frontier AI models.
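The chain of transformations described in the abstract can be illustrated with a minimal Python sketch. The specific steps used here (exponentiating the negative log likelihood, renormalizing over the listed choices, then taking an argmax for accuracy) are an assumption about a typical multiple-choice scoring pipeline, not a reproduction of the authors' code; the function name `choice_metrics` and the example numbers are hypothetical.

```python
import math


def choice_metrics(nlls, correct_idx):
    """Turn per-choice negative log likelihoods into downstream metrics.

    nlls: one negative log likelihood per answer choice (hypothetical input).
    correct_idx: index of the correct choice.
    """
    # Step 1: NLL -> probability mass on each choice under the full vocabulary.
    p_vocab = [math.exp(-nll) for nll in nlls]

    # Step 2: renormalize so the mass is shared among the listed choices only.
    total = sum(p_vocab)
    p_choices = [p / total for p in p_vocab]

    # Step 3: accuracy is 1 only if the correct choice holds the most mass.
    predicted = max(range(len(p_choices)), key=lambda i: p_choices[i])
    accuracy = 1.0 if predicted == correct_idx else 0.0

    return {
        "p_vocab_correct": p_vocab[correct_idx],
        "p_choices_correct": p_choices[correct_idx],
        "accuracy": accuracy,
    }


# Same NLL on the correct choice (index 0), different mass on one incorrect choice.
a = choice_metrics([1.0, 2.0, 3.0, 4.0], correct_idx=0)
b = choice_metrics([1.0, 0.5, 3.0, 4.0], correct_idx=0)
print(a)
print(b)
```

In this toy case both calls assign identical probability to the correct choice under the full vocabulary, yet the accuracy differs because one incorrect choice gained mass. This is the dependence on specific incorrect choices that the abstract identifies as the source of degraded predictability.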
Related papers
- Awareness of uncertainty in classification using a multivariate model and multi-views [1.3048920509133808]
The proposed model regularizes uncertain predictions, and trains to calculate both the predictions and their uncertainty estimations.
Given the multi-view predictions together with their uncertainties and confidences, we proposed several methods to calculate final predictions.
The proposed methodology was tested using CIFAR-10 dataset with clean and noisy labels.
arXiv Detail & Related papers (2024-04-16T06:40:51Z)
- Selecting Large Language Model to Fine-tune via Rectified Scaling Law [74.84096546112215]
Given constrained resources, fine-tuning all models and making selections afterward is unrealistic.
We find that the fine-tuning scaling curve includes not just the well-known "power phase" but also the previously unobserved "pre-power phase".
By leveraging our law, we propose a novel LLM selection algorithm that selects the near-optimal model with hundreds of times less resource consumption.
arXiv Detail & Related papers (2024-02-04T01:55:00Z)
- Human Trajectory Forecasting with Explainable Behavioral Uncertainty [63.62824628085961]
Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars.
Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well.
We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T16:45:21Z)
- Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z)
- Uncertainty estimation of pedestrian future trajectory using Bayesian approximation [137.00426219455116]
Under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy.
The authors propose to quantify uncertainty during forecasting using Bayesian approximation, capturing uncertainty that deterministic approaches miss.
The effect of dropout weights and long-term prediction on future state uncertainty has been studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z)
- Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We work on two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when used in fully/semi/weakly-supervised frameworks.
arXiv Detail & Related papers (2021-10-13T01:23:48Z)
- Backward-Compatible Prediction Updates: A Probabilistic Approach [12.049279991559091]
We formalize the Prediction Update Problem and present an efficient probabilistic approach as an answer to the above questions.
In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies for backward-compatible prediction updates.
arXiv Detail & Related papers (2021-07-02T13:05:31Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Contextual Dropout: An Efficient Sample-Dependent Dropout Module [60.63525456640462]
Dropout has been demonstrated as a simple and effective module to regularize the training process of deep neural networks.
We propose contextual dropout with an efficient structural design as a simple and scalable sample-dependent dropout module.
Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
arXiv Detail & Related papers (2021-03-06T19:30:32Z)
- PrognoseNet: A Generative Probabilistic Framework for Multimodal Position Prediction given Context Information [2.5302126831371226]
We propose an approach which reformulates the prediction problem as a classification task, allowing for powerful tools.
A smart choice of the latent variable allows for the reformulation of the log-likelihood function as a combination of a classification problem and a much simplified regression problem.
The proposed approach can easily incorporate context information and does not require any preprocessing of the data.
arXiv Detail & Related papers (2020-10-02T06:13:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.