Towards More Fine-grained and Reliable NLP Performance Prediction
- URL: http://arxiv.org/abs/2102.05486v1
- Date: Wed, 10 Feb 2021 15:23:20 GMT
- Title: Towards More Fine-grained and Reliable NLP Performance Prediction
- Authors: Zihuiwen Ye, Pengfei Liu, Jinlan Fu, Graham Neubig
- Abstract summary: We make two contributions to improving performance prediction for NLP tasks.
First, we examine performance predictors not only for holistic measures of accuracy like F1 or BLEU but also for fine-grained measures such as accuracy over individual classes of examples.
Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
- Score: 85.78131503006193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance prediction, the task of estimating a system's performance without
performing experiments, allows us to reduce the experimental burden caused by
the combinatorial explosion of different datasets, languages, tasks, and
models. In this paper, we make two contributions to improving performance
prediction for NLP tasks. First, we examine performance predictors not only for
holistic measures of accuracy like F1 or BLEU but also fine-grained performance
measures such as accuracy over individual classes of examples. Second, we
propose methods to understand the reliability of a performance prediction model
from two angles: confidence intervals and calibration. We perform an analysis
of four types of NLP tasks, demonstrating both the feasibility of fine-grained
performance prediction and the necessity of reliability
analysis for performance prediction methods in the future. We make our code
publicly available: \url{https://github.com/neulab/Reliable-NLPPP}
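To make the setup concrete, here is a minimal sketch of the kind of pipeline the abstract describes: a regressor maps features of an experimental setting (dataset, language, model) to an observed metric such as F1, and a bootstrap over the training experiments attaches a confidence interval to each prediction. The featurization and the choice of gradient boosting are illustrative assumptions, not the paper's exact model.

```python
# Illustrative performance predictor with a bootstrap confidence interval.
# Feature names and the regressor are assumptions, not the paper's method.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical featurization: one row per (dataset, model) experiment,
# e.g., [log train size, avg sentence length, label entropy, ...].
X_train = rng.random((200, 5))
y_train = rng.random(200)          # observed F1 scores in [0, 1]
X_new = rng.random((1, 5))         # an unseen experimental setting

def bootstrap_interval(X, y, x_query, n_boot=200, alpha=0.05):
    """Bootstrap the training experiments to get a CI on the prediction."""
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))
        model = GradientBoostingRegressor().fit(X[idx], y[idx])
        preds.append(model.predict(x_query)[0])
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2])
    return float(np.mean(preds)), (float(lo), float(hi))

point, (low, high) = bootstrap_interval(X_train, y_train, X_new)
print(f"predicted F1 ~ {point:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Checking whether such intervals actually contain the true score at their nominal rate is the calibration question the paper raises.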
Related papers
- Can We Predict Performance of Large Models across Vision-Language Tasks? [34.27319941609499]
We propose a new framework for predicting unknown performance scores based on observed ones from other LVLMs or tasks.
We use a sparse performance matrix $\boldsymbol{R}$, where each entry $R_{mn}$ represents the performance score of the $m$-th model on the $n$-th dataset.
We demonstrate the accuracy of PMF in predicting unknown scores, the reliability of uncertainty estimates in ordering evaluations, and the effectiveness of our enhancements for handling sparse data.
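The sparse-matrix formulation lends itself to a simple matrix-factorization baseline. The sketch below fills in missing entries of a score matrix by alternating least squares; it is an accessible stand-in for, not a reproduction of, the probabilistic matrix factorization (PMF) model the paper proposes, which also yields uncertainty estimates.

```python
# Illustrative matrix factorization for a sparse performance matrix R
# (models x datasets); the paper's PMF adds priors and uncertainty.
import numpy as np

rng = np.random.default_rng(0)
M, N, K, lam = 8, 12, 3, 0.1            # models, datasets, latent dim, L2

R = rng.random((M, N))                  # ground-truth scores (for demo)
observed = rng.random((M, N)) < 0.4     # mask of evaluated (m, n) pairs

U = rng.normal(scale=0.1, size=(M, K))  # model embeddings
V = rng.normal(scale=0.1, size=(N, K))  # dataset embeddings

for _ in range(50):                     # alternating least squares
    for m in range(M):
        idx = observed[m]
        A = V[idx].T @ V[idx] + lam * np.eye(K)
        U[m] = np.linalg.solve(A, V[idx].T @ R[m, idx])
    for n in range(N):
        idx = observed[:, n]
        A = U[idx].T @ U[idx] + lam * np.eye(K)
        V[n] = np.linalg.solve(A, U[idx].T @ R[idx, n])

R_hat = U @ V.T                         # predicted scores for unseen pairs
print(np.abs(R_hat - R)[~observed].mean())
```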
arXiv Detail & Related papers (2024-10-14T03:00:12Z)
- Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more efficient metric for performance estimation.
We extend the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources.
We employ a two-layer neural network to model the non-linear relationship between multiple domain-specific losses and downstream performance.
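A compact sketch of this two-stage recipe: (1) fit a power law mapping compute (FLOPs) to a domain-specific pre-training loss, (2) fit a small neural network mapping several domain losses to a downstream score. The functional form, data, and layer sizes here are assumptions for illustration only.

```python
# Two-stage sketch: power law (FLOPs -> domain loss), then a small NN
# (domain losses -> downstream score). Illustrative, not the paper's code.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.neural_network import MLPRegressor

def power_law(flops, a, b, c):
    return a * flops ** (-b) + c        # L(C) = a * C^(-b) + c

flops = np.logspace(18, 22, 10)
loss = power_law(flops, 2.0e3, 0.12, 1.8) + np.random.normal(0, 0.01, 10)
params, _ = curve_fit(power_law, flops, loss, p0=[1e3, 0.1, 1.0], maxfev=10000)
predicted_loss = power_law(1e23, *params)   # extrapolate to a larger budget

# Stage 2: map a vector of domain losses to a downstream benchmark score.
domain_losses = np.random.random((50, 4))   # e.g., web, code, books, math
downstream = np.random.random(50)           # observed benchmark scores
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(
    domain_losses, downstream)
print(predicted_loss, mlp.predict(domain_losses[:1]))
```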
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, achieving up to a 9.6% improvement over conventional baseline methods.
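The core mechanic, freezing a pre-trained backbone and training only newly added prompts and adapters, can be sketched in a few lines of PyTorch. The backbone, adapter shape, and prompt here are placeholders, not the actual Forecast-PEFT implementation.

```python
# Freeze-and-adapt sketch: a frozen backbone plus a trainable adapter and
# learned prompt. Module names and sizes are illustrative placeholders.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter applied after the frozen backbone."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)
    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

backbone = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
for p in backbone.parameters():
    p.requires_grad = False                             # freeze the backbone

adapter = Adapter(64)
prompt = nn.Parameter(torch.zeros(1, 64))               # learned prompt token

trainable = list(adapter.parameters()) + [prompt]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

x = torch.randn(8, 64)
out = adapter(backbone(x + prompt))     # only adapter/prompt receive gradients
loss = out.pow(2).mean()                # placeholder loss for the sketch
loss.backward()
optimizer.step()
print(sum(p.numel() for p in trainable), "trainable parameters")
```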
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
- Uncertainty-Aware Performance Prediction for Highly Configurable Software Systems via Bayesian Neural Networks [12.607426130997336]
We propose a Bayesian deep learning based method, namely BDLPerf, that can incorporate uncertainty into the prediction model.
We develop a novel uncertainty calibration technique to ensure the reliability of the confidence intervals generated by a Bayesian prediction model.
Our experimental results on 10 real-world systems show that BDLPerf achieves higher accuracy than existing approaches.
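To illustrate the uncertainty-aware prediction idea, the sketch below uses Monte Carlo dropout as a simple stand-in for the paper's Bayesian deep learning model and empirically checks the coverage of the resulting 95% intervals.

```python
# MC-dropout stand-in for a Bayesian performance predictor, plus a simple
# empirical coverage check. Not the BDLPerf model itself.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))

X = torch.randn(128, 10)             # configuration options of a system
y = X[:, :3].sum(dim=1, keepdim=True) + 0.1 * torch.randn(128, 1)

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):                 # standard regression training
    opt.zero_grad()
    nn.functional.mse_loss(model(X), y).backward()
    opt.step()

model.train()                        # keep dropout active at test time
samples = torch.stack([model(X) for _ in range(100)])   # (100, 128, 1)
lo = samples.quantile(0.025, dim=0)
hi = samples.quantile(0.975, dim=0)
coverage = ((y >= lo) & (y <= hi)).float().mean()
print(f"empirical coverage of the nominal 95% interval: {coverage:.2f}")
```

Intervals produced this way can under- or over-cover, which is exactly the gap a dedicated uncertainty calibration step targets.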
arXiv Detail & Related papers (2022-12-27T04:39:26Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
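Following that description, a minimal numpy sketch of ATC: choose a confidence threshold on labeled source data so that the fraction of examples above it matches source accuracy, then estimate target accuracy as the fraction of unlabeled target examples above that threshold. The synthetic confidences are illustrative.

```python
# Sketch of Average Thresholded Confidence (ATC) on synthetic confidences.
import numpy as np

def atc_threshold(source_conf, source_correct):
    """Choose t so that mean(conf > t) ~= source accuracy."""
    acc = source_correct.mean()
    # the (1 - acc) quantile of source confidences leaves ~acc mass above it
    return np.quantile(source_conf, 1.0 - acc)

def atc_predict_accuracy(target_conf, threshold):
    return (target_conf > threshold).mean()

rng = np.random.default_rng(0)
source_conf = rng.beta(5, 2, size=1000)            # max softmax scores (source)
source_correct = rng.random(1000) < source_conf    # correctness on source
target_conf = rng.beta(4, 3, size=1000)            # unlabeled target confidences

t = atc_threshold(source_conf, source_correct)
print("estimated target accuracy:", atc_predict_accuracy(target_conf, t))
```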
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
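PGBM itself fits a single tree ensemble whose leaves carry distributional parameters; as a more accessible stand-in for the same goal (probabilistic predictions from gradient-boosted trees), the sketch below uses plain scikit-learn quantile regression. It illustrates the objective, not PGBM's method.

```python
# Prediction intervals from gradient boosting via quantile loss
# (scikit-learn), as a stand-in for PGBM's single probabilistic ensemble.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(0, 0.2, 500)

models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.05, 0.5, 0.95)
}
x_new = rng.random((1, 4))
lo, med, hi = (models[q].predict(x_new)[0] for q in (0.05, 0.5, 0.95))
print(f"median {med:.2f}, 90% interval [{lo:.2f}, {hi:.2f}]")
```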
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
- Learning Prediction Intervals for Model Performance [1.433758865948252]
We propose a method to compute prediction intervals for model performance.
We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines.
arXiv Detail & Related papers (2020-12-15T21:32:03Z)
- Towards Improving Selective Prediction Ability of NLP Systems [24.774450633678125]
We propose a method that improves probability estimates of models by calibrating them using prediction confidence and difficulty score of instances.
We instantiate our method with Natural Language Inference (NLI) and Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and Out-of-Domain (OOD) settings.
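The calibration angle shared by this entry and the main paper is usually quantified with expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence to its empirical accuracy. Below is a generic ECE computation, not the confidence/difficulty-based calibrator the paper proposes.

```python
# Generic expected calibration error (ECE) on synthetic predictions.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap      # weight each bin by its frequency
    return ece

rng = np.random.default_rng(0)
conf = rng.beta(5, 2, size=2000)                 # predicted confidences
correct = rng.random(2000) < conf ** 1.5         # an over-confident model
print("ECE:", expected_calibration_error(conf, correct.astype(float)))
```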
arXiv Detail & Related papers (2020-08-21T08:46:36Z)
- Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
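The transductive refinement step can be sketched as re-estimating each class prototype as a confidence-weighted mean of the unlabeled query embeddings, where confidence here is simply a softmax over negative distances; the paper's contribution is to meta-learn this weighting rather than fix it, which the sketch omits.

```python
# Confidence-weighted prototype refinement with fixed softmax weights;
# the paper meta-learns the weighting instead of fixing it like this.
import numpy as np

def refine_prototypes(prototypes, queries, temperature=1.0, steps=3):
    for _ in range(steps):
        # distance of every query to every prototype -> soft assignments
        d = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d / temperature)
        w /= w.sum(axis=1, keepdims=True)            # confidence per class
        # confidence-weighted mean of queries updates each prototype
        prototypes = (w.T @ queries) / w.sum(axis=0)[:, None]
    return prototypes

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(5, 16))    # 5-way episode, 16-dim embeddings
queries = rng.normal(size=(75, 16))      # 15 unlabeled queries per class
print(refine_prototypes(prototypes, queries).shape)   # (5, 16)
```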
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.