R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility
Across Random User Intents
- URL: http://arxiv.org/abs/2303.00732v2
- Date: Fri, 28 Apr 2023 22:36:09 GMT
- Authors: Daniel D. Johnson, Daniel Tarlow, Christian Walder
- Abstract summary: Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output. We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models show impressive results at predicting structured text
such as code, but also commonly introduce errors and hallucinations in their
output. When used to assist software developers, these models may make mistakes
that users must go back and fix, or worse, introduce subtle bugs that users may
miss entirely. We propose Randomized Utility-driven Synthesis of Uncertain
REgions (R-U-SURE), an approach for building uncertainty-aware suggestions
based on a decision-theoretic model of goal-conditioned utility, using random
samples from a generative model as a proxy for the unobserved possible intents
of the end user. Our technique combines minimum-Bayes-risk decoding, dual
decomposition, and decision diagrams in order to efficiently produce structured
uncertainty summaries, given only sample access to an arbitrary generative
model of code and an optional AST parser. We demonstrate R-U-SURE on three
developer-assistance tasks, and show that it can be applied to different user
interaction patterns without retraining the model and leads to more accurate
uncertainty estimates than token-probability baselines. We also release our
implementation as an open-source library at
https://github.com/google-research/r_u_sure.
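The core minimum-Bayes-risk idea, choosing the suggestion that maximizes expected utility across model samples treated as proxies for the user's unobserved intent, can be sketched as follows. This is a simplified illustration, not the paper's implementation: the token-overlap utility and the `mbr_select` helper are stand-ins for R-U-SURE's structured, goal-conditioned utility and its dual-decomposition optimizer over decision diagrams.

```python
# Sketch of minimum-Bayes-risk selection: model samples stand in for the
# user's unobserved intent, and we pick the candidate whose average utility
# against the other samples is highest. Token overlap is an illustrative
# utility; R-U-SURE uses a structured, goal-conditioned utility instead.

def token_overlap_utility(candidate: str, intent: str) -> float:
    """Fraction of candidate tokens that also appear in the intent."""
    cand, ref = candidate.split(), set(intent.split())
    if not cand:
        return 0.0
    return sum(tok in ref for tok in cand) / len(cand)

def mbr_select(samples: list[str]) -> str:
    """Return the sample maximizing expected utility across the others."""
    def expected_utility(candidate: str) -> float:
        others = [s for s in samples if s is not candidate]
        return sum(token_overlap_utility(candidate, s) for s in others) / max(len(others), 1)
    return max(samples, key=expected_utility)

samples = [
    "def add(a, b): return a + b",
    "def add(a, b): return a - b",
    "def add(a, b): return a + b  # sum",
]
best = mbr_select(samples)
```

In the paper, a combinatorial search over suggestions with marked uncertain regions replaces this simple argmax over raw samples.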
Related papers
- Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models [2.4065240342323384]
This paper introduces Efficient Adaptive Rejection Sampling (EARS). EARS dynamically adjusts the acceptance threshold by incorporating the target model's own predictive uncertainty, measured as 1 - max(P_target). It significantly enhances the efficiency of speculative decoding, achieving up to an 18.12% increase in throughput with a negligible 0.84% accuracy drop on the GSM8K benchmark.
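A minimal sketch of the uncertainty-adjusted acceptance idea. The exact EARS criterion is not reproduced here; the scaling rule, its direction, and the ratio test below are assumptions for illustration only, showing how 1 - max(P_target) could modulate acceptance of drafted tokens.

```python
# Illustrative only: accept a drafted token when its target/draft probability
# ratio clears a threshold scaled by the target model's confidence. The
# scaling rule is an assumed stand-in for EARS's actual criterion.

def adaptive_accept(p_target: dict[str, float], p_draft: dict[str, float],
                    token: str, base_threshold: float = 0.7) -> bool:
    uncertainty = 1.0 - max(p_target.values())        # 1 - max(P_target)
    threshold = base_threshold * (1.0 - uncertainty)  # stricter when confident
    ratio = min(1.0, p_target.get(token, 0.0) / max(p_draft.get(token, 1e-9), 1e-9))
    return ratio >= threshold
```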
arXiv Detail & Related papers (2025-12-15T11:08:56Z)
- Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution. We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z)
- Conv4Rec: A 1-by-1 Convolutional AutoEncoder for User Profiling through Joint Analysis of Implicit and Explicit Feedbacks [35.7275102787435]
We introduce a new convolutional AutoEncoder architecture for user modelling and recommendation tasks. Our model is able to learn jointly from both the explicit ratings and the implicit information in the sampling pattern. In experiments on several real-life datasets, we achieve state-of-the-art performance on both the implicit and explicit feedback prediction tasks.
arXiv Detail & Related papers (2025-09-09T08:25:11Z)
- REFINESTAT: Efficient Exploration for Probabilistic Program Synthesis [5.509795962249259]
RefineStat enforces semantic constraints, ensuring synthesized programs contain valid and well-formed parameters. It applies diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail. It produces programs that are both syntactically sound and statistically reliable, often matching or surpassing those from closed-source large language models.
arXiv Detail & Related papers (2025-09-01T03:13:36Z)
- Predictive Analytics for Collaborators Answers, Code Quality, and Dropout on Stack Overflow [5.4414562674321765]
Previous studies that used Stack Overflow to develop predictive models often employed limited benchmarks of 3-5 models or adopted arbitrary selection methods. Our study evaluates 21 algorithms across three tasks: predicting the number of questions a user is likely to answer, their code quality violations, and their dropout status.
arXiv Detail & Related papers (2025-06-23T06:23:12Z)
- Accelerated Test-Time Scaling with Model-Free Speculative Sampling [58.69141724095398]
We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding approach. We show that STAND reduces inference latency by 60-65% compared to standard autoregressive decoding. As a model-free approach, STAND can be applied to any existing language model without additional training.
arXiv Detail & Related papers (2025-06-05T07:31:18Z)
- Uncertainty-Aware Decoding with Minimum Bayes Risk [70.6645260214115]
We show how Minimum Bayes Risk decoding, which selects model generations according to an expected risk, can be generalized into a principled uncertainty-aware decoding method.
We show that this modified expected risk is useful for both choosing outputs and deciding when to abstain from generation and can provide improvements without incurring overhead.
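As a toy sketch of the abstention idea described above: compute each candidate's expected risk against the other samples and abstain when even the best candidate is too risky. The exact-match loss and threshold here are illustrative choices, not the paper's.

```python
# Toy minimum-Bayes-risk decoding with abstention: samples stand in for the
# unknown true output; abstain (return None) when even the lowest expected
# risk exceeds a threshold. Loss and threshold are illustrative choices.

def expected_risk(candidate: str, samples: list[str], loss) -> float:
    """Mean loss of a candidate against the sampled outputs."""
    return sum(loss(candidate, s) for s in samples) / len(samples)

def mbr_with_abstention(samples: list[str], loss, abstain_threshold: float = 0.5):
    """Return the minimum-expected-risk sample, or None to abstain."""
    best = min(samples, key=lambda c: expected_risk(c, samples, loss))
    return None if expected_risk(best, samples, loss) > abstain_threshold else best

exact_mismatch = lambda a, b: 0.0 if a == b else 1.0
```

When samples mostly agree, the consensus answer has low expected risk and is returned; when they all disagree, the decoder abstains.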
arXiv Detail & Related papers (2025-03-07T10:55:12Z)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models.
We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models.
Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
- Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
Large language models (LLMs) are increasingly relied on in AI systems, so predicting when they make mistakes is crucial.
In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations.
We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
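As a toy illustration of such a probe, one might collect a scalar feature per instance (say, the probability the model assigns to "yes" on a follow-up prompt like "Are you sure?") and fit a small linear classifier on it. The feature, the data, and the pure-Python logistic trainer below are all illustrative stand-ins, not the paper's actual setup.

```python
# Hedged sketch: fit a tiny logistic-regression probe that predicts
# per-instance correctness from a black-box "self-query" feature.
import math

def train_logistic(X, y, lr=0.5, steps=2000):
    """Plain SGD logistic regression; returns weights and bias."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x) -> bool:
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b))) > 0.5

# Toy data: feature = P("yes" | "Are you sure?"); correct answers score high.
X = [[0.9], [0.8], [0.85], [0.2], [0.3], [0.1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
```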
arXiv Detail & Related papers (2025-01-02T22:26:54Z)
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification [1.90365714903665]
This hands-on introduction is aimed at a reader interested in the practical implementation of distribution-free UQ.
We will include many explanatory illustrations, examples, and code samples in Python, with PyTorch syntax.
arXiv Detail & Related papers (2021-07-15T17:59:50Z)
- Multi-output Gaussian Processes for Uncertainty-aware Recommender Systems [3.908842679355254]
We introduce an efficient strategy for model training and inference, resulting in a model that scales to very large and sparse datasets.
Our model also provides meaningful uncertainty estimates for each prediction.
arXiv Detail & Related papers (2021-06-08T10:01:14Z)
- VAE Approximation Error: ELBO and Conditional Independence [78.72292013299868]
This paper analyzes VAE approximation errors caused by the combination of the ELBO objective with the choice of the encoder probability family.
We show that the ELBO subset can not be enlarged, and the respective error cannot be decreased, by only considering deeper encoder networks.
arXiv Detail & Related papers (2021-02-18T12:54:42Z)
- Bayes DistNet -- A Robust Neural Network for Algorithm Runtime Distribution Predictions [1.8275108630751844]
Randomized algorithms are used in many state-of-the-art solvers for constraint satisfaction problems (CSP) and Boolean satisfiability (SAT) problems.
Previous state-of-the-art methods directly try to predict a fixed parametric distribution that the input instance follows.
This new model achieves robust predictive performance in the low observation setting, as well as handling censored observations.
arXiv Detail & Related papers (2020-12-14T01:15:39Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions that evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.