Models That Prove Their Own Correctness
- URL: http://arxiv.org/abs/2405.15722v2
- Date: Fri, 7 Jun 2024 21:00:05 GMT
- Title: Models That Prove Their Own Correctness
- Authors: Noga Amit, Shafi Goldwasser, Orr Paradise, Guy Rothblum
- Abstract summary: We train Self-Proving models that prove the correctness of their output to a verification algorithm $V$ via an Interactive Proof.
With high probability over a random input, the model generates a correct output *and* successfully proves its correctness to $V$.
Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.
- Score: 2.6570606951261015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured *on average* over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train *Self-Proving models* that prove the correctness of their output to a verification algorithm $V$ via an Interactive Proof. Self-Proving models satisfy that, with high probability over a random input, the model generates a correct output *and* successfully proves its correctness to $V\!$. The *soundness* property of $V$ guarantees that, for *every* input, no model can convince $V$ of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while *all* incorrect outputs (of any model) are detected by $V$. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions. The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.
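To make the setup concrete, here is a minimal sketch of a verifier $V$ for GCD correctness, assuming the proof is a pair of Bezout coefficients; the paper's actual interactive protocol and transcript format may differ.

```python
# Minimal sketch of a GCD verifier in the spirit of the paper's setup.
# The certificate format (Bezout coefficients) is an illustrative choice,
# not necessarily the exact protocol used by the authors.

def verify_gcd(a: int, b: int, claimed_gcd: int, proof: tuple[int, int]) -> bool:
    """Accept iff `claimed_gcd` is the GCD of (a, b), given Bezout coefficients (u, v).

    Soundness: if claimed_gcd divides both a and b and can be written as u*a + v*b,
    then every common divisor of a and b divides claimed_gcd, so it must be the
    greatest common divisor.
    """
    u, v = proof
    divides_both = claimed_gcd > 0 and a % claimed_gcd == 0 and b % claimed_gcd == 0
    bezout_holds = u * a + v * b == claimed_gcd
    return divides_both and bezout_holds


if __name__ == "__main__":
    import math
    # An honest prover could produce the certificate with the extended Euclidean algorithm:
    # 240*(-9) + 46*47 = 2 = gcd(240, 46).
    a, b = 240, 46
    g = math.gcd(a, b)
    print(verify_gcd(a, b, g, (-9, 47)))   # True: correct output, valid proof
    print(verify_gcd(a, b, 4, (-9, 47)))   # False: an incorrect output is rejected
```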
Related papers
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
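As a quick numeric illustration of the stated rate (assuming, per standard scaling-law notation, that $M$ denotes parameter count, $N$ data size, and $a > 1$ the power-law exponent), the reducible error can be evaluated directly:

```python
# Numeric illustration of the rate Theta(M^{-(a-1)} + N^{-(a-1)/a}).
# M (parameters), N (samples) and a > 1 follow the usual scaling-law notation;
# constant factors are ignored.

def reducible_error(M: float, N: float, a: float) -> float:
    return M ** (-(a - 1)) + N ** (-(a - 1) / a)

for M, N in [(1e3, 1e6), (1e4, 1e6), (1e4, 1e7)]:
    print(f"M={M:.0e}, N={N:.0e} -> {reducible_error(M, N, a=1.5):.2e}")
```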
arXiv Detail & Related papers (2024-06-12T17:53:29Z)
- Score-based generative models are provably robust: an uncertainty quantification perspective [4.396860522241307]
We show that score-based generative models (SGMs) are provably robust to the multiple sources of error in practical implementation.
Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem.
We show how errors due to (a) finite sample approximation, (b) early stopping, (c) score-matching objective choice, (d) score function parametrization, and (e) reference distribution choice, impact the quality of the generative model.
arXiv Detail & Related papers (2024-05-24T17:50:17Z)
- Gaussian Process Probes (GPP) for Uncertainty-Aware Probing [61.91898698128994]
We introduce a unified and simple framework for probing and measuring uncertainty about concepts represented by models.
Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out-of-distribution data using those uncertainty measures as well as classic methods do.
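A minimal sketch of the probing idea, using scikit-learn's GaussianProcessClassifier over frozen representations as a stand-in; the paper's GPP additionally separates epistemic from aleatory uncertainty, which this sketch does not reproduce.

```python
# Fit a Gaussian-process probe on frozen model representations and read off
# predictive probabilities; values near 0.5 indicate an uncertain probe.

import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Pretend these are hidden-layer representations of a model for 20 labelled examples.
X_repr = rng.normal(size=(20, 8))
y_concept = (X_repr[:, 0] > 0).astype(int)   # toy binary concept label

probe = GaussianProcessClassifier(kernel=RBF(length_scale=1.0)).fit(X_repr, y_concept)

X_new = rng.normal(size=(5, 8))
p = probe.predict_proba(X_new)[:, 1]
print("P(concept) on new representations:", np.round(p, 2))
```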
arXiv Detail & Related papers (2023-05-29T17:00:16Z)
- Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference [72.61732440246954]
Large pre-trained language models often lack logical consistency across test inputs.
We propose a framework, ConCoRD, for boosting the consistency and accuracy of pre-trained NLP models.
We show that ConCoRD consistently boosts accuracy and consistency of off-the-shelf closed-book QA and VQA models.
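A toy sketch of the underlying idea: re-rank candidate answers by combining model confidence with pairwise compatibility. The `nli_contradiction` scorer below is a hypothetical stand-in for a real NLI model, and ConCoRD itself solves a MaxSAT formulation rather than a brute-force search.

```python
# Pick the combination of candidate answers that maximizes QA confidence
# minus a penalty for pairwise contradictions flagged by an (here: fake) NLI scorer.

from itertools import product

def nli_contradiction(statement_a: str, statement_b: str) -> float:
    """Hypothetical stand-in: probability that the two statements contradict."""
    return 1.0 if ("is a mammal" in statement_a) != ("is a mammal" in statement_b) else 0.0

# Candidate answers and QA-model confidences for two related questions.
candidates = {
    "Is a whale a mammal?": [("A whale is a mammal", 0.6), ("A whale is a fish", 0.4)],
    "Do whales breathe air?": [("A whale is a mammal and breathes air", 0.7),
                               ("A whale does not breathe air", 0.3)],
}

best = max(
    product(*candidates.values()),
    key=lambda combo: sum(conf for _, conf in combo)
    - sum(nli_contradiction(a, b) for (a, _), (b, _) in zip(combo, combo[1:])),
)
print([ans for ans, _ in best])   # the mutually consistent pair of answers wins
```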
arXiv Detail & Related papers (2022-11-21T21:58:30Z)
- Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(\mathcal{A}, \mathcal{T})$.
We show that if the distribution on examples in the data passes the tester $\mathcal{T}$, then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data.
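A sketch of the tester-learner interface: run the tester on the sample first, and only trust the learner's output if the tester accepts. The moment-based Gaussianity check below is a toy stand-in, not the tester constructed in the paper.

```python
# Toy tester-learner pair: a crude moment check against a standard Gaussian
# gates whether we trust the (also toy) learner's output on this sample.

import numpy as np

def tester_T(x: np.ndarray, tol: float = 0.2) -> bool:
    """Accept if low-order moments look standard-Gaussian (toy check)."""
    return (abs(x.mean()) < tol and abs(x.var() - 1.0) < tol
            and abs((x ** 3).mean()) < 3 * tol)

def learner_A(x: np.ndarray, y: np.ndarray) -> float:
    """Toy 'agnostic learner': a threshold at the mean of positively labelled points."""
    return float(x[y == 1].mean())

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = (x > 0.5).astype(int)

if tester_T(x):
    print("Tester accepted; trusting learner output:", learner_A(x, y))
else:
    print("Tester rejected; no guarantee for the learner on this sample.")
```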
arXiv Detail & Related papers (2022-04-14T19:10:53Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
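A small illustration of the "integrity" criterion being targeted: for a well-chosen split, the empirical test error of a fitted linear model should match the measurement-noise level. The sketch below checks this for one candidate split and does not reproduce the paper's random-matrix-theory derivation of the optimal split.

```python
# Fit ordinary least squares on a train split and compare the held-out MSE
# against the true noise variance (the "integrity" check).

import numpy as np

rng = np.random.default_rng(2)
n, d, noise_std = 2000, 20, 0.5

X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + noise_std * rng.standard_normal(n)

n_train = 1500                     # candidate split; the paper solves for this size
Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

w_hat, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
test_mse = np.mean((Xte @ w_hat - yte) ** 2)

print(f"test MSE = {test_mse:.3f}  vs  noise variance = {noise_std**2:.3f}")
```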
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- "Adversarial Examples" for Proof-of-Learning [32.438181794551035]
Jia et al. proposed a new concept/mechanism named proof-of-learning (PoL).
PoL allows a prover to demonstrate ownership of a machine learning model by proving integrity of the training procedure.
We show that PoL is vulnerable to "adversarial examples".
arXiv Detail & Related papers (2021-08-21T07:56:29Z)
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
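A generic sketch of one common way to quantify this, expected calibration error (ECE) over confidence scores and correctness indicators; the paper's specific calibration methods for T5, BART, and GPT-2 are not reproduced here.

```python
# Expected calibration error: bin predictions by confidence and compare the
# average confidence in each bin with the empirical accuracy in that bin.

import numpy as np

def expected_calibration_error(conf: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

# Toy data: an overconfident model is right 60% of the time but reports ~0.9 confidence.
rng = np.random.default_rng(3)
confidence = rng.uniform(0.85, 0.95, size=1000)
is_correct = (rng.random(1000) < 0.6).astype(float)

print(f"ECE = {expected_calibration_error(confidence, is_correct):.3f}")
```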
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
- Estimating Stochastic Linear Combination of Non-linear Regressions Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large, too much estimation accuracy is sacrificed.
To the best of our knowledge, this is the first work that provides guarantees for the stochastic linear combination of non-linear regressions model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z)
- PRover: Proof Generation for Interpretable Reasoning over Rules [81.40404921232192]
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs.
Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z)
- Query Training: Learning a Worse Model to Infer Better Marginals in Undirected Graphical Models with Hidden Variables [11.985433487639403]
Probabilistic graphical models (PGMs) provide a compact representation of knowledge that can be queried in a flexible way.
We introduce query training (QT), a mechanism to learn a PGM that is optimized for the approximate inference algorithm that will be paired with it.
We demonstrate experimentally that QT can be used to learn a challenging 8-connected grid Markov random field with hidden variables.
arXiv Detail & Related papers (2020-06-11T20:34:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.