Learning to Give Checkable Answers with Prover-Verifier Games
- URL: http://arxiv.org/abs/2108.12099v1
- Date: Fri, 27 Aug 2021 02:56:06 GMT
- Title: Learning to Give Checkable Answers with Prover-Verifier Games
- Authors: Cem Anil, Guodong Zhang, Yuhuai Wu, Roger Grosse
- Abstract summary: We introduce Prover-Verifier Games (PVGs), a game-theoretic framework to encourage learning agents to solve decision problems in a verifiable manner.
We analyze variants of the framework, including simultaneous and sequential games, and narrow the space down to a subset of games which provably have the desired equilibria.
We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover.
- Score: 23.93694563816463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our ability to know when to trust the decisions made by machine learning
systems has not kept up with the staggering improvements in their performance,
limiting their applicability in high-stakes domains. We introduce
Prover-Verifier Games (PVGs), a game-theoretic framework to encourage learning
agents to solve decision problems in a verifiable manner. The PVG consists of
two learners with competing objectives: a trusted verifier network tries to
choose the correct answer, and a more powerful but untrusted prover network
attempts to persuade the verifier of a particular answer, regardless of its
correctness. The goal is for a reliable justification protocol to emerge from
this game. We analyze variants of the framework, including simultaneous and
sequential games, and narrow the space down to a subset of games which provably
have the desired equilibria. We develop instantiations of the PVG for two
algorithmic tasks, and show that in practice, the verifier learns a robust
decision rule that is able to receive useful and reliable information from an
untrusted prover. Importantly, the protocol still works even when the verifier
is frozen and the prover's messages are directly optimized to convince the
verifier.
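To make the roles in the abstract concrete, here is a minimal toy sketch. It is not one of the paper's tasks and not its training procedure. The decision problem ("does the list contain a target value?"), the index-as-certificate message format, the hand-written verifier, and the names `verifier`, `adversarial_prover` and `play` are all hypothetical illustrations. The verifier is frozen, and the prover directly searches over messages to get the answer "yes" regardless of the truth, which mirrors the robustness test described above: a sound decision rule cannot be talked into "yes" on a "no" instance.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy decision problem: "does xs contain the value `target`?"
# A prover message is an index into xs, offered as a checkable certificate.


def verifier(xs: Sequence[int], target: int, message: int) -> bool:
    """Frozen (hand-written, not learned) verifier: say 'yes' only if the
    message is a valid index that actually points at `target`."""
    return 0 <= message < len(xs) and xs[message] == target


def adversarial_prover(
    xs: Sequence[int],
    target: int,
    verify: Callable[[Sequence[int], int, int], bool],
) -> int:
    """Prover that directly optimizes its message to make the verifier say
    'yes'. It tries every index plus one out-of-range index and returns the
    first message that convinces the verifier, or the last one tried."""
    candidates = list(range(len(xs))) + [len(xs)]
    for m in candidates:
        if verify(xs, target, m):
            return m
    return candidates[-1]


def play(xs: Sequence[int], target: int) -> Tuple[bool, int, int]:
    """One round: returns (verifier decision, verifier payoff, prover payoff)."""
    truth = target in xs
    msg = adversarial_prover(xs, target, verifier)
    decision = verifier(xs, target, msg)
    verifier_payoff = int(decision == truth)  # verifier wants the correct answer
    prover_payoff = int(decision)  # prover wants 'yes' regardless of correctness
    return decision, verifier_payoff, prover_payoff
```

For example, `play([3, 1, 4], 4)` returns `(True, 1, 1)` because an honest certificate exists, while `play([3, 1, 4], 9)` returns `(False, 1, 0)` because no message can convince this verifier. In the paper both players are learned networks; the fixed rule here only illustrates the kind of equilibrium behaviour the framework aims for.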
Related papers
- ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification [53.80183105328448]
Refine via Intrinsic Self-Verification (ReVISE) is an efficient framework that enables LLMs to self-correct their outputs through self-verification.
Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves reasoning performance.
Date: 2025-02-20T13:50:02Z
- Rationale-Aware Answer Verification by Pairwise Self-Evaluation [11.763229353978321]
We show that training reliable verifiers requires ensuring the validity of rationales in addition to the correctness of the final answers.
Date: 2024-10-07T08:53:00Z
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
Date: 2024-10-05T05:21:48Z
- GRATR: Zero-Shot Evidence Graph Retrieval-Augmented Trustworthiness Reasoning [7.3795957796342195]
Trustworthiness reasoning aims to enable agents in multiplayer games with incomplete information to identify potential allies and adversaries.
We introduce the graph retrieval-augmented trustworthiness reasoning (GRATR) framework, which retrieves observable evidence from the game environment.
Date: 2024-08-22T12:21:22Z
- Automated Security Response through Online Learning with Adaptive Conjectures [13.33996350474556]
We study automated security response for an IT infrastructure.
We formulate the interaction between an attacker and a defender as a partially observed, non-stationary game.
Date: 2024-02-19T20:06:15Z
- Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
Date: 2022-06-23T16:36:13Z
- On the Importance of Trust in Next-Generation Networked CPS Systems: An AI Perspective [2.1055643409860734]
We propose trust as a measure to evaluate the status of network agents and improve the decision-making process.
Trust relations are based on evidence created by the interactions of entities within a protocol.
We show how utilizing the trust evidence can improve the performance and the security of Federated Learning.
Date: 2021-04-16T02:12:13Z
- Robust Vision-Based Cheat Detection in Competitive Gaming [12.124621973070164]
We propose a vision-based approach that captures the final state of the frame buffer and detects illicit overlays.
Our results show that robust and effective anti-cheating through machine learning is practically feasible.
Date: 2021-03-18T06:06:52Z
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how much an artificial agent can benefit from playing guessing games when it is later asked to perform novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
Date: 2021-01-31T10:30:48Z
- Learning to Communicate and Correct Pose Errors [75.03747122616605]
We study the setting proposed in V2VNet, where nearby self-driving vehicles jointly perform object detection and motion forecasting in a cooperative manner.
We propose a novel neural reasoning framework that learns to communicate, to estimate potential errors, and to reach a consensus about those errors.
Date: 2020-11-10T18:19:40Z
- End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
Date: 2020-10-26T18:39:32Z