Related papers: Refereed Learning

Refereed Learning

URL: http://arxiv.org/abs/2510.05440v1
Date: Mon, 06 Oct 2025 23:07:31 GMT
Title: Refereed Learning
Authors: Ran Canetti, Ephraim Linder, Connor Wagaman,
Abstract summary: We show refereed learning protocols that obtain a level of accuracy far exceeds what is obtainable at comparable cost without provers.<n>For all $varepsilon>0$ and ambient dimension $d$, our learner makes only one query to the ground truth function.<n>We develop a technique that allows the learner to sample, using the provers, from a distribution that is not efficiently samplable to begin with.
Score: 4.024083217094761
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We initiate an investigation of learning tasks in a setting where the learner is given access to two competing provers, only one of which is honest. Specifically, we consider the power of such learners in assessing purported properties of opaque models. Following prior work that considers the power of competing provers in different settings, we call this setting refereed learning. After formulating a general definition of refereed learning tasks, we show refereed learning protocols that obtain a level of accuracy that far exceeds what is obtainable at comparable cost without provers, or even with a single prover. We concentrate on the task of choosing the better one out of two black-box models, with respect to some ground truth. While we consider a range of parameters, perhaps our most notable result is in the high-precision range: For all $\varepsilon>0$ and ambient dimension $d$, our learner makes only one query to the ground truth function, communicates only $(1+\frac{1}{\varepsilon^2})\cdot\text{poly}(d)$ bits with the provers, and outputs a model whose loss is within a multiplicative factor of $(1+\varepsilon)$ of the best model's loss. Obtaining comparable loss with a single prover would require the learner to access the ground truth at almost all of the points in the domain. To obtain this bound, we develop a technique that allows the learner to sample, using the provers, from a distribution that is not efficiently samplable to begin with. We find this technique to be of independent interest. We also present lower bounds that demonstrate the optimality of our protocols in a number of respects, including prover complexity, number of samples, and need for query access.

Related papers

Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
conditional distributions is a central problem in machine learning.<n>We propose a new paradigm that integrates both paired and unpaired data.<n>We show that our approach can theoretically recover true conditional distributions with arbitrarily small error.
arXiv Detail & Related papers (2024-10-03T16:12:59Z)
Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension [17.485243410774814]
In traditional models of supervised learning, the goal of a learner is to output a hypothesis that is competitive (to within $epsilon$) of the best fitting concept from some class.<n>We introduce a smoothed-analysis framework that requires a learner to compete only with the best agnostic.<n>We obtain the first algorithm forally learning intersections of $k$-halfspaces in time.
arXiv Detail & Related papers (2024-07-01T04:58:36Z)
Testable Learning with Distribution Shift [9.036777309376697]
We define a new model called testable learning with distribution shift. We obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. We give several positive results for learning concept classes such as halfspaces, intersections of halfspaces, and decision trees.
arXiv Detail & Related papers (2023-11-25T23:57:45Z)
Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation [86.8475564814154]
We show that it is both possible and beneficial to undertake the constrained optimization problem directly. We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer. We show that dual variables indicate the sensitivity of the optimal value of the continual learning problem with respect to constraint perturbations.
arXiv Detail & Related papers (2023-09-29T21:23:27Z)
Sample Complexity of Robust Learning against Evasion Attacks [3.8888996044605855]
We study the feasibility of adversarially robust learning from the perspective of learning theory, considering sample complexity. We show that, under the uniform distribution, the exponential dependence on the adversary's budget to robustly learn conjunctions remains inevitable. We show that if the query radius is equal to the adversary's budget, we can develop robust empirical risk algorithms in the distribution-free setting.
arXiv Detail & Related papers (2023-08-23T10:51:33Z)
A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation [16.29514743112387]
We study sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearlyrealizable. We present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. Delphi requires $tildemathcalO(d)$ expert queries and a $textttpoly(d,|mathcalA|,1/varepsilon)$ amount of exploratory samples to provably recover an $varepsilon$suboptimal policy.
arXiv Detail & Related papers (2022-07-18T01:39:13Z)
There is no Accuracy-Interpretability Tradeoff in Reinforcement Learning for Mazes [64.05903267230467]
Interpretability is an essential building block for trustworthiness in reinforcement learning systems. We show that in certain cases, one can achieve policy interpretability while maintaining its optimality.
arXiv Detail & Related papers (2022-06-09T04:23:26Z)
Sample Complexity Bounds for Robustly Learning Decision Lists against Evasion Attacks [25.832511407411637]
A fundamental problem in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks. We work with probability distributions on the input data that satisfy a Lipschitz condition: nearby points have similar probability. For every fixed $k$ the class of $k$-decision lists has sample complexity against a $log(n)$-bounded adversary.
arXiv Detail & Related papers (2022-05-12T14:40:18Z)
A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled. We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples. We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
Online Selective Classification with Limited Feedback [82.68009460301585]
We study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance. Two salient aspects of the setting we consider are that the data may be non-realisable, due to which abstention may be a valid long-term action. We construct simple versioning-based schemes for any $mu in (0,1],$ that make most $Tmu$ mistakes while incurring smash$tildeO(T1-mu)$ excess abstention against adaptive adversaries.
arXiv Detail & Related papers (2021-10-27T08:00:53Z)
Active Learning for Contextual Search with Binary Feedbacks [2.6424064030995957]
We study the learning problem in contextual search motivated by applications such as first-price auction. We propose a tri-section search approach combined with a margin-based active learning method.
arXiv Detail & Related papers (2021-10-03T19:05:29Z)
Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
A Provably Efficient Sample Collection Strategy for Reinforcement Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. We propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) An "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it has access to a generative model (i.e., sparse simulator of the environment); 2) An "objective-agnostic" sample collection responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.