Learning in an Echo Chamber: Online Learning with Replay Adversary
- URL: http://arxiv.org/abs/2509.25135v1
- Date: Mon, 29 Sep 2025 17:50:24 GMT
- Title: Learning in an Echo Chamber: Online Learning with Replay Adversary
- Authors: Daniil Dmitriev, Harald Eskelund Franck, Carolin Heinzler, Amartya Sanyal,
- Abstract summary: We introduce a learning-theoretic framework: Online Learning in the Replay Setting. A closure-based learner makes at most $\mathrm{ExThD}(\mathcal{H})$ mistakes against any adaptive adversary. For stochastic adversaries, we prove a similar bound for every intersection-closed class.
- Score: 13.758101623622105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As machine learning systems increasingly train on self-annotated data, they risk reinforcing errors and becoming echo chambers of their own beliefs. We model this phenomenon by introducing a learning-theoretic framework: Online Learning in the Replay Setting. In round $t$, the learner outputs a hypothesis $\hat{h}_t$; the adversary then reveals either the true label $f^\ast(x_t)$ or a replayed label $\hat{h}_i(x_t)$ from an earlier round $i < t$. A mistake is counted only when the true label is shown, yet classical algorithms such as the SOA or the halving algorithm are easily misled by the replayed errors. We introduce the Extended Threshold dimension, $\mathrm{ExThD}(\mathcal{H})$, and prove matching upper and lower bounds that make $\mathrm{ExThD}(\mathcal{H})$ the exact measure of learnability in this model. A closure-based learner makes at most $\mathrm{ExThD}(\mathcal{H})$ mistakes against any adaptive adversary, and no algorithm can perform better. For stochastic adversaries, we prove a similar bound for every intersection-closed class. The replay setting is provably harder than the classical mistake bound setting: some classes have constant Littlestone dimension but arbitrarily large $\mathrm{ExThD}(\mathcal{H})$. Proper learning exhibits an even sharper separation: a class is properly learnable under replay if and only if it is (almost) intersection-closed. Otherwise, every proper learner suffers $\Omega(T)$ errors, whereas our improper algorithm still achieves the $\mathrm{ExThD}(\mathcal{H})$ bound. These results give the first tight analysis of learning against replay adversaries, based on new results for closure-type algorithms.
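The interaction protocol described in the abstract is concrete enough to sketch in code. The following Python snippet is a minimal, illustrative simulation of the replay setting, not the paper's actual construction: the names `ClosureLearner`, `run_replay_game`, and `replay_choice` are invented for this example, and the learner shown is the natural closure-style update for one-dimensional thresholds (an intersection-closed class), which only moves on positive labels. Mistakes are charged exactly as in the abstract: only on rounds where the adversary reveals the true label $f^\ast(x_t)$.

```python
# Minimal simulation of the replay setting (illustrative; names are assumptions).

def make_threshold(a):
    """Hypothesis h_a(x) = 1 iff x >= a, over integer points."""
    return lambda x: int(x >= a)

class ClosureLearner:
    """A simple closure-style learner for one-dimensional thresholds.
    It maintains the smallest hypothesis consistent with the positive
    labels seen so far and ignores labels that cannot shrink it."""
    def __init__(self, n):
        self.a = n + 1                     # start with the all-zero hypothesis

    def hypothesis(self):
        return make_threshold(self.a)

    def update(self, x, y):
        if y == 1 and x < self.a:          # only positive examples move the threshold
            self.a = x

def run_replay_game(learner, f_star, points, replay_choice):
    """replay_choice(t, history) returns ('true', None) or ('replay', i) with i < t.
    A mistake is counted only when the true label is revealed."""
    history, mistakes = [], 0
    for t, x in enumerate(points):
        h_t = learner.hypothesis()         # learner commits to h_t before feedback
        history.append(h_t)
        mode, i = replay_choice(t, history)
        if mode == 'true':
            y = f_star(x)
            mistakes += int(h_t(x) != y)   # only true labels count as mistakes
        else:
            y = history[i](x)              # replayed label h_i(x) from an earlier round
        learner.update(x, y)               # the learner cannot tell which case occurred
    return mistakes

# Example: the adversary replays the learner's first hypothesis on odd rounds.
f_star = make_threshold(3)
adversary = lambda t, hist: ('replay', 0) if t % 2 == 1 else ('true', None)
print(run_replay_game(ClosureLearner(10), f_star, [7, 2, 3, 1, 5, 3], adversary))  # -> 2
```

In this toy instance the replayed labels are the learner's own earlier predictions, and since the closure learner's threshold only ever decreases, a replayed positive label can never push it below a value already justified by a true positive example; replayed negatives are ignored entirely. This is only an intuition for why closure-type updates resist the echo-chamber feedback, not a substitute for the paper's analysis.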
Related papers
- Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games [1.9161424760743742]
We propose an online model selection algorithm for the average reward RL setting.
We show that the additional cost of model selection scales only linearly in $M$, the number of model classes.
We also show that the exponential dependency on $m*$ is inevitable by proving a lower bound on the learner's regret.
arXiv Detail & Related papers (2024-11-09T05:03:10Z) - Online Learning of Halfspaces with Massart Noise [47.71073318490341]
We study the task of online learning in the presence of Massart noise.
We present a computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$.
We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k)\Delta T - o(T)$ bigger than choosing a random action at every round.
arXiv Detail & Related papers (2024-05-21T17:31:10Z) - Simple online learning with consistent oracle [55.43220407902113]
We consider online learning in the model where a learning algorithm can access the class only via the consistent oracle -- an oracle that, at any moment, can give a function from the class that agrees with all examples seen so far.
arXiv Detail & Related papers (2023-08-15T21:50:40Z) - On Optimal Learning Under Targeted Data Poisoning [48.907813854832206]
In this work we aim to characterize the smallest achievable error $\epsilon=\epsilon(\eta)$ by the learner in the presence of such an adversary.
Remarkably, we show that the upper bound can be attained by a deterministic learner.
arXiv Detail & Related papers (2022-10-06T06:49:48Z) - Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
We show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X,$
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games [36.969021834291745]
We propose a proper learning algorithm which gets a near-optimal mistake bound in terms of the sequential fat-shattering dimension of the hypothesis class.
This result answers a question as to whether proper learners could achieve near-optimal mistake bounds.
For the real-valued (regression) setting, the optimal mistake bound was not even known for improper learners.
arXiv Detail & Related papers (2021-11-17T05:24:21Z) - Littlestone Classes are Privately Online Learnable [28.04975353867202]
We consider the problem of online classification under a privacy constraint.
In this setting a learner observes sequentially a stream of labelled examples $(x_t, y_t)$, for $1 \leq t \leq T$, and returns at each iteration a hypothesis $h_t$ which is used to predict the label of each new example $x_t$.
The learner's performance is measured by her regret against a known hypothesis class $\mathcal{H}$.
arXiv Detail & Related papers (2021-06-25T09:08:33Z) - Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct supervision.
We propose an optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new Nash V-learning algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z)