Related papers: Game of Coding With an Unknown Adversary

URL: http://arxiv.org/abs/2502.07109v1
Date: Mon, 10 Feb 2025 23:06:10 GMT
Title: Game of Coding With an Unknown Adversary
Authors: Hanzaleh Akbarinodehi, Parsa Moradi, Mohammad Ali Maddah-Ali
Abstract summary: Motivated by emerging decentralized applications, the game of coding framework has been introduced to address scenarios where the adversary's control over coded symbols surpasses the fundamental limits of traditional coding theory. While the decoder, as the data collector (DC), has an acceptance and rejection mechanism, followed by an estimation module, the adversary aims to maximize its utility. We develop an algorithm enabling the DC to commit to a strategy that achieves within the vicinity of the equilibrium, without knowledge of the adversary's utility function.
Score: 15.839621757142597
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Motivated by emerging decentralized applications, the game of coding framework has recently been introduced to address scenarios where the adversary's control over coded symbols surpasses the fundamental limits of traditional coding theory. Still, the reward mechanism available in decentralized systems motivates the adversary to act rationally. While the decoder, as the data collector (DC), has an acceptance and rejection mechanism, followed by an estimation module, the adversary aims to maximize its utility, as an increasing function of (1) the chance of acceptance (to increase the reward), and (2) the estimation error. On the other hand, the decoder also adjusts its acceptance rule to maximize its own utility, as (1) an increasing function of the chance of acceptance (to keep the system functional), and (2) a decreasing function of the estimation error. Prior works within this framework rely on the assumption that the game is complete, that is, both the DC and the adversary are fully aware of each other's utility functions. However, in practice, the decoder is often unaware of the utility of the adversary. To address this limitation, we develop an algorithm enabling the DC to commit to a strategy that achieves within the vicinity of the equilibrium, without knowledge of the adversary's utility function. Our approach builds on an observation that at the equilibrium, the relationship between the probability of acceptance and the mean squared error (MSE) follows a predetermined curve independent of the specific utility functions of the players. By exploiting this invariant relationship, the DC can iteratively refine its strategy based on observable parameters, converging to a near-optimal solution. We provide theoretical guarantees on sample complexity and accuracy of the proposed scheme.

Related papers

Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning [58.331709210563616]
Thinking by Subtraction is a confidence-driven contrastive decoding approach. A small subset of low-confidence tokens disproportionately contributes to reasoning errors and unnecessary output expansion. Our method, Confidence-Driven Contrastive Decoding, detects low-confidence tokens during decoding and intervenes at these positions.
arXiv Detail & Related papers (2026-02-20T14:13:22Z)
Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
arXiv Detail & Related papers (2026-01-22T07:16:26Z)
Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization [52.74762030521324]
We propose a novel algorithm to learn reward functions from observed actions. We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm.
arXiv Detail & Related papers (2026-01-19T04:12:51Z)
Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning [60.00161035836637]
Group Relative Policy Optimization has emerged as a promising critic-free reinforcement learning paradigm for reasoning tasks. We introduce Outcome-grounded Advantage Reshaping (OAR), a fine-grained credit assignment mechanism that redistributes advantages based on how much each token influences the model's final answer. OAR-G achieves comparable gains with negligible computational overhead, both significantly outperforming a strong GRPO baseline.
arXiv Detail & Related papers (2026-01-12T10:48:02Z)
Game of Coding: Coding Theory in the Presence of Rational Adversaries, Motivated by Decentralized Machine Learning [16.147310961390534]
Coding theory plays a crucial role in enabling reliable communication, storage, and computation. In some emerging decentralized applications, particularly in decentralized machine learning (DeML), participating nodes are rewarded for accepted contributions. We introduce the game of coding, a novel game-theoretic framework that extends coding theory to trust-minimized settings.
arXiv Detail & Related papers (2026-01-05T18:04:32Z)
The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents [0.6117371161379209]
We propose a formal probabilistic framework that provides agents with a non-altruistic motive for bidirectional knowledge exchange. We show how these accumulated belief states serve as verifiable reward signals for Reinforcement Learning from Human Feedback (RLHF) and high-quality data filters for Supervised Fine-Tuning (SFT). Simulation results validate that this uncertainty-driven strategy significantly outperforms random baselines in heterogeneous environments.
arXiv Detail & Related papers (2025-12-24T02:02:25Z)
Efficient Thought Space Exploration through Strategic Intervention [54.35208611253168]
We propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components. The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), which dynamically identifies intervention points. Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs.
arXiv Detail & Related papers (2025-11-13T07:26:01Z)
Sculpting Latent Spaces With MMD: Disentanglement With Programmable Priors [30.182736043604304]
We introduce the Programmable Prior Framework, a method built on the Maximum Mean Discrepancy (MMD). Our work provides a foundational tool for representation engineering, opening new avenues for model identifiability and causal reasoning.
arXiv Detail & Related papers (2025-10-13T21:26:01Z)
Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce Sycophancy Mitigation through Adaptive Reasoning Trajectories (SMART). We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z)
Confidence Optimization for Probabilistic Encoding [0.9999629695552196]
We introduce a confidence-aware mechanism to adjust distance calculations. We replace the conventional KL divergence-based variance regularization with a simpler L2 regularization term to directly constrain variance. Our method significantly improves performance and generalization on both the BERT and the RoBERTa model.
arXiv Detail & Related papers (2025-07-22T15:32:27Z)
TrustLoRA: Low-Rank Adaptation for Failure Detection under Out-of-distribution Data [62.22804234013273]
We propose a simple failure detection framework to unify and facilitate classification with rejection under both covariate and semantic shifts. Our key insight is that by separating and consolidating failure-specific reliability knowledge with low-rank adapters, we can enhance the failure detection ability effectively and flexibly.
arXiv Detail & Related papers (2025-04-20T09:20:55Z)
Likelihood Reward Redistribution [0.0]
We propose a Likelihood Reward Redistribution (LRR) framework for reward redistribution. When integrated with an off-policy algorithm such as Soft Actor-Critic, LRR yields dense and informative reward signals.
arXiv Detail & Related papers (2025-03-20T20:50:49Z)
Deceptive Sequential Decision-Making via Regularized Policy Optimization [54.38738815697299]
Two regularization strategies for policy synthesis problems that actively deceive an adversary about a system's underlying rewards are presented. We show how each form of deception can be implemented in policy optimization problems. We show that diversionary deception can cause the adversary to believe that the most important agent is the least important, while attaining a total accumulated reward that is 98.83% of its optimal, non-deceptive value.
arXiv Detail & Related papers (2025-01-30T23:41:40Z)
Exploiting hidden structures in non-convex games for convergence to Nash equilibrium [62.88214569402201]
A wide array of modern machine learning applications can be formulated as non-cooperative games with Nash equilibria. We provide explicit convergence guarantees for both deterministic and stochastic environments.
arXiv Detail & Related papers (2023-12-27T15:21:25Z)
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework [8.572441599469597]
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes. The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies. We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
arXiv Detail & Related papers (2023-09-23T06:35:44Z)
When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
Interpreting Primal-Dual Algorithms for Constrained Multiagent Reinforcement Learning [4.67306371596399]
Most C-MARL algorithms use a primal-dual approach to enforce constraints through a penalty function added to the reward. We show that the standard practice of using the constraint function as the penalty leads to a weak notion of safety. We propose a constrained multiagent advantage actor critic (C-MAA2C) algorithm.
arXiv Detail & Related papers (2022-11-29T10:23:26Z)
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions [8.881195152638986]
Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function. We provide guarantees for off-policy function estimation under only realizability, by imposing proper regularization on the marginalized importance sampling objectives.
arXiv Detail & Related papers (2022-10-27T15:34:17Z)
Probabilistic Control and Majorization of Optimal Control [3.2634122554914002]
Probabilistic control design is founded on the principle that a rational agent attempts to match modelled with an arbitrary desired closed-loop system trajectory density. In this work we introduce an alternative parametrization of desired closed-loop behaviour and explore alternative proximity measures between densities.
arXiv Detail & Related papers (2022-05-06T15:04:12Z)
Linear Stochastic Bandits over a Bit-Constrained Channel [37.01818450308119]
We introduce a new linear bandit formulation over a bit-constrained channel. The goal of the server is to take actions based on estimates of an unknown model parameter to minimize cumulative regret. We prove that when the unknown model is $d$-dimensional, a channel capacity of $O(d)$ bits suffices to achieve order-optimal regret.
arXiv Detail & Related papers (2022-03-02T15:54:03Z)
Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature. We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance. By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning. These measures should account for the wide variety of models used in practice. The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
