Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes
- URL: http://arxiv.org/abs/2508.05469v1
- Date: Thu, 07 Aug 2025 15:11:43 GMT
- Title: Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes
- Authors: Zachary Robertson, Sanmi Koyejo
- Abstract summary: We develop mechanisms for evaluating AI systems without ground truth by exploiting a connection between gaming resistance and output quality. We prove that f-mutual information measures are the unique gaming-resistant mechanisms under natural conditions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop mechanisms for evaluating AI systems without ground truth by exploiting a connection between gaming resistance and output quality. The data processing inequality ensures that post-hoc attempts to game a metric degrade both information content and task performance. We prove that f-mutual information measures are the unique gaming-resistant mechanisms under natural conditions, with the overseer acting as an agent. While Shannon mutual information faces exponential sample complexity, bounded measures like total variation distance remain tractable. Empirically, across ten domains from translation to peer review, all information-theoretic mechanisms achieve perfect discrimination (d > 0.5) between faithful and strategic agents. In contrast, LLM judges exhibit systematic evaluation inversion, preferring fabricated content over accurate summaries. Our mechanisms show 10-100x better robustness to adversarial manipulation than current practices. We also find that performance follows an inverted-U curve with compression ratio, peaking at 10:1, where agent responses exhibit optimal information diversity (3 effective dimensions), giving a bias-variance perspective on when our approach is expected to be most effective.
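To make the bounded mechanism concrete, here is a minimal sketch of a total-variation-style mutual information estimate: a binary critic scores genuinely paired (source, response) examples against mismatched pairs, and the gap between the two acceptance rates is the score. The function names and the token-overlap critic standing in for an LLM judge are illustrative assumptions, not the paper's implementation.

```python
import itertools

def tvd_mi_estimate(sources, responses, critic):
    """Hedged sketch of a TVD-style mutual information estimate.

    Contrasts the critic's acceptance rate on genuinely paired
    (source, response) examples with its rate on mismatched pairs:
        TVD-MI ~ E_joint[critic] - E_marginals[critic].
    The estimate stays bounded, avoiding the exponential sample
    complexity of estimating Shannon mutual information directly.
    """
    n = len(sources)
    paired = sum(critic(sources[i], responses[i]) for i in range(n)) / n
    mismatched = [(i, j) for i, j in itertools.product(range(n), repeat=2) if i != j]
    unpaired = sum(critic(sources[i], responses[j]) for i, j in mismatched) / len(mismatched)
    return paired - unpaired

def toy_critic(source, response):
    """Stand-in for a binary LLM judge: returns 1.0 if the response shares
    enough vocabulary with the source to plausibly derive from it."""
    src, resp = set(source.lower().split()), set(response.lower().split())
    return 1.0 if len(src & resp) / max(len(resp), 1) > 0.3 else 0.0

if __name__ == "__main__":
    sources = ["the cat sat on the mat", "stocks fell sharply on tuesday"]
    faithful = ["a cat sat on a mat", "stocks fell sharply tuesday"]
    fabricated = ["the moon is made of cheese", "aliens landed in paris"]
    print(tvd_mi_estimate(sources, faithful, toy_critic))    # high: responses track sources
    print(tvd_mi_estimate(sources, fabricated, toy_critic))  # ~0: no information about sources
```

A faithful agent's responses let the critic tell matched from mismatched pairs apart, while fabricated responses carry no information about their sources, so the gap collapses toward zero regardless of how fluent the fabrication is.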
Related papers
- Incentivizing Truthful Language Models via Peer Elicitation Games [10.530016288072506]
Large Language Models (LLMs) have demonstrated strong generative capabilities but remain prone to inconsistencies and hallucinations. We introduce Peer Elicitation Games (PEG), a training-free, game-theoretic framework for aligning LLMs through a peer elicitation mechanism involving a generator and multiple discriminators instantiated from distinct base models.
arXiv Detail & Related papers (2025-05-19T18:16:58Z)
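As a rough illustration of the peer elicitation idea, the sketch below scores each discriminator by its agreement with a randomly drawn peer, so that truthful evaluation becomes mutually reinforcing. This is a generic output-agreement mechanism under assumed names (`peer_elicitation_round`, the lambda discriminators); PEG's actual scoring rule may differ.

```python
import random

def peer_elicitation_round(answer, discriminators):
    """Hedged sketch of one peer-elicitation round: each discriminator gives
    a binary verdict on the generator's answer, then is rewarded by
    agreement with a randomly drawn peer (a simple output-agreement
    mechanism, not necessarily PEG's exact rule)."""
    verdicts = [judge(answer) for judge in discriminators]
    rewards = []
    for i, v in enumerate(verdicts):
        peer = random.choice([j for j in range(len(verdicts)) if j != i])
        rewards.append(1.0 if v == verdicts[peer] else 0.0)
    # Accept the answer if a majority of independent discriminators endorse it.
    accepted = sum(verdicts) > len(verdicts) / 2
    return accepted, rewards

# Illustrative discriminators standing in for distinct base models.
discriminators = [
    lambda a: "paris" in a.lower(),   # checks the key entity
    lambda a: len(a.split()) > 2,     # checks answer completeness
    lambda a: "paris" in a.lower(),   # independently agrees with the first
]
print(peer_elicitation_round("The capital of France is Paris.", discriminators))
```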
- Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis [55.13545823385091]
Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. In real-world applications, each agent may experience slightly different transition dynamics, leading to inherent model mismatches. We show that even moderate levels of information sharing significantly mitigate environment-specific errors.
arXiv Detail & Related papers (2025-03-21T18:06:28Z)
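A minimal sketch of this setting, assuming a toy chain MDP and a simple average-and-broadcast communication round; the names and schedule (`RandomWalk`, `federated_td`) are illustrative, not the paper's protocol.

```python
import numpy as np

class RandomWalk:
    """Toy chain MDP; `slip` perturbs the dynamics to model per-agent mismatch."""
    def __init__(self, n, slip, seed):
        self.n, self.slip, self.rng = n, slip, np.random.default_rng(seed)
    def reset(self):
        return self.n // 2
    def step(self, s):
        move = 1 if self.rng.random() > self.slip else -1
        s_next = min(max(s + move, 0), self.n - 1)
        reward = 1.0 if s_next == self.n - 1 else 0.0
        return s_next, reward

def federated_td(envs, n_states, rounds=50, local_steps=20, alpha=0.1, gamma=0.9):
    """Hedged sketch of federated TD(0): each agent updates a local value
    table on its own (slightly mismatched) environment, and a server
    periodically averages the tables."""
    values = [np.zeros(n_states) for _ in envs]
    for _ in range(rounds):
        for v, env in zip(values, envs):
            s = env.reset()
            for _ in range(local_steps):
                s_next, r = env.step(s)
                v[s] += alpha * (r + gamma * v[s_next] - v[s])  # TD(0) update
                s = s_next
        avg = np.mean(values, axis=0)            # communication round: average estimates
        values = [avg.copy() for _ in envs]
    return avg

envs = [RandomWalk(5, slip=0.4 + 0.05 * i, seed=i) for i in range(3)]
print(federated_td(envs, n_states=5))
```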
- Self-Supervised Inference of Agents in Trustless Environments [44.99833362998488]
We propose a novel approach where agents can form swarms to produce high-quality responses effectively.
This is accomplished by utilizing agents capable of data inference and ranking.
We show that our approach is an order of magnitude faster than other trustless inference strategies, reaching validation latency below 125 ms.
arXiv Detail & Related papers (2024-09-12T20:32:07Z)
- In Search of Lost Online Test-time Adaptation: A Survey [40.68806005826287]
This article presents a comprehensive survey of online test-time adaptation (OTTA).
We classify OTTA techniques into three primary categories and benchmark them using a modern backbone, the Vision Transformer (ViT).
Our findings diverge from existing literature, revealing that transformers demonstrate heightened resilience to diverse domain shifts.
arXiv Detail & Related papers (2023-10-31T05:47:33Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking [83.40120598637665]
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data.
Existing works mainly study common data- or model-level augmentation methods to enhance the generalization.
We present a simple and effective "divide, conquer and combine" solution, which explicitly disentangles the semantics of seen data.
arXiv Detail & Related papers (2023-06-01T08:21:20Z)
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
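The standard tool for this setting is robust aggregation of agent reports. Below is a hedged sketch using a coordinate-wise trimmed mean, which tolerates an $\alpha$-fraction of arbitrary reports; the paper's estimators build on this kind of primitive but are not reproduced here.

```python
import numpy as np

def trimmed_mean(estimates, alpha):
    """Hedged sketch of Byzantine-robust aggregation: with at most an
    alpha-fraction of adversarial agents, discard the largest and smallest
    alpha-fraction of reports per coordinate and average the rest."""
    x = np.sort(np.asarray(estimates), axis=0)   # sort each coordinate independently
    k = int(np.ceil(alpha * x.shape[0]))         # number of reports to trim per side
    return x[k: x.shape[0] - k].mean(axis=0)

# Nine honest agents report value estimates near the truth; two adversaries lie.
honest = [np.array([1.0, 2.0]) + 0.1 * np.random.randn(2) for _ in range(9)]
adversarial = [np.array([100.0, -100.0])] * 2
print(trimmed_mean(honest + adversarial, alpha=2 / 11))  # close to [1.0, 2.0]
```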
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Controllable Guarantees for Fair Outcomes via Contrastive Information Estimation [32.37031528767224]
Controlling bias in training datasets is vital for ensuring equal treatment, or parity, between different groups in downstream applications.
We demonstrate an effective method for controlling parity through mutual information based on contrastive information estimators.
We test our approach on the UCI Adult and Heritage Health datasets and demonstrate that it provides more informative representations across a range of desired parity thresholds.
arXiv Detail & Related papers (2021-01-11T18:57:33Z)
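For intuition, here is a minimal sketch of a contrastive (InfoNCE-style) lower bound on the mutual information between learned representations and a protected attribute, the quantity such methods control; the bilinear critic and function names are assumptions for illustration, not the paper's estimator.

```python
import torch
import torch.nn.functional as F

def infonce_mi_bound(z, a, critic):
    """Hedged sketch of a contrastive (InfoNCE) lower bound on I(Z; A):
    the critic scores every (z_i, a_j) pair, and the bound is log n minus
    the cross-entropy of identifying the matching pair in each row."""
    scores = critic(z, a)                         # (n, n) matrix of critic(z_i, a_j)
    n = scores.shape[0]
    return torch.log(torch.tensor(float(n))) - F.cross_entropy(
        scores, torch.arange(n))

# Illustrative bilinear critic on random features.
torch.manual_seed(0)
W = torch.randn(8, 4)
critic = lambda z, a: z @ W @ a.T
z, a = torch.randn(16, 8), torch.randn(16, 4)
print(infonce_mi_bound(z, a, critic))
```

Driving such a bound down during representation learning limits how much attribute information a downstream model can exploit, which is what yields the controllable parity guarantees.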
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- Improving LIME Robustness with Smarter Locality Sampling [0.0]
We propose to make LIME more robust by training a generative adversarial network to sample more realistic synthetic data.
Our experiments demonstrate an increase in accuracy across three real-world datasets in detecting biased, adversarial behavior.
This is achieved while maintaining comparable explanation quality, with top-1 accuracy reaching 99.94% in some cases.
arXiv Detail & Related papers (2020-06-22T14:36:08Z)
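A hedged sketch of the overall recipe: LIME fits a weighted linear surrogate on samples drawn around the instance, and the paper's contribution is to draw those samples from a GAN trained on real data rather than from naive perturbations. Here `sampler` is a pluggable stand-in (a Gaussian sampler substitutes for the trained generator), and the helper names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(x, black_box, sampler, n_samples=500, kernel_width=1.0):
    """Hedged sketch of LIME with a pluggable locality sampler: fit a
    weighted ridge surrogate to the black box on neighbors of x, weighting
    samples by an exponential locality kernel."""
    neighbors = sampler(x, n_samples)
    preds = black_box(neighbors)
    dists = np.linalg.norm(neighbors - x, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)   # locality kernel
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(neighbors, preds, sample_weight=weights)
    return surrogate.coef_                                 # local feature attributions

# Gaussian sampler standing in for a trained generator of realistic neighbors.
gaussian_sampler = lambda x, n: x + 0.1 * np.random.randn(n, x.shape[0])
black_box = lambda X: X @ np.array([2.0, -1.0, 0.5])       # toy model to explain
print(lime_explain(np.array([1.0, 0.0, 3.0]), black_box, gaussian_sampler))
```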