Related papers: Protocol Agent: What If Agents Could Use Cryptography In Everyday Life?

URL: http://arxiv.org/abs/2602.01304v1
Date: Sun, 01 Feb 2026 16:05:35 GMT
Title: Protocol Agent: What If Agents Could Use Cryptography In Everyday Life?
Authors: Marco De Rossi
Abstract summary: We show how agents could develop communication patterns that are more efficient and better aligned with their capabilities. Cryptographic primitives that could profoundly improve everyday interactions already exist, but humans can't use them because they are too complex and the math can't be done in one's head. What if agents could create protocols "on the fly" by recognizing which primitive fits an everyday situation, proposing it to an agentic counterpart, persuading them to participate, and then executing the protocol correctly using appropriate computation tools? We evaluate current open-weight and state-of-the-art models on this benchmark, propose a dataset-generation approach to improve these capabilities, and measure the impact of supervised fine-tuning (SFT) on benchmark performance.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We often assume that agent-to-agent interaction will mirror human conversation. However, agents operate fundamentally differently. What if they could develop communication patterns that are more efficient and better aligned with their capabilities? While cryptographic primitives that could profoundly improve everyday interactions already exist, humans can't use them because they are too complex and the math can't be done in one's head. Examples range from proving your age (or other attributes) without showing your ID, to filing an anonymous report within a group while proving you are a legitimate member, to splitting a dinner bill fairly without revealing salaries. What if agents could create protocols "on the fly" by recognizing which primitive fits an everyday situation, proposing it to an agentic counterpart, persuading them to participate, and then executing the protocol correctly using appropriate computation tools? Protocol Agent frames this problem by introducing a benchmark that spans: (1) cryptographic primitive recognition, (2) negotiation skills, (3) implementation correctness, (4) correct computation and (5) security strength. We evaluate current open-weight and state-of-the-art models on this benchmark, propose a dataset-generation approach to improve these capabilities, and measure the impact of supervised fine-tuning (SFT) on benchmark performance, with tuned models outperforming base models by a wide margin.
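As an illustration of the bill-splitting scenario mentioned in the abstract, the following is a minimal sketch of one well-known primitive that could fit it: a secure sum built on additive secret sharing. It is not taken from the paper. The function names (`make_shares`, `secure_sum`, `own_share`), the modulus, and the proportional-to-salary rule are all assumptions made for illustration. The sketch simulates all parties in one process and assumes at least three honest-but-curious participants (with only two, the total would reveal the other person's salary).

```python
import secrets

# Assumed public modulus; must exceed any plausible sum of salaries.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n additive shares that sum to `value` mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate a secure-sum round: only the total becomes public."""
    n = len(private_values)
    # share_matrix[i][j] is the share that party i sends privately to party j.
    share_matrix = [make_shares(v, n, modulus) for v in private_values]
    # Each party j publishes only the sum of the shares it received.
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return sum(partial_sums) % modulus


def own_share(bill, salary, total_salary):
    """Each party computes its own part of the bill locally (hypothetical rule:
    pay in proportion to salary)."""
    return bill * salary / total_salary


if __name__ == "__main__":
    salaries = [3000, 5000, 4000]  # private inputs, one per diner
    total = secure_sum(salaries)
    for s in salaries:
        print(own_share(120, s, total))
```

Each individual share is uniformly random, so a party that sees only the shares sent to it and the published partial sums learns the total but not any single salary. A real deployment would also need authenticated private channels between the agents, which this sketch omits.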

Related papers

ClarEval: A Benchmark for Evaluating Clarification Skills of Code Agents under Ambiguous Instructions [19.875754116636436]
We introduce ClarEval, a framework designed to assess an agent's "Collaborative Quotient" by simulating the inherent ambiguity of human communication. To quantify this capability, we propose a metric suite led by Average Turns to Clarify coders (ATC) and Key Question Coverage (KQC). Our experiments on eleven state-of-the-art agents reveal a stark reality: while models like GPT-5-Coder excel at coding, they often lack the strategic communication skills required for efficient partnership.
arXiv Detail & Related papers (2026-02-27T01:10:27Z)
Verifiable Semantics for Agent-to-Agent Communication [0.2866560512724962]
Multiagent AI systems require consistent communication. Natural language is interpretable but vulnerable to semantic drift. We propose a certification protocol based on the stimulus-meaning model.
arXiv Detail & Related papers (2026-02-18T12:55:58Z)
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model. We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting the strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z)
ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration [68.89572566071575]
ET-Agent is a training framework for calibrating an agent's tool-use behavior. It is designed to progressively calibrate erroneous behavioral patterns toward optimal behaviors.
arXiv Detail & Related papers (2026-01-11T11:05:26Z)
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation [87.47155146067962]
We provide a standardized evaluation harness that orchestrates parallel evaluations across hundreds of tasks. We conduct three-dimensional analysis spanning models, scaffolds, and benchmarks. Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs.
arXiv Detail & Related papers (2025-10-13T22:22:28Z)
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling [83.78874399606379]
We propose MACT, a Multi-Agent Collaboration framework with Test-Time scaling. It comprises four distinct small-scale agents, with clearly defined roles and effective collaboration. It shows superior performance with a smaller parameter scale without sacrificing ability on general and mathematical tasks.
arXiv Detail & Related papers (2025-08-05T12:52:09Z)
Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments [54.67512489842682]
Large language models (LLMs) have demonstrated strong planning and decision-making capabilities in complex embodied environments. We take a first step toward exploring the early-exit behavior for LLM-based agents.
arXiv Detail & Related papers (2025-05-23T08:23:36Z)
Infrastructure for AI Agents [3.373674048991415]
We propose the concept of agent infrastructure: technical systems and shared protocols external to AI agents. We identify three functions for agent infrastructure: 1) attributing actions to specific agents, their users, or other actors; 2) shaping agents' interactions; and 3) detecting and remedying harmful actions from agents.
arXiv Detail & Related papers (2025-01-17T10:58:12Z)
UPC Sentinel: An Accurate Approach for Detecting Upgradeability Proxy Contracts in Ethereum [8.328441582683034]
Software applications that run on a blockchain platform are known as DApps. DApps are built using smart contracts, which are immutable after deployment. We introduce UPC Sentinel, a novel three-layer algorithm that utilizes both static and dynamic analysis of smart contract bytecode to accurately detect active UPCs.
arXiv Detail & Related papers (2024-12-31T23:09:06Z)
$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains [43.43344028212623]
$τ$-bench is a benchmark emulating dynamic conversations between a user and a language agent. We employ an efficient and faithful evaluation process that compares the database state at the end of a conversation with the annotated goal state.
arXiv Detail & Related papers (2024-06-17T19:33:08Z)
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models [56.00992369295851]
Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents. This paper delivers three key observations: (1) the current agent training corpus entangles format following with agent reasoning, which shifts it significantly from the distribution of the pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side effects when improving agent abilities, namely introducing hallucinations. We propose Agent-FLAN to effectively Fine-tune LANguage models for Agents.
arXiv Detail & Related papers (2024-03-19T16:26:10Z)
ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents. ProAgent can analyze the present state, and infer the intentions of teammates from observations. ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
Knowledge-based Reasoning and Learning under Partial Observability in Ad Hoc Teamwork [4.454557728745761]
This paper introduces an architecture that determines an ad hoc agent's behavior based on non-monotonic logical reasoning. It supports online selection, adaptation, and learning of the models that predict the other agents' behavior. We show that the performance of our architecture is comparable to or better than state-of-the-art data-driven baselines in both simple and complex scenarios.
arXiv Detail & Related papers (2023-06-01T15:21:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.