Quantifying Distributional Robustness of Agentic Tool-Selection
- URL: http://arxiv.org/abs/2510.03992v1
- Date: Sun, 05 Oct 2025 01:50:34 GMT
- Title: Quantifying Distributional Robustness of Agentic Tool-Selection
- Authors: Jehyeok Yeon, Isha Chaudhary, Gagandeep Singh
- Abstract summary: We introduce ToolCert, the first statistical framework that formally certifies tool selection robustness. We show that ToolCert produces a high-confidence lower bound on accuracy, formally quantifying the agent's worst-case performance. Our evaluation with ToolCert uncovers severe fragility: under attacks injecting deceptive tools or saturating retrieval, the certified accuracy bound drops to near zero.
- Score: 8.457056023589951
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly deployed in agentic systems where they map user intents to relevant external tools to fulfill a task. A critical step in this process is tool selection, where a retriever first surfaces candidate tools from a larger pool, after which the LLM selects the most appropriate one. This pipeline presents an underexplored attack surface where errors in selection can lead to severe outcomes like unauthorized data access or denial of service, all without modifying the agent's model or code. While existing evaluations measure task performance in benign settings, they overlook the specific vulnerabilities of the tool selection mechanism under adversarial conditions. To address this gap, we introduce ToolCert, the first statistical framework that formally certifies tool selection robustness. ToolCert models tool selection as a Bernoulli success process and evaluates it against a strong, adaptive attacker who introduces adversarial tools with misleading metadata, iteratively refined based on the agent's previous choices. By sampling these adversarial interactions, ToolCert produces a high-confidence lower bound on accuracy, formally quantifying the agent's worst-case performance. Our evaluation with ToolCert uncovers severe fragility: under attacks injecting deceptive tools or saturating retrieval, the certified accuracy bound drops to near zero, an average performance drop of over 60% compared to non-adversarial settings. For attacks targeting the retrieval and selection stages, the certified accuracy bound plummets to less than 20% after just a single round of adversarial adaptation. ToolCert thus reveals previously unexamined security threats inherent to tool selection and provides a principled method to quantify an agent's robustness to such threats, a necessary step for the safe deployment of agentic systems.
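The abstract's core statistical step, modeling each tool-selection trial as a Bernoulli draw and deriving a high-confidence lower bound on the success probability from sampled interactions, can be sketched with a one-sided Clopper-Pearson bound. The abstract does not specify which bound ToolCert actually uses, so this is an illustrative standard-library-only Python sketch, not the paper's implementation:

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): probability of seeing at
    least k correct selections in n trials at success rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def clopper_pearson_lower(k, n, alpha=0.05):
    """One-sided Clopper-Pearson lower confidence bound on the
    Bernoulli success probability, given k successes in n sampled
    adversarial interactions.  With confidence 1 - alpha, the true
    worst-case accuracy is at least the returned value."""
    if k == 0:
        return 0.0
    # binom_sf(k, n, p) increases monotonically in p, so bisect for
    # the p at which the upper tail probability equals alpha.
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_sf(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

# Example: 90 correct selections out of 100 adversarial trials
# certifies accuracy of roughly 0.83 at 95% confidence.
bound = clopper_pearson_lower(90, 100, alpha=0.05)
```

Note that the bound is deliberately conservative: with 90/100 observed successes it certifies only about 83% accuracy, and as the abstract reports, a strong adaptive attacker can drive the observed success count, and hence the certified bound, toward zero.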
Related papers
- Sponge Tool Attack: Stealthy Denial-of-Efficiency against Tool-Augmented Agentic Reasoning [58.432996881401415]
Recent work augments large language models (LLMs) with external tools to enable agentic reasoning. We propose Sponge Tool Attack (STA), which disrupts agentic reasoning solely by rewriting the input prompt. STA generates benign-looking prompt rewrites from the original one with high semantic fidelity.
arXiv Detail & Related papers (2026-01-24T19:36:51Z) - ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack [52.17935054046577]
We present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. ReasAlign incorporates structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks.
arXiv Detail & Related papers (2026-01-15T08:23:38Z) - The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents [24.482362292984817]
Large language models (LLMs) are rapidly evolving to handle multi-turn tasks. Ensuring their trustworthiness remains a critical challenge. Calibration refers to an agent's ability to express confidence that reliably reflects its actual performance.
arXiv Detail & Related papers (2026-01-12T07:10:35Z) - ToolTweak: An Attack on Tool Selection in LLM-based Agents [52.17181489286236]
We show that adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%. To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally.
arXiv Detail & Related papers (2025-10-02T20:44:44Z) - BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models [55.119657444627855]
Large language models (LLMs) often rely on external tools drawn from marketplaces where multiple providers offer functionally equivalent options. This raises a critical point concerning fairness: if selection is systematically biased, it can degrade user experience and distort competition. We introduce a benchmark of diverse tool categories, each containing multiple functionally equivalent tools, to evaluate tool-selection bias.
arXiv Detail & Related papers (2025-09-30T22:02:13Z) - VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, each view aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z) - Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools [10.086284534400658]
Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents. We propose a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata.
arXiv Detail & Related papers (2025-08-04T06:38:59Z) - Prompt Injection Attack to Tool Selection in LLM Agents [60.95349602772112]
A popular approach follows a two-step process, retrieval and selection, to pick the most appropriate tool from a tool library for a given task. In this work, we introduce ToolHijacker, a novel prompt injection attack targeting tool selection in no-box scenarios.
arXiv Detail & Related papers (2025-04-28T13:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.