Related papers: Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service

URL: http://arxiv.org/abs/2512.24415v1
Date: Tue, 30 Dec 2025 18:57:52 GMT
Title: Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service
Authors: Jingyu Zhang
Abstract summary: Cross-domain benchmark of profit-seeking direct prompt injection in customer-service interactions. 100 realistic attack scripts grouped into five technique families. Attacks are highly domain-dependent (airline support is most exploitable) and technique-dependent (payload splitting is most consistently effective).
Score: 15.896831937702174
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Customer-service LLM agents increasingly make policy-bound decisions (refunds, rebooking, billing disputes), but the same "helpful" interaction style can be exploited: a small fraction of users can induce unauthorized concessions, shifting costs to others and eroding trust in agentic workflows. We present a cross-domain benchmark of profit-seeking direct prompt injection in customer-service interactions, spanning 10 service domains and 100 realistic attack scripts grouped into five technique families. Across five widely used models under a unified rubric with uncertainty reporting, attacks are highly domain-dependent (airline support is most exploitable) and technique-dependent (payload splitting is most consistently effective). We release data and evaluation code to support reproducible auditing and to inform the design of oversight and recovery workflows for trustworthy, human-centered agent interfaces.

Related papers

CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks [54.04030169323115]
We introduce CREDIT, a certified ownership verification against Model Extraction Attacks (MEAs). We quantify the similarity between DNN models, propose a practical verification threshold, and provide rigorous theoretical guarantees for ownership verification based on this threshold. We extensively evaluate our approach on several mainstream datasets across different domains and tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2026-02-23T23:36:25Z)
The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution [63.61358761489141]
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. We propose a novel framework for general agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome. We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias.
arXiv Detail & Related papers (2026-01-21T15:22:21Z)
Beyond IVR: Benchmarking Customer Support LLM Agents for Business-Adherence [1.8357468337756873]
We introduce JourneyBench, a benchmark designed to assess policy-aware agents in customer support. We evaluate multiple state-of-the-art LLMs using two agent designs: a Static-Prompt Agent (SPA) and a Dynamic-Prompt Agent (DPA). We show that DPA significantly boosts policy adherence, even allowing smaller models like GPT-4o-mini to outperform more capable ones like GPT-4o.
arXiv Detail & Related papers (2026-01-02T07:21:23Z)
How can we assess human-agent interactions? Case studies in software agent design [52.953425368394306]
We make two major steps towards the rigorous assessment of human-agent interactions. We propose PULSE, a framework for more efficient human-centric evaluation of agent designs. We deploy the framework on a large-scale web platform built around the open-source software agent OpenHands.
arXiv Detail & Related papers (2025-10-10T19:04:28Z)
ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version) [58.595691399741646]
Split Learning (SL) is a distributed learning approach that enables resource-constrained clients to collaboratively train deep neural networks (DNNs). This setup enables SL to leverage server capacities without sharing data, making it highly effective in resource-constrained environments dealing with sensitive data. We present ZORRO, a private, verifiable, and robust SL defense scheme.
arXiv Detail & Related papers (2025-09-11T18:44:09Z)
Effective Red-Teaming of Policy-Adherent Agents [10.522087614181745]
Task-oriented LLM-based agents are increasingly used in domains with strict policies, such as refund eligibility or cancellation rules. We propose a novel threat model that focuses on adversarial users aiming to exploit policy-adherent agents for personal benefit. We present CRAFT, a multi-agent red-teaming system that leverages policy-aware persuasive strategies to undermine a policy-adherent agent in a customer-service scenario.
arXiv Detail & Related papers (2025-06-11T10:59:47Z)
The Real Barrier to LLM Agent Usability is Agentic ROI [110.31127571114635]
Large Language Model (LLM) agents represent a promising shift in human-AI interaction. We highlight a critical usability gap in high-demand, mass-market applications.
arXiv Detail & Related papers (2025-05-23T11:40:58Z)
Defending the Edge: Representative-Attention for Mitigating Backdoor Attacks in Federated Learning [7.808916974942399]
Heterogeneous edge devices produce diverse, non-independent and identically distributed (non-IID) data. We propose a novel representative-attention-based defense mechanism, named FeRA, to distinguish benign from malicious clients. Our evaluation demonstrates FeRA's robustness across various FL scenarios, including challenging non-IID data distributions typical of edge devices.
arXiv Detail & Related papers (2025-05-15T13:44:32Z)
Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks [48.70867241987739]
InferGuard is a novel Byzantine-robust aggregation rule aimed at defending against client-side training data distribution inference attacks. The results of our experiments indicate that our defense mechanism is highly effective in protecting against client-side training data distribution inference attacks.
arXiv Detail & Related papers (2024-03-05T17:41:35Z)
Towards Fair, Robust and Efficient Client Contribution Evaluation in Federated Learning [16.543724155324938]
We introduce a novel method called Fair, Robust, and Efficient Client Assessment (FRECA) for quantifying client contributions in Federated Learning (FL). FRECA employs a framework called FedTruth to estimate the global model's ground truth update, balancing contributions from all clients while filtering out impacts from malicious ones. Our experimental results show that FRECA can accurately and efficiently quantify client contributions in a robust manner.
arXiv Detail & Related papers (2024-02-06T21:07:12Z)
G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks through Attributed Client Graph Clustering [116.4277292854053]
Federated Learning (FL) offers collaborative model training without data sharing. FL is vulnerable to backdoor attacks, where poisoned model weights lead to compromised system integrity. We present G$^2$uardFL, a protective framework that reinterprets the identification of malicious clients as an attributed graph clustering problem.
arXiv Detail & Related papers (2023-06-08T07:15:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.