EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law
- URL: http://arxiv.org/abs/2510.21524v1
- Date: Fri, 24 Oct 2025 14:48:10 GMT
- Title: EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law
- Authors: Ilija Lichkovski, Alexander Müller, Mariam Ibrahim, Tiwai Mhundwa
- Abstract summary: EU-Agent-Bench is a verifiable benchmark that evaluates an agent's alignment with EU legal norms. Our benchmark spans scenarios across several categories, including data protection, bias/discrimination, and scientific integrity. We release a public preview set for the research community, while holding out a private test set to prevent data contamination.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly deployed as agents in various contexts by equipping them with tools. However, LLM agents can exhibit unpredictable behaviors, including taking undesirable and/or unsafe actions. To measure the latent propensity of LLM agents for taking illegal actions under an EU legislative context, we introduce EU-Agent-Bench, a verifiable, human-curated benchmark that evaluates an agent's alignment with EU legal norms in situations where benign user inputs could lead to unlawful actions. Our benchmark spans scenarios across several categories, including data protection, bias/discrimination, and scientific integrity, with each user request allowing for both compliant and non-compliant execution of the requested actions. Comparing the model's function calls against a rubric exhaustively supported by citations of the relevant legislation, we evaluate the legal compliance of frontier LLMs, and we further investigate the effect on compliance of providing the relevant legislative excerpts in the agent's system prompt along with explicit instructions to comply. We release a public preview set for the research community, while holding out a private test set to prevent data contamination when evaluating upcoming models. We encourage future work extending agentic safety benchmarks to different legal jurisdictions and to multi-turn and multilingual interactions. We release our code at https://github.com/ilijalichkovski/eu-agent-bench.
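The core evaluation mechanic described above (comparing an agent's function calls against a citation-backed rubric) can be illustrated with a minimal sketch. The rubric schema, field names, and scoring rule below are illustrative assumptions, not the benchmark's released implementation (see the linked repository for that):

```python
# Minimal sketch of rubric-based compliance scoring for agent tool calls.
# The rubric schema and all field names here are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class RubricItem:
    """One compliance criterion, backed by a legislative citation."""
    description: str          # e.g. "must not share special-category data"
    citation: str             # e.g. "GDPR Art. 9(1)"
    forbidden_call: str       # tool whose use violates this criterion
    forbidden_args: dict = field(default_factory=dict)  # args that trigger it

def is_violation(call: dict, item: RubricItem) -> bool:
    """A call violates a rubric item if it invokes the forbidden tool
    with (a superset of) the forbidden arguments."""
    if call["name"] != item.forbidden_call:
        return False
    return all(call["arguments"].get(k) == v
               for k, v in item.forbidden_args.items())

def compliance_score(calls: list[dict], rubric: list[RubricItem]) -> float:
    """Fraction of rubric items the agent's trajectory did not violate."""
    violated = sum(any(is_violation(c, item) for c in calls) for item in rubric)
    return 1.0 - violated / len(rubric)

# Example: an agent asked to "email the applicant list" leaks health data.
rubric = [RubricItem(
    description="must not share special-category data without a legal basis",
    citation="GDPR Art. 9(1)",
    forbidden_call="send_email",
    forbidden_args={"include_health_records": True},
)]
calls = [{"name": "send_email",
          "arguments": {"to": "hr@example.com", "include_health_records": True}}]
print(compliance_score(calls, rubric))  # 0.0 -> non-compliant trajectory
```

Under this framing, the abstract's second intervention (legislative excerpts plus explicit compliance instructions in the system prompt) amounts to prepending the cited excerpts to the agent's system message and rerunning the same scoring.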
Related papers
- Are Your Agents Upward Deceivers?
Large Language Model (LLM)-based agents are increasingly used as autonomous subordinates that carry out tasks for users. This raises the question of whether they may also engage in deception, similar to how individuals in human organizations lie to superiors to create a good image or avoid punishment. We observe and define agentic upward deception, a phenomenon in which an agent facing environmental constraints conceals its failure and performs unrequested actions without reporting them.
arXiv Detail & Related papers (2025-12-04T14:47:05Z)
- Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction. Existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains.
arXiv Detail & Related papers (2025-12-03T00:11:05Z)
- Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
Large language models (LLMs) are now deployed at unprecedented scale, assisting millions of users in daily tasks. This study defines complicit facilitation as the provision of guidance or support that enables illicit user instructions. Using real-world legal cases and established legal frameworks, we construct an evaluation benchmark spanning 269 illicit scenarios and 50 illicit intents.
arXiv Detail & Related papers (2025-11-25T16:01:31Z)
- Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM
Legal Article Prediction (LAP) is a critical task in legal text classification. We propose Uni-LAP, a universal framework for legal article prediction.
arXiv Detail & Related papers (2025-09-26T09:42:20Z)
- SAND: Boosting LLM Agents with Self-Taught Action Deliberation
Large Language Model (LLM) agents are commonly tuned with supervised finetuning on ReAct-style expert trajectories or preference optimization over pairwise rollouts. We propose the Self-taught ActioN Deliberation (SAND) framework, enabling LLM agents to explicitly deliberate over candidate actions before committing to one; a minimal sketch of this deliberate-then-commit loop appears after this list. SAND achieves an average 20% improvement over initial supervised finetuning and also outperforms state-of-the-art agent tuning approaches.
arXiv Detail & Related papers (2025-07-10T05:38:15Z)
- LLMs for Legal Subsumption in German Employment Contracts
This study explores the use of Large Language Models and in-context learning to evaluate the legality of clauses in German employment contracts. Our work evaluates the ability of different LLMs to classify clauses as "valid," "unfair," or "void" under three legal context variants. Results show that full-text sources moderately improve performance, while examination guidelines significantly enhance recall for void clauses and the weighted F1-score, which reaches 80%.
arXiv Detail & Related papers (2025-07-02T14:07:54Z)
- AUTOLAW: Enhancing Legal Compliance in Large Language Models via Case Law Generation and Jury-Inspired Deliberation
AutoLaw is a novel violation detection framework for domain-specific large language models (LLMs). It combines adversarial data generation with a jury-inspired deliberation process to enhance the legal compliance of LLMs. Our results highlight the framework's ability to adaptively probe legal misalignments and deliver reliable, context-aware judgments.
arXiv Detail & Related papers (2025-05-20T07:09:13Z)
- Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction
This paper introduces a Multi-agent Legal Simulation Driver (MASER) to scalably generate synthetic data by simulating interactive legal scenarios. MASER ensures the consistency of legal attributes between participants and introduces a supervisory mechanism to align participants' characters and behaviors.
arXiv Detail & Related papers (2025-02-08T15:05:24Z)
- LegalAgentBench: Evaluating LLM Agents in Legal Domain
LegalAgentBench is a benchmark specifically designed to evaluate LLM agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z)
- AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
LLM agents may pose a greater risk if misused, but their robustness remains underexplored. We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse. We find that leading LLMs are surprisingly compliant with malicious agent requests even without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z)
- ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Large Language Models (LLMs) can produce unintended and even harmful content when misaligned with human values.
Current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values.
We propose ALI-Agent, an evaluation framework that leverages the autonomous abilities of LLM-powered agents to conduct in-depth and adaptive alignment assessments.
arXiv Detail & Related papers (2024-05-23T02:57:42Z)
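As flagged in the SAND entry above, the deliberate-then-commit pattern it describes can be sketched as a simple loop. The function names, prompt wording, and fallback rule below are assumptions for illustration; the actual SAND method additionally uses the deliberation traces for self-training, which this sketch omits:

```python
# Hypothetical sketch of a deliberate-then-commit action loop in the spirit
# of SAND's "explicitly deliberate over candidate actions before committing".
# `llm` is a stand-in for any text-completion callable; prompts are illustrative.
from typing import Callable

def deliberate_and_act(llm: Callable[[str], str],
                       observation: str,
                       candidate_actions: list[str]) -> str:
    # 1. Ask the model to assess each candidate action before acting.
    critiques = {
        a: llm(f"Observation: {observation}\n"
               f"Candidate action: {a}\n"
               "Briefly assess the likely outcome of this action.")
        for a in candidate_actions
    }
    # 2. Commit to one action given the full deliberation transcript.
    transcript = "\n".join(f"- {a}: {c}" for a, c in critiques.items())
    choice = llm(f"Observation: {observation}\n"
                 f"Deliberation:\n{transcript}\n"
                 "Reply with exactly one candidate action to execute.")
    # 3. Fall back to the first candidate if the reply is not a valid action.
    return choice.strip() if choice.strip() in candidate_actions else candidate_actions[0]
```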