Improving Methodologies for Agentic Evaluations Across Domains: Leakage of Sensitive Information, Fraud and Cybersecurity Threats
- URL: http://arxiv.org/abs/2601.15679v1
- Date: Thu, 22 Jan 2026 06:00:00 GMT
- Title: Improving Methodologies for Agentic Evaluations Across Domains: Leakage of Sensitive Information, Fraud and Cybersecurity Threats
- Authors: Ee Wei Seah, Yongsen Zheng, Naga Nikshith, Mahran Morsidi, Gabriel Waikin Loh Matienzo, Nigel Gay, Akriti Vij, Benjamin Chua, En Qi Ng, Sharmini Johnson, Vanessa Wilfred, Wan Sie Lee, Anna Davidson, Catherine Devine, Erin Zorer, Gareth Holvey, Harry Coppock, James Walpole, Jerome Wynee, Magda Dubois, Michael Schmatz, Patrick Keane, Sam Deverett, Bill Black, Bo Yan, Bushra Sabir, Frank Sun, Hao Zhang, Harriet Farlow, Helen Zhou, Lingming Dong, Qinghua Lu, Seung Jang, Sharif Abuadbba, Simon O'Callaghan, Suyu Ma, Tom Howroyd, Cyrus Fung, Fatemeh Azadi, Isar Nejadgholi, Krishnapriya Vishnubhotla, Pulei Xiong, Saeedeh Lohrasbi, Scott Buffett, Shahrear Iqbal, Sowmya Vajjala, Anna Safont-Andreu, Luca Massarelli, Oskar van der Wal, Simon Möller, Agnes Delaborde, Joris Duguépéroux, Nicolas Rolin, Romane Gallienne, Sarah Behanzin, Tom Seimandi, Akiko Murakami, Takayuki Semitsu, Teresa Tsukiji, Angela Kinuthia, Michael Michie, Stephanie Kasaon, Jean Wangari, Hankyul Baek, Jaewon Noh, Kihyuk Nam, Sang Seo, Sungpil Shin, Taewhi Lee, Yongsu Kim
- Abstract summary: Agent testing remains a nascent, developing science. As AI agents begin to be deployed globally, it is important that they handle different languages and cultures accurately and securely. This is the third exercise, building on insights from two earlier joint testing exercises conducted by the Network.
- Score: 17.766681829762256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid rise of autonomous AI systems and advancements in agent capabilities are introducing new risks due to reduced oversight of real-world interactions. Yet agent testing remains a nascent, developing science. As AI agents begin to be deployed globally, it is important that they handle different languages and cultures accurately and securely. To address this, participants from The International Network for Advanced AI Measurement, Evaluation and Science, including representatives from Singapore, Japan, Australia, Canada, the European Commission, France, Kenya, South Korea, and the United Kingdom, have come together to align approaches to agentic evaluations. This is the third exercise, building on insights from two earlier joint testing exercises conducted by the Network in November 2024 and February 2025. The objective is to further refine best practices for testing advanced AI systems. The exercise was split into two strands: (1) common risks, including leakage of sensitive information and fraud, led by Singapore AISI; and (2) cybersecurity, led by UK AISI. A mix of open and closed-weight models was evaluated against tasks from various public agentic benchmarks. Given the nascency of agentic testing, our primary focus was on understanding methodological issues in conducting such tests, rather than examining test results or model capabilities. This collaboration marks an important step forward as participants work together to advance the science of agentic evaluations.
Related papers
- Can AI Lower the Barrier to Cybersecurity? A Human-Centered Mixed-Methods Study of Novice CTF Learning [0.0]
Agentic AI frameworks for cybersecurity promise to lower barriers by automating and coordinating penetration testing tasks. We present a human-centered, mixed-methods case study examining how agentic AI frameworks mediate novice entry into CTF-based penetration testing.
arXiv Detail & Related papers (2026-02-20T12:20:36Z)
- The Role of AI in Modern Penetration Testing [0.0]
Penetration testing is a cornerstone of cybersecurity, traditionally driven by manual, time-intensive processes. This systematic literature review examines how Artificial Intelligence (AI) is reshaping penetration testing.
arXiv Detail & Related papers (2025-12-13T13:34:31Z)
- AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI [50.802995291689086]
We introduce AssurAI, a new quality-controlled Korean multimodal dataset for evaluating the safety of generative AI. We define a taxonomy of 35 distinct AI risk factors, adapted from established frameworks to cover both universal harms and relevance to the Korean socio-cultural context. AssurAI is a large-scale Korean multimodal dataset comprising 11,480 instances across text, image, video, and audio.
arXiv Detail & Related papers (2025-11-20T13:59:42Z)
- AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite [75.58737079136942]
We present AstaBench, a suite that provides the first holistic measure of agentic ability to perform scientific research. Our suite comes with the first scientific research environment with production-grade search tools. Our evaluation of 57 agents across 22 agent classes reveals several interesting findings.
arXiv Detail & Related papers (2025-10-24T17:10:26Z)
- Ask What Your Country Can Do For You: Towards a Public Red Teaming Model [1.4138385478350077]
We propose a cooperative public AI red-teaming exercise. The first in-person public demonstrator exercise was held in conjunction with CAMLIS 2024. We argue that this approach is both capable of delivering meaningful results and scalable to many AI-developing jurisdictions.
arXiv Detail & Related papers (2025-10-22T22:24:21Z)
- International AI Safety Report 2025: First Key Update: Capabilities and Risk Implications [118.49965571969089]
This update examines how AI capabilities have improved since the first AI Safety Report. It focuses on key risk areas where substantial new evidence warrants updated assessments.
arXiv Detail & Related papers (2025-10-15T15:13:49Z)
- The Singapore Consensus on Global AI Safety Research Priorities [128.58674892183657]
The "2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space. The report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments. It organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control).
arXiv Detail & Related papers (2025-06-25T17:59:50Z)
- Report on NSF Workshop on Science of Safe AI [75.96202715567088]
New advances in machine learning are leading to new opportunities to develop technology-based solutions to societal problems. To fulfill the promise of AI, we must address how to develop AI-based systems that are accurate and performant but also safe and trustworthy. This report is the result of the discussions in the working groups that addressed different aspects of safety at the workshop.
arXiv Detail & Related papers (2025-06-24T18:55:29Z)
- AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z)
- Autonomation, Not Automation: Activities and Needs of European Fact-checkers as a Basis for Designing Human-Centered AI Systems [7.654738260420559]
We conducted in-depth interviews with Central European fact-checkers. Our contributions include an in-depth examination of the variability of fact-checking work in non-English-speaking regions. We mapped our findings on the fact-checkers' activities and needs to the relevant tasks for AI research.
arXiv Detail & Related papers (2022-11-22T10:18:09Z)