Trojans in Artificial Intelligence (TrojAI) Final Report
- URL: http://arxiv.org/abs/2602.07152v1
- Date: Fri, 06 Feb 2026 19:52:14 GMT
- Title: Trojans in Artificial Intelligence (TrojAI) Final Report
- Authors: Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski, Tim Blattner, Derek Juba, Peter Bajcsy, Antonio Cardone, Philippe Dessauw, Alden Dima, Anthony J. Kearsley, Melinda Kleczynski, Joel Vasanth, Walid Keyrouz, Chace Ashcraft, Neil Fendley, Ted Staley, Trevor Stout, Josh Carney, Greg Canal, Will Redman, Aurora Schmidt, Cameron Hickert, William Paul, Jared Markowitz, Nathan Drenkow, David Shriver, Marissa Connor, Keltin Grimes, Marco Christiani, Hayden Moore, Jordan Widjaja, Kasimir Gabert, Uma Balakrishnan, Satyanadh Gundimada, John Jacobellis, Sandya Lakkur, Vitus Leung, Jon Roose, Casey Battaglino, Farinaz Koushanfar, Greg Fields, Xihe Gu, Yaman Jandali, Xinqiao Zhang, Akash Vartak, Tim Oates, Ben Erichson, Michael Mahoney, Rauf Izmailov, Xiangyu Zhang, Guangyu Shen, Siyuan Cheng, Shiqing Ma, XiaoFeng Wang, Haixu Tang, Di Tang, Xiaoyi Chen, Zihao Wang, Rui Zhu, Susmit Jha, Xiao Lin, Manoj Acharya, Wenchao Li, Chao Chen,
- Abstract summary: TrojAI was launched to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. TrojAI helped to map out the complex nature of the threat and pioneered foundational detection methods. The report concludes with lessons learned and recommendations for advancing AI security research.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. These AI trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways, or allow a malicious actor to hijack the AI model at will. This multi-year initiative helped to map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention by the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.
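The abstract names trigger inversion as one of the program's core detection methodologies. As a minimal sketch of that idea (in the spirit of optimization-based trigger reconstruction such as Neural Cleanse, not the TrojAI detectors themselves), one can optimize a small mask-and-pattern "trigger" that flips a model's predictions toward a suspected target class; an anomalously small trigger norm for one class is evidence of a backdoor. The model, data, and hyperparameters below are illustrative assumptions.

```python
# Hypothetical trigger-inversion sketch; model/data/hyperparameters are
# illustrative stand-ins, not artifacts from the TrojAI program.
import torch
import torch.nn.functional as F

def invert_trigger(model, images, target_class, steps=100, lr=0.1, lam=0.01):
    """Optimize a mask + pattern that pushes `images` toward `target_class`.

    Returns the L1 norm of the recovered mask: a trigger with an
    anomalously small norm for one class suggests a planted backdoor.
    """
    mask = torch.zeros_like(images[0:1]).requires_grad_(True)
    pattern = torch.rand_like(images[0:1]).requires_grad_(True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    target = torch.full((images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)                      # keep mask in [0, 1]
        stamped = (1 - m) * images + m * pattern     # apply candidate trigger
        loss = F.cross_entropy(model(stamped), target) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).abs().sum().item()
```

Running this once per candidate target class and flagging classes whose trigger norm is an outlier is the basic detection loop; real detectors add many refinements (regularization schedules, outlier statistics, multiple restarts).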
Related papers
- Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 [61.787178868669265]
This technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
arXiv Detail & Related papers (2026-02-16T04:30:06Z) - TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models [67.06525001375722]
TrojanTO is the first action-level backdoor attack against TO models. It implants backdoors across diverse tasks and attack objectives with a low attack budget, and exhibits broad applicability to DT, GDT, and DC.
arXiv Detail & Related papers (2025-06-15T11:27:49Z) - Runtime Detection of Adversarial Attacks in AI Accelerators Using Performance Counters [5.097354139604596]
We propose SAMURAI, a novel framework for safeguarding against malicious usage of AI hardware. SAMURAI introduces an AI Performance Counter (APC) for tracking the dynamic behavior of an AI model. The APC records the runtime profile of the low-level hardware events of different AI operations. The summary information recorded by the APC is processed by TANTO to efficiently identify potential security breaches.
arXiv Detail & Related papers (2025-03-10T17:38:42Z) - Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
Computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI. We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
arXiv Detail & Related papers (2025-02-18T02:26:50Z) - Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security [0.0]
This paper explores the integration of Artificial Intelligence (AI) into offensive cybersecurity.
It develops an autonomous AI agent, ReaperAI, designed to simulate and execute cyberattacks.
ReaperAI demonstrates the potential to identify, exploit, and analyze security vulnerabilities autonomously.
arXiv Detail & Related papers (2024-05-09T18:15:12Z) - Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge [0.056247917037481096]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains, but their vulnerability to trojan or backdoor attacks poses significant security risks.
This paper explores the challenges and insights gained from the Trojan Detection Competition 2023 (TDC2023).
We investigate the difficulty of distinguishing between intended and unintended triggers, as well as the feasibility of reverse engineering trojans in real-world scenarios.
arXiv Detail & Related papers (2024-04-21T13:31:16Z) - Review of Generative AI Methods in Cybersecurity [0.6990493129893112]
This paper provides a comprehensive overview of the current state-of-the-art deployments of Generative AI (GenAI).
It covers assaults, jailbreaking, and applications of prompt injection and reverse psychology.
It also provides the various applications of GenAI in cybercrimes, such as automated hacking, phishing emails, social engineering, reverse cryptography, creating attack payloads, and creating malware.
arXiv Detail & Related papers (2024-03-13T17:05:05Z) - Asset-centric Threat Modeling for AI-based Systems [7.696807063718328]
This paper presents ThreatFinderAI, an approach and tool to model AI-related assets, threats, countermeasures, and quantify residual risks.
To evaluate the practicality of the approach, participants were tasked to recreate a threat model developed by cybersecurity experts of an AI-based healthcare platform.
Overall, the solution's usability was well-perceived and effectively supports threat identification and risk discussion.
arXiv Detail & Related papers (2024-03-11T08:40:01Z) - Towards more Practical Threat Models in Artificial Intelligence Security [66.67624011455423]
Recent works have identified a gap between research and practice in artificial intelligence security.
We revisit the threat models of the six most studied attacks in AI security research and match them to AI usage in practice.
arXiv Detail & Related papers (2023-11-16T16:09:44Z) - The State-of-the-Art in AI-Based Malware Detection Techniques: A Review [0.0]
This review aims to outline the state-of-the-art AI techniques used in malware detection and prevention.
The algorithms investigated consist of Shallow Learning, Deep Learning and Bio-Inspired Computing.
The survey also touches on the rapid adoption of AI by cybercriminals as a means to create ever more advanced malware.
arXiv Detail & Related papers (2022-10-12T16:44:52Z) - Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only for samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
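The Odyssey entry above describes the canonical poisoning mechanism: stamping a trigger onto a fraction of training samples and relabeling them so the model misbehaves only when the trigger is present. A minimal sketch of that data-poisoning step, under the arbitrary assumptions of a 3x3 white-square trigger in the bottom-right corner and a single attacker-chosen target label:

```python
# Illustrative trigger-insertion poisoning; the patch shape, location,
# and labels are assumptions for demonstration, not from any cited paper.
import numpy as np

def poison(images, labels, target_label, rate=0.1, seed=0):
    """Stamp a trigger patch onto a fraction of images and relabel them,
    yielding a backdoored copy of the training set."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -3:, -3:] = 1.0       # white 3x3 square, bottom-right corner
    labels[idx] = target_label        # force the attacker's chosen label
    return images, labels, idx
```

A model trained on the poisoned set learns to associate the patch with the target label while behaving normally on clean inputs, which is exactly the asymmetry that detectors like those surveyed here try to expose.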