Related papers: PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight

PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight

URL: http://arxiv.org/abs/2504.21029v1
Date: Sat, 26 Apr 2025 00:46:13 GMT
Title: PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight
Authors: Ben Goertzel, Paulos Yibelo,
Abstract summary: We propose a robust transformer architecture designed to prevent prompt injection attacks.<n>Our PICO framework structurally separates trusted system instructions from untrusted user inputs.<n>We incorporate a specialized Security Expert Agent within a Mixture-of-Experts framework.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a robust transformer architecture designed to prevent prompt injection attacks and ensure secure, reliable response generation. Our PICO (Prompt Isolation and Cybersecurity Oversight) framework structurally separates trusted system instructions from untrusted user inputs through dual channels that are processed independently and merged only by a controlled, gated fusion mechanism. In addition, we integrate a specialized Security Expert Agent within a Mixture-of-Experts (MoE) framework and incorporate a Cybersecurity Knowledge Graph (CKG) to supply domain-specific reasoning. Our training design further ensures that the system prompt branch remains immutable while the rest of the network learns to handle adversarial inputs safely. This PICO framework is presented via a general mathematical formulation, then elaborated in terms of the specifics of transformer architecture, and fleshed out via hypothetical case studies including Policy Puppetry attacks. While the most effective implementation may involve training transformers in a PICO-based way from scratch, we also present a cost-effective fine-tuning approach.

Related papers

Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security [63.41350337821108]
We propose Secure Tug-of-War (SecTOW) to enhance the security of multimodal large language models (MLLMs)<n>SecTOW consists of two modules: a defender and an auxiliary attacker, both trained iteratively using reinforcement learning (GRPO)<n>We show that SecTOW significantly improves security while preserving general performance.
arXiv Detail & Related papers (2025-07-29T17:39:48Z)
Can One Safety Loop Guard Them All? Agentic Guard Rails for Federated Computing [0.0]
We propose Guardian-FC, a novel framework for privacy preserving federated computing.<n>It unifies safety enforcement across diverse privacy preserving mechanisms.<n>We present qualitative scenarios illustrating backend-agnostic safety and a formal model foundation for verification.
arXiv Detail & Related papers (2025-06-24T20:39:49Z)
Towards Safety and Security Testing of Cyberphysical Power Systems by Shape Validation [42.350737545269105]
complexity of cyberphysical power systems leads to larger attack surfaces to be exploited by malicious actors.<n>We propose to meet those risks with a declarative approach to describe cyber power systems and automatically evaluate security and safety controls.
arXiv Detail & Related papers (2025-06-14T12:07:44Z)
LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale.<n>We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z)
Zero-Trust Foundation Models: A New Paradigm for Secure and Collaborative Artificial Intelligence for Internet of Things [61.43014629640404]
Zero-Trust Foundation Models (ZTFMs) embed zero-trust security principles into the lifecycle of foundation models (FMs) for Internet of Things (IoT) systems.<n>ZTFMs can enable secure, privacy-preserving AI across distributed, heterogeneous, and potentially adversarial IoT environments.
arXiv Detail & Related papers (2025-05-26T06:44:31Z)
Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation [55.02966123945644]
We propose a hierarchical control framework leveraging neural network verification techniques to design control barrier functions (CBFs) and policy correction mechanisms. Our approach relies on probabilistic enumeration to identify unsafe regions of operation, which are then used to construct a safe CBF-based control layer. These experiments demonstrate the ability of the proposed solution to correct unsafe actions while preserving efficient navigation behavior.
arXiv Detail & Related papers (2025-04-30T13:47:25Z)
Combined Hyper-Extensible Extremely-Secured Zero-Trust CIAM-PAM architecture [0.0]
This paper introduces the Combined Hyper-Extensible Extremely-Secured Zero-Trust (CHEZ) CIAM-PAM architecture.<n>The framework addresses critical security gaps by integrating password-less authentication, adaptive multi-factor authentication, microservice-based PEP, multi-layer RBAC and multi-level trust systems.<n>It also includes end-to-end data encryption, and seamless integration with state-of-the-art AI-based threat detection systems.
arXiv Detail & Related papers (2025-01-03T09:49:25Z)
Securing Legacy Communication Networks via Authenticated Cyclic Redundancy Integrity Check [98.34702864029796]
We propose Authenticated Cyclic Redundancy Integrity Check (ACRIC) ACRIC preserves backward compatibility without requiring additional hardware and is protocol agnostic. We show that ACRIC offers robust security with minimal transmission overhead ( 1 ms)
arXiv Detail & Related papers (2024-11-21T18:26:05Z)
Automated Hardware Logic Obfuscation Framework Using GPT [3.1789948141373077]
We introduce Obfus-chat, a novel framework leveraging Generative Pre-trained Transformer (GPT) models to automate the obfuscation process. The proposed framework accepts hardware design netlists and key sizes as inputs, and autonomously generates obfuscated code tailored to enhance security.
arXiv Detail & Related papers (2024-05-20T17:33:00Z)
Securing Federated Learning with Control-Flow Attestation: A Novel Framework for Enhanced Integrity and Resilience against Adversarial Attacks [2.28438857884398]
Federated Learning (FL) as a distributed machine learning paradigm has introduced new cybersecurity challenges. This study proposes an innovative security framework inspired by Control-Flow (CFA) mechanisms, traditionally used in cybersecurity. We authenticate and verify the integrity of model updates across the network, effectively mitigating risks associated with model poisoning and adversarial interference.
arXiv Detail & Related papers (2024-03-15T04:03:34Z)
HOACS: Homomorphic Obfuscation Assisted Concealing of Secrets to Thwart Trojan Attacks in COTS Processor [0.6874745415692134]
We propose a software-oriented countermeasure to ensure the confidentiality of secret assets against hardware Trojans. The proposed solution does not require any supply chain entity to be trusted and does not require analysis or modification of the IC design. We have implemented the proposed solution to protect the secret key within the Advanced Encryption Standard (AES) program and presented a detailed security analysis.
arXiv Detail & Related papers (2024-02-15T04:33:30Z)
A Survey and Comparative Analysis of Security Properties of CAN Authentication Protocols [92.81385447582882]
The Controller Area Network (CAN) bus leaves in-vehicle communications inherently non-secure. This paper reviews and compares the 15 most prominent authentication protocols for the CAN bus. We evaluate protocols based on essential operational criteria that contribute to ease of implementation.
arXiv Detail & Related papers (2024-01-19T14:52:04Z)
Secure Transformer Inference Protocol [15.610303095235372]
Security of model parameters and user data is critical for Transformer-based services, such as ChatGPT. Recent strides in secure two-party protocols have successfully addressed security concerns in serving Transformer models, but their adoption is practically infeasible due to the prohibitive cryptographic overheads involved. We present STIP, the first secure Transformer inference protocol without any inference accuracy loss.
arXiv Detail & Related papers (2023-11-14T14:37:23Z)
Trust-Aware Resilient Control and Coordination of Connected and Automated Vehicles [11.97553028903872]
Adversarial attacks can cause safety violations resulting in collisions and traffic jams. We propose a decentralized resilient control and coordination scheme that mitigates the effects of adversarial attacks and uncooperative CAVs.
arXiv Detail & Related papers (2023-05-26T10:57:51Z)
Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers. We then present the pointwise feasibility conditions of the resulting safety controller. We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch descent gradient. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
Safe RAN control: A Symbolic Reinforcement Learning Approach [62.997667081978825]
We present a Symbolic Reinforcement Learning (SRL) based architecture for safety control of Radio Access Network (RAN) applications. We provide a purely automated procedure in which a user can specify high-level logical safety specifications for a given cellular network topology. We introduce a user interface (UI) developed to help a user set intent specifications to the system, and inspect the difference in agent proposed actions.
arXiv Detail & Related papers (2021-06-03T16:45:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.