Robust AI Security and Alignment: A Sisyphean Endeavor?
- URL: http://arxiv.org/abs/2512.10100v1
- Date: Wed, 10 Dec 2025 21:44:10 GMT
- Title: Robust AI Security and Alignment: A Sisyphean Endeavor?
- Authors: Apostol Vassilev,
- Abstract summary: This manuscript establishes information-theoretic limitations for robustness of AI security and alignment by extending Gdel's incompleteness theorem to AI.<n>Knowing these limitations and preparing for the challenges they bring is critically important for the responsible adoption of the AI technology.
- Score: 0.03691941137525625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This manuscript establishes information-theoretic limitations for robustness of AI security and alignment by extending Gödel's incompleteness theorem to AI. Knowing these limitations and preparing for the challenges they bring is critically important for the responsible adoption of the AI technology. Practical approaches to dealing with these challenges are provided as well. Broader implications for cognitive reasoning limitations of AI systems are also proven.
Related papers
- Beware of "Explanations" of AI [16.314859121110945]
Understanding the decisions made and actions taken by increasingly complex AI system remains a key challenge.<n>This has led to an expanding field of research in explainable artificial intelligence (XAI)<n>The question of what constitutes a "good" explanation is dependent on the goals, stakeholders, and context.
arXiv Detail & Related papers (2025-04-09T11:31:08Z) - Responsible Development of Offensive AI [0.0]
This study aims to establish priorities that balance societal benefits against risks.<n>The two forms of offensive AI evaluated in this study are vulnerability detection agents, which solve Capture- The-Flag challenges, and AI-powered malware.
arXiv Detail & Related papers (2025-04-03T15:37:38Z) - Position: Mind the Gap-the Growing Disconnect Between Established Vulnerability Disclosure and AI Security [56.219994752894294]
We argue that adapting existing processes for AI security reporting is doomed to fail due to fundamental shortcomings for the distinctive characteristics of AI systems.<n>Based on our proposal to address these shortcomings, we discuss an approach to AI security reporting and how the new AI paradigm, AI agents, will further reinforce the need for specialized AI security incident reporting advancements.
arXiv Detail & Related papers (2024-12-19T13:50:26Z) - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z) - Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review [12.38351931894004]
We present the first systematic literature review of explainable methods for safe and trustworthy autonomous driving.
We identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation.
We propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.
arXiv Detail & Related papers (2024-02-08T09:08:44Z) - Predictable Artificial Intelligence [77.1127726638209]
This paper introduces the ideas and challenges of Predictable AI.<n>It explores the ways in which we can anticipate key validity indicators of present and future AI ecosystems.<n>We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems.
arXiv Detail & Related papers (2023-10-09T21:36:21Z) - Beyond XAI:Obstacles Towards Responsible AI [0.0]
Methods of explainability and their evaluation strategies present numerous limitations in real-world contexts.
In this paper, we explore these limitations and discuss their implications in a boarder context of responsible AI.
arXiv Detail & Related papers (2023-09-07T11:08:14Z) - AI Maintenance: A Robustness Perspective [91.28724422822003]
We introduce highlighted robustness challenges in the AI lifecycle and motivate AI maintenance by making analogies to car maintenance.
We propose an AI model inspection framework to detect and mitigate robustness risks.
Our proposal for AI maintenance facilitates robustness assessment, status tracking, risk scanning, model hardening, and regulation throughout the AI lifecycle.
arXiv Detail & Related papers (2023-01-08T15:02:38Z) - Inherent Limitations of AI Fairness [16.588468396705366]
The study of AI fairness has rapidly developed into a rich field of research with links to computer science, social science, law, and philosophy.
Many technical solutions for measuring and achieving AI fairness have been proposed, yet their approach has been criticized in recent years for being misleading, unrealistic and harmful.
arXiv Detail & Related papers (2022-12-13T11:23:24Z) - The Role of AI in Drug Discovery: Challenges, Opportunities, and
Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z) - Proceedings of the Artificial Intelligence for Cyber Security (AICS)
Workshop at AAAI 2022 [55.573187938617636]
The workshop will focus on the application of AI to problems in cyber security.
Cyber systems generate large volumes of data, utilizing this effectively is beyond human capabilities.
arXiv Detail & Related papers (2022-02-28T18:27:41Z) - Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.