Towards Responsible AI: Advances in Safety, Fairness, and Accountability of Autonomous Systems
- URL: http://arxiv.org/abs/2506.10192v1
- Date: Wed, 11 Jun 2025 21:30:02 GMT
- Title: Towards Responsible AI: Advances in Safety, Fairness, and Accountability of Autonomous Systems
- Authors: Filip Cano
- Abstract summary: This thesis advances knowledge in the safety, fairness, transparency, and accountability of AI systems. We extend classical deterministic shielding techniques to become resilient against delayed observations. We introduce fairness shields, a novel post-processing approach to enforce group fairness in sequential decision-making settings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring responsible use of artificial intelligence (AI) has become imperative as autonomous systems increasingly influence critical societal domains. However, the concept of trustworthy AI remains broad and multi-faceted. This thesis advances knowledge in the safety, fairness, transparency, and accountability of AI systems. In safety, we extend classical deterministic shielding techniques to become resilient against delayed observations, enabling practical deployment in real-world conditions. We also implement both deterministic and probabilistic safety shields into simulated autonomous vehicles to prevent collisions with road users, validating the use of these techniques in realistic driving simulators. We introduce fairness shields, a novel post-processing approach to enforce group fairness in sequential decision-making settings over finite and periodic time horizons. By optimizing intervention costs while strictly ensuring fairness constraints, this method efficiently balances fairness with minimal interference. For transparency and accountability, we propose a formal framework for assessing intentional behaviour in probabilistic decision-making agents, introducing quantitative metrics of agency and intention quotient. We use these metrics to propose a retrospective analysis of intention, useful for determining responsibility when autonomous systems cause unintended harm. Finally, we unify these contributions through the "reactive decision-making" framework, providing a general formalization that consolidates previous approaches. Collectively, the advancements presented contribute practically to the realization of safer, fairer, and more accountable AI systems, laying the foundations for future research in trustworthy AI.
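To give a rough flavour of the shielding idea described in the abstract, the sketch below shows a deterministic safety shield that remains safe even when the agent's observation of its own state is a few control steps old: the shield tracks every state consistent with the last observation and the actions issued since, and lets an action through only if it is safe from all of them. This is a minimal toy example, not the construction from the thesis; the one-dimensional corridor, the "slippery" dynamics, the delay model, and all names are illustrative assumptions.

```python
# Minimal sketch of a delay-resilient deterministic shield (illustrative
# assumptions only, not the thesis's actual construction).

CORRIDOR = range(0, 10)        # cells 0..9
OBSTACLE = 7                   # unsafe cell the agent must never enter
FALLBACK_ORDER = (0, -1, +1)   # override preference: stay, step left, step right


def clip(pos: int) -> int:
    """Keep a position inside the corridor bounds."""
    return min(max(pos, CORRIDOR.start), CORRIDOR.stop - 1)


def successors(pos: int, action: int) -> set[int]:
    """Nondeterministic dynamics: the commanded move may succeed or slip."""
    return {clip(pos + action), pos}


def is_safe(pos: int) -> bool:
    return pos != OBSTACLE


def belief(last_observed_pos: int, pending_actions: list[int]) -> set[int]:
    """All states the agent may occupy now, given an observation that is
    len(pending_actions) steps old and the actions issued since then."""
    states = {last_observed_pos}
    for a in pending_actions:
        states = {nxt for s in states for nxt in successors(s, a)}
    return states


def shield(last_observed_pos: int, pending_actions: list[int],
           proposed_action: int) -> int:
    """Let the proposed action through only if it is safe from *every* state
    consistent with the delayed observation; otherwise override it."""
    current = belief(last_observed_pos, pending_actions)

    def always_safe(action: int) -> bool:
        return all(is_safe(nxt) for s in current for nxt in successors(s, action))

    for a in (proposed_action, *FALLBACK_ORDER):
        if always_safe(a):
            return a
    raise RuntimeError("no action is safe from the current belief")


# Example: the last observation (cell 4) is two steps old and two 'step right'
# commands have been issued since, so the agent may now be in cell 4, 5, or 6.
# Stepping right again could enter the obstacle at cell 7, so the shield
# overrides the controller's proposal with 'stay'.
assert shield(last_observed_pos=4, pending_actions=[+1, +1],
              proposed_action=+1) == 0
```

In the shielding literature such monitors are typically synthesized from a formal model of the environment rather than hand-written as above; the sketch only illustrates the runtime interface (monitor the controller's proposal and override it only when necessary). The fairness shields introduced in the thesis follow the same monitor-and-override pattern, but enforce group-fairness constraints over a finite horizon instead of collision avoidance.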
Related papers
- Rethinking Data Protection in the (Generative) Artificial Intelligence Era [115.71019708491386]
We propose a four-level taxonomy that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline.
arXiv Detail & Related papers (2025-07-03T02:45:51Z)
- Towards provable probabilistic safety for scalable embodied AI systems [79.31011047593492]
Embodied AI systems are increasingly prevalent across various applications. Ensuring their safety in complex operating environments remains a major challenge. This Perspective offers a pathway toward safer, large-scale adoption of embodied AI systems in safety-critical applications.
arXiv Detail & Related papers (2025-06-05T15:46:25Z)
- Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods [0.0]
This literature review consolidates the rapidly evolving field of AI safety evaluations. It proposes a systematic taxonomy around three dimensions: what properties we measure, how we measure them, and how these measurements integrate into frameworks.
arXiv Detail & Related papers (2025-05-08T16:55:07Z)
- A Domain-Agnostic Scalable AI Safety Ensuring Framework [8.086635708001166]
We propose a novel framework that guarantees AI systems satisfy user-defined safety constraints with specified probabilities. Our approach combines any AI model with an optimization problem that ensures outputs meet safety requirements while maintaining performance. We prove our method guarantees probabilistic safety under mild conditions and establish the first scaling law in AI safety.
arXiv Detail & Related papers (2025-04-29T16:38:35Z)
- Rethinking Technological Readiness in the Era of AI Uncertainty [0.0]
We argue that current technology readiness assessments fail to capture critical AI-specific factors. We propose a new AI Readiness Framework to evaluate the maturity and trustworthiness of AI components in military systems.
arXiv Detail & Related papers (2025-04-15T14:09:50Z)
- Trustworthiness in Stochastic Systems: Towards Opening the Black Box [1.7355698649527407]
behavior by an AI system threatens to undermine alignment and potential trust. We take a philosophical perspective on the tension and potential conflict between foundationality and trustworthiness. We propose latent value modeling for both AI systems and users to better assess alignment.
arXiv Detail & Related papers (2025-01-27T19:43:09Z)
- Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation. We propose methods tailored to the unique properties of perception and decision-making. We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z)
- SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior [56.10557932893919]
We present SafetyAnalyst, a novel AI safety moderation framework. Given an AI behavior, SafetyAnalyst uses chain-of-thought reasoning to analyze its potential consequences. It aggregates effects into a harmfulness score using 28 fully interpretable weight parameters.
arXiv Detail & Related papers (2024-10-22T03:38:37Z)
- Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of the three core components of GS AI (a world model, a safety specification, and a verifier), describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
- Trustworthy AI: From Principles to Practices [44.67324097900778]
Many current AI systems were found vulnerable to imperceptible attacks, biased against underrepresented groups, lacking in user privacy protection, etc.
In this review, we strive to provide AI practitioners a comprehensive guide towards building trustworthy AI systems.
To unify the current fragmented approaches towards trustworthy AI, we propose a systematic approach that considers the entire lifecycle of AI systems.
arXiv Detail & Related papers (2021-10-04T03:20:39Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, limited ability to explain decisions, and bias in the training data are some of the most prominent limitations.
We propose a tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)