Towards a Responsible AI Development Lifecycle: Lessons From Information Security
- URL: http://arxiv.org/abs/2203.02958v1
- Date: Sun, 6 Mar 2022 13:03:58 GMT
- Title: Towards a Responsible AI Development Lifecycle: Lessons From Information Security
- Authors: Erick Galinkin
- Abstract summary: We propose a framework for responsibly developing artificial intelligence systems.
In particular, we propose leveraging the concepts of threat modeling, design review, penetration testing, and incident response.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Legislation and public sentiment throughout the world have promoted fairness
metrics, explainability, and interpretability as prescriptions for the
responsible development of ethical artificial intelligence systems. Despite the
importance of these three pillars in the foundation of the field, they can be
challenging to operationalize and attempts to solve the problems in production
environments often feel Sisyphean. This difficulty stems from a number of
factors: fairness metrics are computationally difficult to incorporate into
training and rarely alleviate all of the harms perpetrated by these systems.
Interpretability and explainability can be gamed to appear fair, may
inadvertently reduce the privacy of personal information contained in training
data, and increase user confidence in predictions -- even when the explanations
are wrong. In this work, we propose a framework for responsibly developing
artificial intelligence systems by incorporating lessons from the field of
information security and the secure development lifecycle to overcome
challenges associated with protecting users in adversarial settings. In
particular, we propose leveraging the concepts of threat modeling, design
review, penetration testing, and incident response in the context of developing
AI systems as ways to resolve shortcomings in the aforementioned methods.
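The lifecycle stages named in the abstract map naturally onto structured artifacts. Below is a minimal, hypothetical sketch of how threat-model findings for an ML system might be recorded using STRIDE-style categories borrowed from information security; the class names, fields, and example entries are illustrative assumptions, not an interface defined in the paper.

```python
"""Hypothetical sketch: recording threat-model findings for an ML system.

STRIDE-style categories are borrowed from information security; all class
names, fields, and example entries are illustrative assumptions, not an
interface defined by the paper.
"""
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ThreatCategory(Enum):
    SPOOFING = "spoofing"                        # e.g. impersonating a data source
    TAMPERING = "tampering"                      # e.g. training-data poisoning
    REPUDIATION = "repudiation"
    INFO_DISCLOSURE = "information disclosure"   # e.g. membership inference
    DENIAL_OF_SERVICE = "denial of service"
    ELEVATION_OF_PRIVILEGE = "elevation of privilege"


@dataclass
class Threat:
    component: str            # pipeline stage: data ingestion, training, serving, ...
    category: ThreatCategory
    description: str
    mitigation: str           # planned control, revisited at design review
    tested: bool = False      # set once the mitigation is exercised by a penetration test


@dataclass
class ThreatModel:
    system: str
    threats: List[Threat] = field(default_factory=list)

    def untested(self) -> List[Threat]:
        """Threats whose mitigations have not yet been covered by testing."""
        return [t for t in self.threats if not t.tested]


if __name__ == "__main__":
    model = ThreatModel(system="example loan-approval classifier")
    model.threats.append(Threat(
        component="training data",
        category=ThreatCategory.TAMPERING,
        description="Poisoned records shift the decision boundary for a subgroup.",
        mitigation="Provenance checks and outlier screening before each training run.",
    ))
    for threat in model.untested():
        print(f"[untested] {threat.component}: {threat.description}")
```

Tracking a `tested` flag per threat is one plausible way to connect design-review findings to penetration-test coverage; an empty `untested()` list could then serve as a simple gate before deployment and a starting point for incident-response playbooks.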
Related papers
- Towards Trustworthy AI: A Review of Ethical and Robust Large Language Models [1.7466076090043157]
Large Language Models (LLMs) could transform many fields, but their fast development creates significant challenges for oversight, ethical creation, and building user trust.
This comprehensive review looks at key trust issues in LLMs, such as unintended harms, lack of transparency, vulnerability to attacks, alignment with human values, and environmental impact.
To tackle these issues, we suggest combining ethical oversight, industry accountability, regulation, and public involvement.
arXiv Detail & Related papers (2024-06-01T14:47:58Z)
- Trustworthy Distributed AI Systems: Robustness, Privacy, and Governance [14.941040909919327]
Distributed AI systems are revolutionizing big data computing and data processing capabilities with growing economic and societal impact.
Recent studies have identified new attack surfaces and risks caused by security, privacy, and fairness issues in AI systems.
We review representative techniques, algorithms, and theoretical foundations for trustworthy distributed AI.
arXiv Detail & Related papers (2024-02-02T01:58:58Z)
- Building Safe and Reliable AI systems for Safety Critical Tasks with Vision-Language Processing [1.2183405753834557]
Current AI algorithms are unable to identify common causes for failure detection.
Additional techniques are required to quantify the quality of predictions.
This thesis will focus on vision-language data processing for tasks like classification, image captioning, and vision question answering.
arXiv Detail & Related papers (2023-08-06T18:05:59Z)
- Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI [76.28956947107372]
Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful.
We propose FARM, a novel framework leveraging external knowledge for trustworthy rationale generation in the context of safety.
Our experiments show that FARM obtains state-of-the-art results on the SafeText dataset, with an absolute improvement of 5.9% in safety classification accuracy.
arXiv Detail & Related papers (2022-12-19T17:51:47Z)
- Liability regimes in the age of AI: a use-case driven analysis of the burden of proof [1.7510020208193926]
New emerging technologies powered by Artificial Intelligence (AI) have the potential to disruptively transform our societies for the better.
But there are growing concerns about certain intrinsic characteristics of these methodologies that carry potential risks to both safety and fundamental rights.
This paper presents three case studies, as well as the methodology to reach them, that illustrate these difficulties.
arXiv Detail & Related papers (2022-11-03T13:55:36Z)
- Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome (a minimal illustration of this idea is sketched after this list).
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
- Overcoming Failures of Imagination in AI Infused System Development and Deployment [71.9309995623067]
NeurIPS 2020 requested that research paper submissions include impact statements on "potential nefarious uses and the consequences of failure".
We argue that frameworks of harms must be context-aware and consider a wider range of potential stakeholders, system affordances, as well as viable proxies for assessing harms in the widest sense.
arXiv Detail & Related papers (2020-11-26T18:09:52Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain decisions, and bias in training data are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
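As a companion to the counterfactual-explanation entry above, here is a minimal, hypothetical sketch of a plain greedy counterfactual search against an illustrative linear scoring model. It is not the CEILS method, it deliberately ignores the feasibility constraints that paper addresses, and the weights and feature names are assumptions for demonstration only.

```python
"""Hypothetical sketch of a plain counterfactual search (not CEILS itself).

The scoring model, weights, and feature names are illustrative assumptions;
feasibility of the suggested changes is deliberately ignored.
"""
import numpy as np

# Illustrative linear credit-scoring model: predict 1 (approve) if score > 0.
WEIGHTS = np.array([0.8, 0.5, -0.6])          # income, savings, existing debt
FEATURES = ["income", "savings", "debt"]


def predict(x: np.ndarray) -> int:
    return int(x @ WEIGHTS > 0.0)


def counterfactual(x: np.ndarray, step: float = 0.1, max_iters: int = 200) -> np.ndarray:
    """Greedily nudge one feature at a time until the prediction flips to 1."""
    x_cf = x.astype(float).copy()
    for _ in range(max_iters):
        if predict(x_cf) == 1:
            return x_cf
        # Try small moves in both directions for every feature and keep the
        # candidate that increases the score the most.
        candidates = []
        for i in range(len(x_cf)):
            for delta in (step, -step):
                cand = x_cf.copy()
                cand[i] += delta
                candidates.append(cand)
        x_cf = max(candidates, key=lambda c: float(c @ WEIGHTS))
    return x_cf


if __name__ == "__main__":
    applicant = np.array([0.2, 0.1, 1.0])     # initially rejected
    suggestion = counterfactual(applicant)
    for name, before, after in zip(FEATURES, applicant, suggestion):
        if not np.isclose(before, after):
            print(f"change {name}: {before:.2f} -> {after:.2f}")
    print("new prediction:", predict(suggestion))
```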