The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing
- URL: http://arxiv.org/abs/2407.07786v2
- Date: Wed, 11 Sep 2024 16:02:31 GMT
- Title: The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing
- Authors: Alice Qian Zhang, Ryland Shaw, Jacy Reese Anthis, Ashlee Milton, Emily Tseng, Jina Suh, Lama Ahmad, Ram Shankar Siva Kumar, Julian Posada, Benjamin Shestakofsky, Sarah T. Roberts, Mary L. Gray
- Abstract summary: Rapid progress in general-purpose AI has sparked significant interest in "red teaming."
The practice raises questions about how red teamers are selected, biases and blind spots in how tests are conducted, and the psychological effects of harmful content on red teamers.
Future studies may explore topics ranging from fairness to mental health and other areas of potential harm.
- Score: 4.933252611303578
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Rapid progress in general-purpose AI has sparked significant interest in "red teaming," a practice of adversarial testing originating in military and cybersecurity applications. AI red teaming raises many questions about the human factor, such as how red teamers are selected, biases and blind spots in how tests are conducted, and harmful content's psychological effects on red teamers. A growing body of HCI and CSCW literature examines related practices, including data labeling, content moderation, and algorithmic auditing. However, few, if any, have investigated red teaming itself. Future studies may explore topics ranging from fairness to mental health and other areas of potential harm. We aim to facilitate a community of researchers and practitioners who can begin to meet these challenges with creativity, innovation, and thoughtful reflection.
Related papers
- Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge, putting a focus on adversarial threats in natural language and multi-modal systems.
Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks.
This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z) - Open Challenges on Fairness of Artificial Intelligence in Medical Imaging Applications [3.8236840661885485]
The chapter first discusses various sources of bias, including data collection, model training, and clinical deployment.
We then turn to discussing open challenges that we believe require attention from researchers and practitioners.
arXiv Detail & Related papers (2024-07-24T02:41:19Z) - Red-Teaming for Generative AI: Silver Bullet or Security Theater? [42.35800543892003]
We argue that while red-teaming may be a valuable big-tent idea for characterizing GenAI harm mitigations, industry may effectively apply red-teaming and other strategies behind closed doors to safeguard AI.
To move toward a more robust toolbox of evaluations for generative AI, we synthesize our recommendations into a question bank meant to guide and scaffold future AI red-teaming practices.
arXiv Detail & Related papers (2024-01-29T05:46:14Z) - The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward [56.16884466478886]
This paper reviews emerging issues with opaque and uncontrollable AI systems.
It proposes an integrative framework called violet teaming to develop reliable and responsible AI.
It emerged from AI safety research to manage risks proactively by design.
arXiv Detail & Related papers (2023-08-28T02:10:38Z) - Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset [4.789429120223149]
The quest for human imitative AI has been an enduring topic in AI research since its inception.
Creative problem solving in humans is a well-studied topic in cognitive neuroscience.
The Only Connect Wall segment essentially mimics Mednick's Remote Associates Test (RAT) formulation with built-in, deliberate red herrings.
arXiv Detail & Related papers (2023-06-19T21:14:57Z) - Capturing Humans' Mental Models of AI: An Item Response Theory Approach [12.129622383429597]
We show that people expect AI agents' performance to be significantly better on average than the performance of other humans.
arXiv Detail & Related papers (2023-05-15T23:17:26Z) - Human-Centered Responsible Artificial Intelligence: Current & Future Trends [76.94037394832931]
In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence.
All of this work is aimed at developing AI that benefits humanity while being grounded in human rights and ethics, and reducing the potential harms of AI.
In this special interest group, we aim to bring together researchers from academia and industry interested in these topics to map current and future research trends.
arXiv Detail & Related papers (2023-02-16T08:59:42Z) - Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z) - From Psychological Curiosity to Artificial Curiosity: Curiosity-Driven Learning in Artificial Intelligence Tasks [56.20123080771364]
Psychological curiosity plays a significant role in human intelligence to enhance learning through exploration and information acquisition.
In the Artificial Intelligence (AI) community, artificial curiosity provides a natural intrinsic motivation for efficient learning.
Curiosity-driven learning (CDL) has become increasingly popular, where agents are self-motivated to learn novel knowledge.
arXiv Detail & Related papers (2022-01-20T17:07:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.