A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI Evaluations
- URL: http://arxiv.org/abs/2501.02074v2
- Date: Thu, 13 Feb 2025 20:07:20 GMT
- Title: A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI Evaluations
- Authors: Aida Davani, Sunipa Dev, Héctor Pérez-Urbina, Vinodkumar Prabhakaran
- Abstract summary: Societal stereotypes are at the center of a myriad of responsible AI interventions.
We propose a unified framework to operationalize stereotypes in generative AI evaluations.
- Score: 15.381034360289899
- Abstract: Societal stereotypes are at the center of a myriad of responsible AI interventions targeted at reducing the generation and propagation of potentially harmful outcomes. While these efforts are much needed, they tend to be fragmented and often address different parts of the issue without taking a unified or holistic approach to social stereotypes and how they impact various parts of the machine learning pipeline. As a result, they fail to capitalize on the underlying mechanisms that are common across different types of stereotypes, and to anchor on particular aspects that are relevant in certain cases. In this paper, we draw on social psychological research, and build on NLP data and methods, to propose a unified framework to operationalize stereotypes in generative AI evaluations. Our framework identifies key components of stereotypes that are crucial in AI evaluation, including the target group, associated attribute, relationship characteristics, perceiving group, and relevant context. We also provide considerations and recommendations for its responsible use.
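As a rough illustration of how the framework's components might be operationalized in an evaluation harness, the sketch below encodes one stereotype record using the five components named in the abstract. All class, field, and function names, as well as the prompt template, are hypothetical; the paper does not prescribe this implementation.

```python
# Hypothetical sketch only: the fields mirror the components listed in the
# abstract (target group, associated attribute, relationship characteristics,
# perceiving group, relevant context); everything else is invented for
# illustration and is not the paper's implementation.
from dataclasses import dataclass
from typing import Optional


@dataclass
class StereotypeInstance:
    """One stereotype record used to construct an evaluation case."""
    target_group: str              # the group the stereotype is about
    attribute: str                 # the attribute associated with that group
    relationship: str              # characteristics of the association, e.g. valence or prevalence
    perceiving_group: str          # who holds or reports the association
    context: Optional[str] = None  # sociocultural or situational context where it applies


def to_probe(s: StereotypeInstance) -> str:
    """Render a simple evaluation probe from a stereotype record (illustrative template)."""
    probe = (f"As perceived by {s.perceiving_group}, {s.target_group} "
             f"are associated with being {s.attribute} ({s.relationship}).")
    if s.context:
        probe += f" Context: {s.context}."
    return probe


if __name__ == "__main__":
    example = StereotypeInstance(
        target_group="<target group>",
        attribute="<associated attribute>",
        relationship="negative, widely reported",
        perceiving_group="<perceiving group>",
        context="<relevant context>",
    )
    print(to_probe(example))
```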
Related papers
- The Superalignment of Superhuman Intelligence with Large Language Models [63.96120398355404]
We discuss the concept of superalignment from a learning perspective.
We highlight some key research problems in superalignment, namely, weak-to-strong generalization, scalable oversight, and evaluation.
We present a conceptual framework for superalignment consisting of three modules: an attacker, which generates adversarial queries that try to expose weaknesses of a learner model; a learner, which refines itself using scalable feedback generated by a critic model together with minimal input from human experts; and a critic, which generates critiques or explanations for a given query-response pair with the goal of improving the learner.
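The attacker-learner-critic loop described above can be pictured roughly as follows; every interface in this sketch is an invented placeholder rather than the paper's actual components.

```python
# Illustrative pseudocode of the attacker-learner-critic loop summarized above;
# all interfaces are invented placeholders, not the paper's actual modules.
from typing import Callable, List, Tuple


def superalignment_round(
    attacker: Callable[[], List[str]],                  # proposes adversarial queries
    learner_answer: Callable[[str], str],               # learner model answers a query
    critic: Callable[[str, str], str],                  # critiques a (query, response) pair
    learner_refine: Callable[[List[Tuple[str, str, str]]], None],  # learner updates from feedback
) -> None:
    """One round: attack, answer, critique, then refine the learner."""
    queries = attacker()
    feedback = []
    for q in queries:
        response = learner_answer(q)
        critique = critic(q, response)
        feedback.append((q, response, critique))
    learner_refine(feedback)


if __name__ == "__main__":
    # Toy placeholders standing in for real models, just to show the data flow.
    superalignment_round(
        attacker=lambda: ["tricky query 1", "tricky query 2"],
        learner_answer=lambda q: f"answer to: {q}",
        critic=lambda q, r: f"critique of ({q!r}, {r!r})",
        learner_refine=lambda fb: print(f"refining on {len(fb)} feedback items"),
    )
```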
arXiv Detail & Related papers (2024-12-15T10:34:06Z)
- Aligning Generalisation Between Humans and Machines [74.120848518198]
Recent advances in AI have resulted in technology that can support humans in scientific discovery and decision support but may also disrupt democracies and target individuals.
The responsible use of AI increasingly calls for human-AI teaming.
A crucial yet often overlooked aspect of these interactions is the different ways in which humans and machines generalise.
arXiv Detail & Related papers (2024-11-23T18:36:07Z)
- Causal Responsibility Attribution for Human-AI Collaboration [62.474732677086855]
This paper presents a causal framework using Structural Causal Models (SCMs) to systematically attribute responsibility in human-AI systems.
Two case studies illustrate the framework's adaptability in diverse human-AI collaboration scenarios.
arXiv Detail & Related papers (2024-11-05T17:17:45Z)
- Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge, putting a focus on adversarial threats in natural language and multi-modal systems.
Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks.
This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z)
- GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives [18.574420136899978]
We propose GRASP, a comprehensive disagreement analysis framework to measure group association in perspectives among different rater sub-groups.
Our framework reveals specific rater groups that have significantly different perspectives than others on certain tasks, and helps identify demographic axes that are crucial to consider in specific task contexts.
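A rough illustration of this kind of subgroup disagreement analysis follows: the sketch compares within-group and cross-group rater agreement. The metric and all names are simplified stand-ins chosen for illustration, not GRASP's actual statistics.

```python
# Illustrative sketch of a rater-subgroup disagreement analysis in the spirit
# of the framework described above: it compares how often raters from one
# group agree with each other vs. with raters outside the group. The metric
# and names are simplifications, not GRASP's actual statistics.
from itertools import combinations
from typing import Dict, List, Tuple

# ratings[item_id] -> list of (rater_group, label) pairs for that item
Ratings = Dict[str, List[Tuple[str, str]]]


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs) if xs else float("nan")


def group_agreement(ratings: Ratings, group: str) -> Tuple[float, float]:
    """Return (within-group, cross-group) mean pairwise agreement for one rater group."""
    within, cross = [], []
    for labels in ratings.values():
        in_g = [lab for g, lab in labels if g == group]
        out_g = [lab for g, lab in labels if g != group]
        if len(in_g) >= 2:  # agreement among raters of the same group on this item
            pairs = list(combinations(in_g, 2))
            within.append(sum(a == b for a, b in pairs) / len(pairs))
        if in_g and out_g:  # agreement between this group and all other raters
            pairs = [(a, b) for a in in_g for b in out_g]
            cross.append(sum(a == b for a, b in pairs) / len(pairs))
    return _mean(within), _mean(cross)


if __name__ == "__main__":
    ratings: Ratings = {
        "item1": [("group_a", "offensive"), ("group_a", "offensive"), ("group_b", "not_offensive")],
        "item2": [("group_a", "not_offensive"), ("group_b", "not_offensive")],
    }
    # A large gap between within-group and cross-group agreement flags divergent perspectives.
    print(group_agreement(ratings, "group_a"))
```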
arXiv Detail & Related papers (2023-11-09T00:12:21Z)
- Anticipating Impacts: Using Large-Scale Scenario Writing to Explore Diverse Implications of Generative AI in the News Environment [3.660182910533372]
We aim to broaden the perspective and capture the expectations of three stakeholder groups about the potential negative impacts of generative AI.
We apply scenario writing and use participatory foresight to delve into cognitively diverse imaginations of the future.
We conclude by discussing the usefulness of scenario-writing and participatory foresight as a toolbox for generative AI impact assessment.
arXiv Detail & Related papers (2023-10-10T06:59:27Z)
- From human-centered to social-centered artificial intelligence: Assessing ChatGPT's impact through disruptive events [1.1858896428516252]
We argue that critiques of ChatGPT's impact within machine learning research communities have coalesced around its performance and other conventional safety evaluations relating to bias, toxicity, and "hallucination".
By analyzing ChatGPT's social impact through a social-centered framework, we challenge individualistic approaches in AI development and contribute to ongoing debates around the ethical and responsible deployment of AI systems.
arXiv Detail & Related papers (2023-05-31T22:46:48Z)
- ACROCPoLis: A Descriptive Framework for Making Sense of Fairness [6.4686347616068005]
We propose the ACROCPoLis framework to represent allocation processes with a modeling emphasis on fairness aspects.
The framework provides a shared vocabulary in which the factors relevant to fairness assessments for different situations and procedures are made explicit.
arXiv Detail & Related papers (2023-04-19T21:14:57Z)
- Fairness And Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, And Mitigation Strategies [11.323961700172175]
This survey paper offers a succinct, comprehensive overview of fairness and bias in AI.
We review sources of bias, such as data, algorithm, and human decision biases.
We assess the societal impact of biased AI systems, focusing on the perpetuation of inequalities and the reinforcement of harmful stereotypes.
arXiv Detail & Related papers (2023-04-16T03:23:55Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) risk manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)