Evaluating the Social Impact of Generative AI Systems in Systems and Society
- URL: http://arxiv.org/abs/2306.05949v4
- Date: Fri, 28 Jun 2024 13:50:57 GMT
- Title: Evaluating the Social Impact of Generative AI Systems in Systems and Society
- Authors: Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Canyu Chen, Hal Daumé III, Jesse Dodge, Isabella Duan, Ellie Evans, Felix Friedrich, Avijit Ghosh, Usman Gohar, Sara Hooker, Yacine Jernite, Ria Kalluri, Alberto Lusoli, Alina Leidinger, Michelle Lin, Xiuzhu Lin, Sasha Luccioni, Jennifer Mickel, Margaret Mitchell, Jessica Newman, Anaelia Ovalle, Marie-Therese Png, Shubham Singh, Andrew Strait, Lukas Struppek, Arjun Subramonian,
- Abstract summary: Generative AI systems across modalities, ranging from text (including code), image, audio, and video, have broad social impacts.
There is no official standard for means of evaluating those impacts or for which impacts should be evaluated.
We present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality.
- Score: 43.32010533676472
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generative AI systems across modalities, spanning text (including code), image, audio, and video, have broad social impacts, but there is no official standard for how to evaluate those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach to evaluating a base generative AI system for any modality in two overarching categories: what can be evaluated in a base system independent of context and what can be evaluated in a societal context. Importantly, this refers to base systems that have no predetermined application or deployment context, including the model itself as well as system components such as training data. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to the listed generative modalities, and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in a broader societal context, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm.
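To make the framework's structure concrete, below is a minimal, hypothetical sketch of how the seven base-system categories and five societal-context categories named in the abstract could be encoded as an evaluation registry. Only the category names come from the paper; the `EvalResult` dataclass, the `register`/`run_all` helpers, and the stub evaluation are illustrative assumptions, not the authors' method or code.

```python
# Hypothetical sketch: filing evaluations under the paper's impact categories.
# Category names are taken from the abstract; all other names are assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List

BASE_SYSTEM_CATEGORIES = [
    "bias, stereotypes, and representational harms",
    "cultural values and sensitive content",
    "disparate performance",
    "privacy and data protection",
    "financial costs",
    "environmental costs",
    "data and content moderation labor costs",
]

SOCIETAL_CONTEXT_CATEGORIES = [
    "trustworthiness and autonomy",
    "inequality, marginalization, and violence",
    "concentration of authority",
    "labor and creativity",
    "ecosystem and environment",
]


@dataclass
class EvalResult:
    category: str
    metric: str
    value: float
    notes: str = ""


# Registry mapping each category to the evaluation functions filed under it.
_REGISTRY: Dict[str, List[Callable[..., EvalResult]]] = {
    c: [] for c in BASE_SYSTEM_CATEGORIES + SOCIETAL_CONTEXT_CATEGORIES
}


def register(category: str):
    """Decorator that files an evaluation function under a known category."""
    if category not in _REGISTRY:
        raise ValueError(f"Unknown category: {category}")

    def wrap(fn: Callable[..., EvalResult]) -> Callable[..., EvalResult]:
        _REGISTRY[category].append(fn)
        return fn

    return wrap


@register("disparate performance")
def accuracy_gap_stub(model=None) -> EvalResult:
    # Placeholder: a real evaluation would compare per-group metrics on the model.
    return EvalResult("disparate performance", "accuracy_gap", 0.0,
                      "stub value; replace with a real per-group comparison")


def run_all(model=None) -> List[EvalResult]:
    """Run every registered evaluation and collect the results."""
    return [fn(model) for fns in _REGISTRY.values() for fn in fns]


if __name__ == "__main__":
    for result in run_all():
        print(result)
```

In this sketch, adding coverage for a new category is just a matter of registering another evaluation function; how each category is actually measured is the substance of the paper and is not captured here.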
Related papers
- Evaluatology: The Science and Engineering of Evaluation [11.997673313601423]
This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation.
We propose a universal framework for evaluation, encompassing concepts, terminologies, theories, and methodologies that can be applied across various disciplines.
arXiv Detail & Related papers (2024-03-19T13:38:26Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Fairness in Contextual Resource Allocation Systems: Metrics and Incompatibility Results [7.705334602362225]
We study systems that allocate scarce resources to satisfy basic needs, such as homeless services that provide housing.
These systems often support communities disproportionately affected by systemic racial, gender, or other injustices.
We propose a framework for evaluating fairness in contextual resource allocation systems inspired by fairness metrics in machine learning.
arXiv Detail & Related papers (2022-12-04T02:30:58Z)
- Causal Fairness Analysis [68.12191782657437]
We introduce a framework for understanding, modeling, and possibly solving issues of fairness in decision-making settings.
The main insight of our approach will be to link the quantification of the disparities present on the observed data with the underlying, and often unobserved, collection of causal mechanisms.
Our effort culminates in the Fairness Map, which is the first systematic attempt to organize and explain the relationship between different criteria found in the literature.
arXiv Detail & Related papers (2022-07-23T01:06:34Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- Evaluation of Summarization Systems across Gender, Age, and Race [0.0]
We show that summary evaluation is sensitive to protected attributes.
This can severely bias system development and evaluation, leading us to build models that cater for some groups rather than others.
arXiv Detail & Related papers (2021-10-08T21:30:20Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants with respect to their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
- Steps Towards Value-Aligned Systems [0.0]
Algorithmic (including AI/ML) decision-making artifacts are an established and growing part of our decision-making ecosystem.
Current literature is full of examples of how individual artifacts violate societal norms and expectations.
This discussion argues for a more structured systems-level approach for assessing value-alignment in sociotechnical systems.
arXiv Detail & Related papers (2020-02-10T22:47:30Z)