Evaluating the Social Impact of Generative AI Systems in Systems and Society
- URL: http://arxiv.org/abs/2306.05949v4
- Date: Fri, 28 Jun 2024 13:50:57 GMT
- Title: Evaluating the Social Impact of Generative AI Systems in Systems and Society
- Authors: Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Canyu Chen, Hal Daumé III, Jesse Dodge, Isabella Duan, Ellie Evans, Felix Friedrich, Avijit Ghosh, Usman Gohar, Sara Hooker, Yacine Jernite, Ria Kalluri, Alberto Lusoli, Alina Leidinger, Michelle Lin, Xiuzhu Lin, Sasha Luccioni, Jennifer Mickel, Margaret Mitchell, Jessica Newman, Anaelia Ovalle, Marie-Therese Png, Shubham Singh, Andrew Strait, Lukas Struppek, Arjun Subramonian,
- Abstract summary: Generative AI systems across modalities, spanning text (including code), image, audio, and video, have broad social impacts.
There is no official standard for how to evaluate those impacts or for which impacts should be evaluated.
We present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality.
- Score: 43.32010533676472
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generative AI systems across modalities, spanning text (including code), image, audio, and video, have broad social impacts, but there is no official standard for how to evaluate those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach to evaluating a base generative AI system for any modality in two overarching categories: what can be evaluated in a base system independent of context and what can be evaluated in a societal context. Importantly, this refers to base systems that have no predetermined application or deployment context, including a model itself as well as system components, such as training data. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to the listed generative modalities, and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in a broader societal context, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm.
Related papers
- Evaluating Generative AI Systems is a Social Science Measurement Challenge [78.35388859345056]
We present a framework for measuring concepts related to the capabilities, impacts, opportunities, and risks of GenAI systems.
The framework distinguishes between four levels: the background concept, the systematized concept, the measurement instrument(s), and the instance-level measurements themselves.
arXiv Detail & Related papers (2024-11-17T02:35:30Z)
- Pessimistic Evaluation [58.736490198613154]
We argue that evaluating information access systems assumes utilitarian values that are not aligned with traditions of information access based on equal access.
We advocate for pessimistic evaluation of information access systems, focusing on worst-case utility.
arXiv Detail & Related papers (2024-10-17T15:40:09Z)
- Evaluatology: The Science and Engineering of Evaluation [11.997673313601423]
This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation.
We propose a universal framework for evaluation, encompassing concepts, terminologies, theories, and methodologies that can be applied across various disciplines.
arXiv Detail & Related papers (2024-03-19T13:38:26Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Fairness in Contextual Resource Allocation Systems: Metrics and Incompatibility Results [7.705334602362225]
We study systems that allocate scarce resources to satisfy basic needs, such as homeless services that provide housing.
These systems often support communities disproportionately affected by systemic racial, gender, or other injustices.
We propose a framework for evaluating fairness in contextual resource allocation systems inspired by fairness metrics in machine learning.
arXiv Detail & Related papers (2022-12-04T02:30:58Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize these values, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, evaluating their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
- Steps Towards Value-Aligned Systems [0.0]
Algorithmic (including AI/ML) decision-making artifacts are an established and growing part of our decision-making ecosystem.
Current literature is full of examples of how individual artifacts violate societal norms and expectations.
This discussion argues for a more structured systems-level approach for assessing value-alignment in sociotechnical systems.
arXiv Detail & Related papers (2020-02-10T22:47:30Z)