Aligning AI With Shared Human Values
- URL: http://arxiv.org/abs/2008.02275v5
- Date: Sat, 24 Jul 2021 04:40:33 GMT
- Title: Aligning AI With Shared Human Values
- Authors: Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt
- Abstract summary: We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
- Score: 85.2824609130584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show how to assess a language model's knowledge of basic concepts of
morality. We introduce the ETHICS dataset, a new benchmark that spans concepts
in justice, well-being, duties, virtues, and commonsense morality. Models
predict widespread moral judgments about diverse text scenarios. This requires
connecting physical and social world knowledge to value judgements, a
capability that may enable us to steer chatbot outputs or eventually regularize
open-ended reinforcement learning agents. With the ETHICS dataset, we find that
current language models have a promising but incomplete ability to predict
basic human ethical judgements. Our work shows that progress can be made on
machine ethics today, and it provides a steppingstone toward AI that is aligned
with human values.
Related papers
- Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning [11.948092546676687]
We argue that the value system of an AI should be culturally attuned.
How AI systems might acquire such codes from human observation and interaction has remained an open question.
We show that an AI agent learning from the average behavior of a particular cultural group can acquire altruistic characteristics reflective of that group's behavior.
arXiv Detail & Related papers (2023-12-29T05:39:10Z)
- Learning Human-like Representations to Enable Learning Human Values [12.628307026004656]
We argue that representational alignment between humans and AI agents facilitates value alignment.
We focus on ethics as one aspect of value alignment and train ML agents using a variety of methods.
arXiv Detail & Related papers (2023-12-21T18:31:33Z)
- STREAM: Social data and knowledge collective intelligence platform for TRaining Ethical AI Models [10.356779168071313]
STREAM is a collective intelligence platform for aligning AI models with human moral values.
STREAM provides ethics datasets and knowledge bases to help AI models "follow good advice as naturally as a stream follows its course".
arXiv Detail & Related papers (2023-10-09T09:40:11Z)
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
- Zero-shot Visual Commonsense Immorality Prediction [8.143750358586072]
One way toward moral AI systems is by imitating human prosocial behavior and encouraging some form of good behavior in systems.
Here, we propose a model that predicts visual commonsense immorality in a zero-shot manner.
We evaluate our model with existing moral/immoral image datasets and show fair prediction performance consistent with human intuitions.
arXiv Detail & Related papers (2022-11-10T12:30:26Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research.
An increasingly prominent research area in AI is ethics, which currently has neither a set of benchmarks nor a commonly accepted way of measuring the 'ethicality' of an AI system.
We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z)
- ValueNet: A New Dataset for Human Value Driven Dialogue System [103.2044265617704]
We present a new large-scale human value dataset called ValueNet, which contains human attitudes on 21,374 text scenarios.
Comprehensive empirical results show that the learned value model could benefit a wide range of dialogue tasks.
ValueNet is the first large-scale text dataset for human value modeling.
arXiv Detail & Related papers (2021-12-12T23:02:52Z)
- Delphi: Towards Machine Ethics and Norms [38.8316885346292]
We identify four underlying challenges towards machine ethics and norms.
Our prototype model, Delphi, demonstrates strong promise of language-based commonsense moral reasoning.
We present Commonsense Norm Bank, a moral textbook customized for machines.
arXiv Detail & Related papers (2021-10-14T17:38:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.