Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
- URL: http://arxiv.org/abs/2308.15399v2
- Date: Mon, 1 Jul 2024 15:33:51 GMT
- Title: Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
- Authors: Jingyan Zhou, Minda Hu, Junan Li, Xiaoying Zhang, Xixin Wu, Irwin King, Helen Meng,
- Abstract summary: Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
- Score: 78.3738172874685
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Making moral judgments is an essential step toward developing ethical AI systems. Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality. These approaches have been criticized for overgeneralizing the moral stances of a limited group of annotators and lacking explainability. This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research. The theory-guided top-down framework can incorporate various moral theories. Our experiments demonstrate the effectiveness of the proposed framework on datasets derived from moral theories. Furthermore, we show the alignment between different moral theories and existing morality datasets. Our analysis exhibits the potential and flaws in existing resources (models and datasets) in developing explainable moral judgment-making systems.
Related papers
- Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z) - Learning Machine Morality through Experience and Interaction [3.7414804164475983]
Increasing interest in ensuring safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents.
We argue that more hybrid solutions are needed to create adaptable and robust, yet more controllable and interpretable agents.
arXiv Detail & Related papers (2023-12-04T11:46:34Z) - Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on
Normative Ethical Theory [7.412445894287708]
Moral AI has been studied in the fields of philosophy and artificial intelligence.
Recent developments in AI have made it increasingly necessary to implement AI with morality.
arXiv Detail & Related papers (2023-06-20T10:22:24Z) - Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement
Learning [4.2050490361120465]
A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents.
We present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories.
We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation.
arXiv Detail & Related papers (2023-01-20T09:36:42Z) - MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via
Moral Discussions [71.25236662907056]
A moral dialogue system aligned with users' values could enhance conversation engagement and user connections.
We propose a framework, MoralDial, to train and evaluate moral dialogue systems.
arXiv Detail & Related papers (2022-12-21T02:21:37Z) - ClarifyDelphi: Reinforced Clarification Questions with Defeasibility
Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human
Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z) - Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy [5.760388205237227]
We probe the Allen AI Delphi model with a set of standardized morality questionnaires.
Despite some inconsistencies, Delphi tends to mirror the moral principles associated with the demographic groups involved in the annotation process.
arXiv Detail & Related papers (2022-05-25T13:37:56Z) - Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z) - Reinforcement Learning Under Moral Uncertainty [13.761051314923634]
An ambitious goal for machine learning is to create agents that behave ethically.
While ethical agents could be trained by rewarding correct behavior under a specific moral theory, there remains widespread disagreement about the nature of morality.
This paper proposes two training methods that realize different points among competing desiderata, and trains agents in simple environments to act under moral uncertainty.
arXiv Detail & Related papers (2020-06-08T16:40:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.