Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in
- URL: http://arxiv.org/abs/2404.18460v1
- Date: Mon, 29 Apr 2024 06:42:27 GMT
- Title: Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in
- Authors: Utkarsh Agarwal, Kumar Tanmay, Aditi Khandelwal, Monojit Choudhury
- Abstract summary: This paper explores how three prominent LLMs -- GPT-4, ChatGPT, and Llama2-70B-Chat -- perform ethical reasoning in different languages.
We experiment with six languages: English, Spanish, Russian, Chinese, Hindi, and Swahili.
We find that GPT-4 is the most consistent and unbiased ethical reasoner across languages, while ChatGPT and Llama2-70B-Chat show significant moral value bias when we move to languages other than English.
- Score: 19.675262411557235
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Ethical reasoning is a crucial skill for Large Language Models (LLMs). However, moral values are not universal, but rather influenced by language and culture. This paper explores how three prominent LLMs -- GPT-4, ChatGPT, and Llama2-70B-Chat -- perform ethical reasoning in different languages and whether their moral judgements depend on the language in which they are prompted. We extend the study of ethical reasoning of LLMs by Rao et al. (2023) to a multilingual setup, following their framework of probing LLMs with ethical dilemmas and policies from three branches of normative ethics: deontology, virtue, and consequentialism. We experiment with six languages: English, Spanish, Russian, Chinese, Hindi, and Swahili. We find that GPT-4 is the most consistent and unbiased ethical reasoner across languages, while ChatGPT and Llama2-70B-Chat show significant moral value bias when we move to languages other than English. Interestingly, the nature of this bias varies significantly across languages for all LLMs, including GPT-4.
Related papers
- Right vs. Right: Can LLMs Make Tough Choices? [12.92528740921513]
An ethical dilemma describes a choice between two "right" options involving conflicting moral values.
We present a comprehensive evaluation of how LLMs navigate ethical dilemmas.
We construct a dataset comprising 1,730 ethical dilemmas involving four pairs of conflicting values.
arXiv Detail & Related papers (2024-12-27T21:20:45Z)
- Whose Morality Do They Speak? Unraveling Cultural Bias in Multilingual Language Models [0.0]
Large language models (LLMs) have become integral tools in diverse domains, yet their moral reasoning capabilities remain underexplored.
This study investigates whether multilingual LLMs, such as GPT-3.5-Turbo, reflect culturally specific moral values or impose dominant moral norms.
Using the updated Moral Foundations Questionnaire (MFQ-2) in eight languages, the study analyzes the models' adherence to six core moral foundations.
arXiv Detail & Related papers (2024-12-25T10:17:15Z)
- The Only Way is Ethics: A Guide to Ethical Research with Large Language Models [53.316174782223115]
'LLM Ethics Whitepaper' is an open resource for NLP practitioners and those tasked with evaluating the ethical implications of others' work.
Our goal is to translate ethics literature into concrete recommendations and provocations for thinking with clear first steps.
'LLM Ethics Whitepaper' distils a thorough literature review into clear Do's and Don'ts, which we present also in this paper.
arXiv Detail & Related papers (2024-12-20T16:14:43Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment [11.82100047858478]
This paper builds on the moral machine experiment (MME) to investigate the moral preferences of five large language models in a multilingual setting.
We generate 6500 scenarios of the MME and prompt the models in ten languages on which action to take.
Our analysis reveals that all LLMs exhibit different moral biases to some degree and that they not only differ from the human preferences but also across multiple languages within the models themselves.
arXiv Detail & Related papers (2024-07-21T14:48:13Z)
- Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP.
Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions.
We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions [49.97641297850361]
LINGOLLM is a training-free approach to enable an LLM to process unseen languages that hardly occur in its pre-training.
We implement LINGOLLM on top of two models, GPT-4 and Mixtral, and evaluate their performance on 5 tasks across 8 endangered or low-resource languages.
Our results show that LINGOLLM elevates translation capability from GPT-4's 0 to 10.5 BLEU for 10 language directions.
arXiv Detail & Related papers (2024-02-28T03:44:01Z)
- Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test [21.108525674360898]
We extend prior work beyond English to five new languages (Chinese, Hindi, Russian, Spanish and Swahili).
Our study shows that the moral reasoning ability for all models, as indicated by the post-conventional score, is substantially inferior for Hindi and Swahili, compared to Spanish, Russian, Chinese and English.
arXiv Detail & Related papers (2024-02-03T12:52:36Z)
- Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs [19.675262411557235]
We argue that instead of morally aligning LLMs to a specific set of ethical principles, we should infuse generic ethical reasoning capabilities into them.
We develop a framework that integrates moral dilemmas with moral principles pertaining to different formalisms of normative ethics.
arXiv Detail & Related papers (2023-10-11T07:27:34Z)
- Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings [73.48336898620518]
Large language models (LLMs) are highly adept at question answering and reasoning tasks.
We study the ability of a wide range of state-of-the-art multilingual LLMs to reason with proverbs and sayings in a conversational context.
arXiv Detail & Related papers (2023-09-15T17:45:28Z)
- Speaking Multiple Languages Affects the Moral Bias of Language Models [70.94372902010232]
Pre-trained multilingual language models (PMLMs) are commonly used when dealing with data from multiple languages and cross-lingual transfer.
Do the models capture moral norms from English and impose them on other languages?
Our experiments demonstrate that, indeed, PMLMs encode differing moral biases, but these do not necessarily correspond to cultural differences or commonalities in human opinions.
arXiv Detail & Related papers (2022-11-14T20:08:54Z)