Exploring and steering the moral compass of Large Language Models
- URL: http://arxiv.org/abs/2405.17345v2
- Date: Thu, 6 Jun 2024 11:53:23 GMT
- Title: Exploring and steering the moral compass of Large Language Models
- Authors: Alejandro Tlaie,
- Abstract summary: Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
- Score: 55.2480439325792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors, raising significant ethical questions. This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles. We subjected several state-of-the-art models to a selection of ethical dilemmas and found that all the proprietary ones are mostly utilitarian and all of the open-weights ones align mostly with values-based ethics. Furthermore, when using the Moral Foundations Questionnaire, all models we probed - except for Llama 2-7B - displayed a strong liberal bias. Lastly, in order to causally intervene in one of the studied models, we propose a novel similarity-specific activation steering technique. Using this method, we were able to reliably steer the model's moral compass to different ethical schools. All of these results showcase that there is an ethical dimension in already deployed LLMs, an aspect that is generally overlooked.
Related papers
- Large-scale moral machine experiment on large language models [0.0]
We evaluate moral judgments across 51 different Large Language Models (LLMs) in autonomous driving scenarios.
proprietary models and open-source models exceeding 10 billion parameters demonstrated relatively close alignment with human judgments.
However, model updates did not consistently improve alignment with human preferences, and many LLMs showed excessive emphasis on specific ethical principles.
arXiv Detail & Related papers (2024-11-11T08:36:49Z) - DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life [46.11149958010897]
We present DailyDilemmas, a dataset of 1,360 moral dilemmas encountered in everyday life.
Each dilemma includes two possible actions and with each action, the affected parties and human values invoked.
We analyzed these values through the lens of five popular theories inspired by sociology, psychology and philosophy.
arXiv Detail & Related papers (2024-10-03T17:08:52Z) - MoralBench: Moral Evaluation of LLMs [34.43699121838648]
This paper introduces a novel benchmark designed to measure and compare the moral reasoning capabilities of large language models (LLMs)
We present the first comprehensive dataset specifically curated to probe the moral dimensions of LLM outputs.
Our methodology involves a multi-faceted approach, combining quantitative analysis with qualitative insights from ethics scholars to ensure a thorough evaluation of model performance.
arXiv Detail & Related papers (2024-06-06T18:15:01Z) - Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z) - Unpacking the Ethical Value Alignment in Big Models [46.560886177083084]
This paper provides an overview of the risks and challenges associated with big models, surveys existing AI ethics guidelines, and examines the ethical implications arising from the limitations of these models.
We introduce a novel conceptual paradigm for aligning the ethical values of big models and discuss promising research directions for alignment criteria, evaluation, and method.
arXiv Detail & Related papers (2023-10-26T16:45:40Z) - Ethical Reasoning over Moral Alignment: A Case and Framework for
In-Context Ethical Policies in LLMs [19.675262411557235]
We argue that instead of morally aligning LLMs to specific set of ethical principles, we should infuse generic ethical reasoning capabilities into them.
We develop a framework that integrates moral dilemmas with moral principles pertaining to different foramlisms of normative ethics.
arXiv Detail & Related papers (2023-10-11T07:27:34Z) - Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z) - Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life
Anecdotes [72.64975113835018]
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.