Related papers: LLMs grasp morality in concept

URL: http://arxiv.org/abs/2311.02294v1
Date: Sat, 4 Nov 2023 01:37:41 GMT
Title: LLMs grasp morality in concept
Authors: Mark Pock, Andre Ye, Jared Moore
Abstract summary: We provide a general theory of meaning that extends beyond humans. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society. Unaligned models may help us better develop our moral and social philosophy.
Score: 0.46040036610482665
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken the problem of how LLMs might 'mean' anything at all for granted. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.

Related papers

WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents [55.64361927346957]
We propose a training-free "world alignment" that learns an environment's symbolic knowledge complementary to large language models (LLMs). We also propose an RL-free, model-based agent "WALL-E 2.0" through the model-predictive control framework. WALL-E 2.0 significantly outperforms existing methods on open-world challenges in Mars (Minecraft-like) and ALFWorld (embodied indoor environments).
arXiv Detail & Related papers (2025-04-22T10:58:27Z)
The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas [0.3386560551295745]
We evaluate the moral judgments of LLMs using utilitarian dilemmas. Our analysis reveals consistently encoded moral preferences that diverge from established moral theories and lay population moral standards.
arXiv Detail & Related papers (2025-03-25T12:29:53Z)
Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models [50.16340812031201]
We show that large language models (LLMs) do not update their beliefs as expected from the Bayesian framework. We teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model.
arXiv Detail & Related papers (2025-03-21T20:13:04Z)
Large Language Models Reflect the Ideology of their Creators [73.25935570218375]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. We uncover notable diversity in the ideological stance exhibited across different LLMs and languages.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents [55.64361927346957]
We propose a neurosymbolic approach to learn rules gradient-free through large language models (LLMs). Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods.
arXiv Detail & Related papers (2024-10-09T23:37:36Z)
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications. This paper evaluates LLMs' reasoning abilities in competitive environments. We first propose GTBench, a language-driven environment composing 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z)
"Understanding AI": Semantic Grounding in Large Language Models [0.0]
We have recently witnessed a generative turn in AI, since generative models, including LLMs, are key for self-supervised learning. To assess the question of semantic grounding, I distinguish and discuss five methodological ways.
arXiv Detail & Related papers (2024-02-16T14:23:55Z)
Are Language Models More Like Libraries or Like Librarians? Bibliotechnism, the Novel Reference Problem, and the Attitudes of LLMs [12.568491518122622]
We argue that bibliotechnism faces an independent challenge from examples in which LLMs generate novel reference. According to interpretationism in the philosophy of mind, a system has such attitudes if and only if its behavior is well explained by the hypothesis that it does. We emphasize, however, that interpretationism is compatible with very simple creatures having attitudes and differs sharply from views that presuppose these attitudes require consciousness, sentience, or intelligence.
arXiv Detail & Related papers (2024-01-10T00:05:45Z)
The ART of LLM Refinement: Ask, Refine, and Trust [85.75059530612882]
We propose a reasoning-with-refinement objective called ART: Ask, Refine, and Trust. It asks necessary questions to decide when an LLM should refine its output. It achieves a performance gain of +5 points over self-refinement baselines.
arXiv Detail & Related papers (2023-11-14T07:26:32Z)
Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding [1.3654846342364308]
Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. This position paper critically assesses three points recurring in critiques of LLM capacities. We outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs.
arXiv Detail & Related papers (2023-10-30T15:51:04Z)
Moral Foundations of Large Language Models [6.6445242437134455]
Moral foundations theory (MFT) is a psychological assessment tool that decomposes human moral reasoning into five factors. As large language models (LLMs) are trained on datasets collected from the internet, they may reflect the biases that are present in such corpora. This paper uses MFT as a lens to analyze whether popular LLMs have acquired a bias towards a particular set of moral values.
arXiv Detail & Related papers (2023-10-23T20:05:37Z)
Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
Large Language Models (LLMs) have made it crucial to align their values with those of humans. We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data). This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z)
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity [0.0]
This study investigates whether Large Language Models reproduce the moral biases associated with political groups in the United States. Using tools from Moral Foundations Theory, it is shown that these LLMs are indeed moral mimics.
arXiv Detail & Related papers (2022-09-24T23:55:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.