Learning Human-like Representations to Enable Learning Human Values
- URL: http://arxiv.org/abs/2312.14106v2
- Date: Wed, 13 Mar 2024 01:37:55 GMT
- Title: Learning Human-like Representations to Enable Learning Human Values
- Authors: Andrea Wynn, Ilia Sucholutsky, Thomas L. Griffiths
- Abstract summary: We argue that representational alignment between humans and AI agents facilitates value alignment.
We focus on ethics as one aspect of value alignment and train ML agents using a variety of methods.
- Score: 12.628307026004656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How can we build AI systems that are aligned with human values to avoid
causing harm or violating societal standards for acceptable behavior? We argue
that representational alignment between humans and AI agents facilitates value
alignment. Making AI systems learn human-like representations of the world has
many known benefits, including improving generalization, robustness to domain
shifts, and few-shot learning performance. We propose that this kind of
representational alignment between machine learning (ML) models and humans can
also support value alignment, allowing ML systems to conform to human values
and societal norms. We focus on ethics as one aspect of value alignment and
train ML agents using a variety of methods in a multi-armed bandit setting,
where rewards reflect the moral acceptability of the chosen action. We use a
synthetic experiment to demonstrate that agents' representational alignment
with the environment bounds their learning performance. We then repeat this
procedure in a realistic setting, using textual action descriptions and
similarity judgments collected from humans and a variety of language models, to
show that the results generalize and are model-agnostic when grounded in an
ethically relevant context.
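Since the abstract describes the experimental setup compactly, a toy reimplementation helps make the claim concrete. The sketch below is my own construction, not the authors' code: arms live in a latent feature space, the reward is the "moral acceptability" of the chosen arm, and the agent generalizes reward estimates across arms through similarity in its own, possibly misaligned, representation. The arm count, epsilon-greedy rule, and Gaussian kernel are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(alignment, n_arms=20, n_steps=500, noise=0.1, bandwidth=0.5):
    """Kernel-based bandit agent whose arm representation is a mixture of the
    environment's true features and unrelated ones; `alignment` in [0, 1]
    controls how closely the agent's representation matches the truth."""
    true_feats = rng.normal(size=(n_arms, 2))       # ground-truth arm features
    w = rng.normal(size=2)
    reward_mean = true_feats @ w                    # "moral acceptability" of each arm
    agent_feats = alignment * true_feats + (1 - alignment) * rng.normal(size=(n_arms, 2))

    seen_arms, seen_rewards = [], []
    total = 0.0
    for _ in range(n_steps):
        if not seen_arms or rng.random() < 0.1:     # epsilon-greedy exploration
            arm = int(rng.integers(n_arms))
        else:
            X = agent_feats[seen_arms]              # features of arms tried so far
            # Kernel regression: generalize observed rewards via similarity
            # in the *agent's* representation space.
            d = np.linalg.norm(agent_feats[:, None, :] - X[None, :, :], axis=-1)
            k = np.exp(-((d / bandwidth) ** 2))
            est = (k @ np.array(seen_rewards)) / (k.sum(axis=1) + 1e-8)
            arm = int(np.argmax(est))
        seen_arms.append(arm)
        seen_rewards.append(reward_mean[arm] + noise * rng.normal())
        total += reward_mean[arm]
    return n_steps * reward_mean.max() - total      # cumulative regret

for a in (1.0, 0.5, 0.0):
    print(f"alignment={a:.1f}  regret={run_bandit(a):8.1f}")
```
Running this, regret should generally shrink as `alignment` rises, mirroring the paper's claim that an agent's representational alignment with the environment bounds its learning performance.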
Related papers
- A Moral Imperative: The Need for Continual Superalignment of Large Language Models [1.0499611180329806]
Superalignment is a theoretical framework that aspires to ensure that superintelligent AI systems act in accordance with human values and goals.
This paper examines the challenges associated with achieving lifelong superalignment in AI systems, particularly large language models (LLMs).
arXiv Detail & Related papers (2024-03-13T05:44:50Z)
- Modelling Human Values for AI Reasoning [2.320648715016106]
We detail a formal model of human values for their explicit computational representation.
We show how this model can provide the foundational apparatus for AI-based reasoning over values.
We propose a roadmap for future integrated, and interdisciplinary, research into human values in AI.
arXiv Detail & Related papers (2024-02-09T12:08:49Z)
- Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning [11.948092546676687]
We argue that the value system of an AI should be culturally attuned.
How AI systems might acquire such codes from human observation and interaction has remained an open question.
We show that an AI agent learning from the average behavior of a particular cultural group can acquire altruistic characteristics reflective of that group's behavior.
arXiv Detail & Related papers (2023-12-29T05:39:10Z)
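The mechanism the Culturally-Attuned Moral Machines entry points at, recovering a group's values from its average behavior, can be illustrated with a small feature-matching inverse reinforcement learning loop. This is a generic max-entropy-style sketch, not the paper's method; the features, weights, and one-step decision setting are assumptions for illustration.
```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: each candidate action has observable features (e.g., a
# "benefit to others" dimension); a cultural group weighs these features
# with some unknown preference vector.
n_actions, n_feats = 10, 3
feats = rng.normal(size=(n_actions, n_feats))
true_w = np.array([2.0, 0.0, -1.0])          # hypothetical group preferences

def policy(w):
    """Softmax choice probabilities over actions under reward weights w."""
    logits = feats @ w
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Observed average behavior of the group -> its expected feature counts.
group_feat_avg = policy(true_w) @ feats

# Feature-matching IRL in a one-step decision problem: ascend the likelihood
# gradient until the learner's expected features match the group's average.
w = np.zeros(n_feats)
for _ in range(2000):
    w += 0.1 * (group_feat_avg - policy(w) @ feats)

print("recovered weights:", np.round(w, 2))
print("true weights:     ", true_w)
```
An agent that then maximizes the recovered reward reproduces the group's behavioral tendencies, which is the sense in which it can acquire, say, altruistic characteristics from altruistic demonstrations.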
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environment in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
- Towards Abstract Relational Learning in Human Robot Interaction [73.67226556788498]
Humans have a rich representation of the entities in their environment.
If robots need to interact successfully with humans, they need to represent entities, attributes, and generalizations in a similar way.
In this work, we address the problem of how to obtain these representations through human-robot interaction.
arXiv Detail & Related papers (2020-11-20T12:06:46Z)
- Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
- Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs [90.20235972293801]
Aiming to understand how human (false-)belief, a core socio-cognitive ability, would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.
An inference algorithm is derived to fuse the individual parse graphs (pg) from all robots across multiple views into a joint pg, which affords more effective reasoning and inference and helps overcome errors originating from any single view.
arXiv Detail & Related papers (2020-04-25T23:02:04Z)
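As a loose illustration of the fusion step in the Joint Inference entry above (a toy stand-in, not the paper's parse-graph algorithm): if each robot's view yields an independent posterior over a discrete object state, the views can be combined by a normalized product, so occluded or noisy views get outvoted.
```python
import numpy as np

def fuse_views(per_view_posteriors):
    """Combine independent per-view posteriors over a discrete object state
    by normalized product (log-space for numerical stability)."""
    log_joint = np.sum(np.log(np.asarray(per_view_posteriors) + 1e-12), axis=0)
    joint = np.exp(log_joint - log_joint.max())
    return joint / joint.sum()

# Three robots observe whether a cup is "on_table" vs "in_cabinet";
# robot 2's view is occluded and nearly uninformative.
views = [
    [0.8, 0.2],   # robot 1: clear view
    [0.5, 0.5],   # robot 2: occluded
    [0.7, 0.3],   # robot 3: partial view
]
print(fuse_views(views))  # joint belief is sharper than any single view
```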
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.