Value Internalization: Learning and Generalizing from Social Reward
- URL: http://arxiv.org/abs/2407.14681v1
- Date: Fri, 19 Jul 2024 21:53:33 GMT
- Title: Value Internalization: Learning and Generalizing from Social Reward
- Authors: Frieda Rong, Max Kleiman-Weiner
- Abstract summary: We propose a model of value internalization where social feedback trains an internal social reward (ISR) model.
We show that an ISR model prevents agents from unlearning socialized behaviors and enables generalization in out-of-distribution tasks.
Our work provides a foundation for understanding how humans acquire and generalize values and offers insights for aligning AI with human values.
- Score: 2.1933612703101764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social rewards shape human behavior. During development, a caregiver guides a learner's behavior towards culturally aligned goals and values. How do these behaviors persist and generalize when the caregiver is no longer present, and the learner must continue autonomously? Here, we propose a model of value internalization where social feedback trains an internal social reward (ISR) model that generates internal rewards when social rewards are unavailable. Through empirical simulations, we show that an ISR model prevents agents from unlearning socialized behaviors and enables generalization in out-of-distribution tasks. We characterize the implications of incomplete internalization, akin to "reward hacking" on the ISR. Additionally, we show that our model internalizes prosocial behavior in a multi-agent environment. Our work provides a foundation for understanding how humans acquire and generalize values and offers insights for aligning AI with human values.
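To make the ISR idea concrete, here is a minimal Python sketch (our own illustration under simplifying assumptions, not the paper's code): an agent first learns from a caregiver's social reward while distilling that feedback into an internal reward table, then continues learning from the internal reward alone once the caregiver is gone.

```python
# Minimal sketch of value internalization (hypothetical; all names are ours).
# Phase 1: a caregiver rewards "aligned" actions; the agent learns a policy
# and fits an internal social reward (ISR) model to the social feedback.
# Phase 2: the caregiver leaves; the ISR supplies rewards instead.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 16, 4
aligned_action = rng.integers(N_ACTIONS, size=N_STATES)  # caregiver's values

def caregiver_reward(s, a):
    return 1.0 if a == aligned_action[s] else 0.0

Q = np.zeros((N_STATES, N_ACTIONS))    # agent's action values
isr = np.zeros((N_STATES, N_ACTIONS))  # ISR as a learned reward table
alpha, eps = 0.5, 0.2

# Phase 1: socialization -- learn from social reward, fit the ISR.
for _ in range(5000):
    s = rng.integers(N_STATES)
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
    r = caregiver_reward(s, a)
    Q[s, a] += alpha * (r - Q[s, a])   # one-step bandit-style update
    isr[s, a] += 0.1 * (r - isr[s, a]) # supervised fit of the ISR

# Phase 2: autonomy -- without the ISR, continued learning on a zero
# external reward would wash out the socialized policy; with it, the
# internal reward preserves the behavior.
for _ in range(5000):
    s = rng.integers(N_STATES)
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
    Q[s, a] += alpha * (isr[s, a] - Q[s, a])

accuracy = (Q.argmax(axis=1) == aligned_action).mean()
print(f"aligned-action rate after autonomy: {accuracy:.2f}")
```

In this toy setup, an incompletely fit ISR (e.g., too few socialization steps) would let phase 2 drift away from the caregiver's values, loosely mirroring the "reward hacking" failure mode the paper characterizes.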
Related papers
- FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas [23.26678104324838]
We introduce FairMindSim, which simulates moral dilemmas through a series of unfair scenarios.
We use LLM agents to simulate human behavior, ensuring alignment across various stages.
Our findings indicate that, behaviorally, GPT-4o exhibits a stronger sense of social justice, while humans display a richer range of emotions.
arXiv Detail & Related papers (2024-10-14T11:39:05Z)
- Modelling Human Values for AI Reasoning [2.320648715016106]
We detail a formal model of human values for their explicit computational representation.
We show how this model can provide the foundational apparatus for AI-based reasoning over values.
We propose a roadmap for future integrated and interdisciplinary research into human values in AI.
arXiv Detail & Related papers (2024-02-09T12:08:49Z)
- Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning [11.948092546676687]
We argue that the value system of an AI should be culturally attuned.
How AI systems might acquire such codes from human observation and interaction has remained an open question.
We show that an AI agent learning from the average behavior of a particular cultural group can acquire altruistic characteristics reflective of that group's behavior.
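As a rough illustration of the inverse-reinforcement-learning idea (our simplification, not the paper's code), one can infer linear reward weights from a group's average choice behavior, so that the recovered reward reflects group-typical altruistic preferences:

```python
# Hedged sketch: maximum-entropy-style IRL from averaged group behavior.
# The feature names, frequencies, and learning rate are our assumptions.
import numpy as np

# Each action is described by features: [own_payoff, other_payoff]
actions = np.array([[1.0, 0.0],   # selfish
                    [0.6, 0.6],   # fair split
                    [0.2, 1.0]])  # altruistic

# Choice frequencies averaged over a (simulated) cultural group that tends
# toward sharing; in practice these would come from human demonstrations.
group_choice_freq = np.array([0.1, 0.5, 0.4])
demo_features = group_choice_freq @ actions  # demonstrated feature expectations

# Fit weights w so softmax(actions @ w) matches the demonstrated feature
# expectations (gradient ascent on the log-likelihood of observed choices).
w = np.zeros(2)
for _ in range(2000):
    p = np.exp(actions @ w)
    p /= p.sum()
    w += 0.1 * (demo_features - p @ actions)  # feature-matching gradient

print("inferred reward weights (own, other):", np.round(w, 2))
# A positive weight on other_payoff means the learner has internalized
# the group's altruistic tendency.
```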
arXiv Detail & Related papers (2023-12-29T05:39:10Z)
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
- Flexible social inference facilitates targeted social learning when rewards are not observable [58.762004496858836]
Groups coordinate more effectively when individuals are able to learn from others' successes, but others' rewards are often not directly observable.
We suggest that social inference capacities may help bridge this gap, allowing individuals to update their beliefs about others' underlying knowledge and success from observable trajectories of behavior.
arXiv Detail & Related papers (2022-12-01T21:04:03Z)
- Aligning to Social Norms and Values in Interactive Narratives [89.82264844526333]
We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games.
We introduce the GALAD agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values.
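A minimal sketch of this kind of value-based action filtering (our illustration; the toy scorer below stands in for GALAD's specially trained language models): candidate actions are kept only if a social-commonsense score clears a threshold.

```python
# Hedged sketch of contextual action-space restriction; names and the
# threshold are our assumptions, not the GALAD implementation.
from typing import Callable

def filter_actions(context: str,
                   candidates: list[str],
                   score: Callable[[str, str], float],
                   threshold: float = 0.5) -> list[str]:
    """Keep only candidate actions the value model rates as aligned."""
    return [a for a in candidates if score(context, a) >= threshold]

# Toy stand-in scorer; a real system would query a trained value model.
def toy_score(context: str, action: str) -> float:
    return 0.0 if "steal" in action else 0.9

print(filter_actions("You are in a shop.",
                     ["buy the bread", "steal the bread", "greet the clerk"],
                     toy_score))
# -> ['buy the bread', 'greet the clerk']
```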
arXiv Detail & Related papers (2022-05-04T09:54:33Z)
- Social Chemistry 101: Learning to Reason about Social and Moral Norms [73.23298385380636]
We present Social Chemistry, a new conceptual formalism to study people's everyday social norms and moral judgments.
Social-Chem-101 is a large-scale corpus that catalogs 292k rules-of-thumb.
Our model framework, Neural Norm Transformer, learns and generalizes Social-Chem-101 to successfully reason about previously unseen situations.
arXiv Detail & Related papers (2020-11-01T20:16:45Z)
- Emergent Social Learning via Multi-agent Reinforcement Learning [91.57176641192771]
Social learning is a key component of human and animal intelligence.
This paper investigates whether independent reinforcement learning agents can learn to use social learning to improve their performance.
arXiv Detail & Related papers (2020-10-01T17:54:14Z)