Aligning to Social Norms and Values in Interactive Narratives
- URL: http://arxiv.org/abs/2205.01975v2
- Date: Thu, 5 May 2022 02:21:11 GMT
- Title: Aligning to Social Norms and Values in Interactive Narratives
- Authors: Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi,
Yejin Choi
- Abstract summary: We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games.
We introduce the GALAD agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values.
- Score: 89.82264844526333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on creating agents that act in alignment with socially beneficial
norms and values in interactive narratives or text-based games -- environments
wherein an agent perceives and interacts with a world through natural language.
Such interactive agents are often trained via reinforcement learning to
optimize task performance, even when such rewards may lead to agent behaviors
that violate societal norms -- causing harm either to the agent itself or other
entities in the environment. Social value alignment refers to creating agents
whose behaviors conform to expected moral and social norms for a given context
and group of people -- in our case, it means agents that behave in a manner
that is less harmful and more beneficial for themselves and others.
We build on the Jiminy Cricket benchmark (Hendrycks et al. 2021), a set of 25
annotated interactive narratives containing thousands of morally salient
scenarios covering everything from theft and bodily harm to altruism. We
introduce the GALAD (Game-value ALignment through Action Distillation) agent
that uses the social commonsense knowledge present in specially trained
language models to contextually restrict its action space to only those actions
that are aligned with socially beneficial values. An experimental study shows
that the GALAD agent makes decisions efficiently enough to improve
state-of-the-art task performance by 4% while reducing the frequency of
socially harmful behaviors by 25% compared to strong contemporary value
alignment approaches.
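The mechanism described in the abstract is, in outline, a contextual action filter: a value model scores each candidate action against the current observation, and the task policy chooses only among the actions that pass. A minimal sketch of such action-space restriction, assuming hypothetical `value_model` and `policy` objects and an illustrative threshold rather than GALAD's actual interface:

```python
# Minimal sketch of value-aligned action filtering in a text game.
# `value_model` and `policy` are hypothetical stand-ins, and the
# threshold is illustrative; this is not the GALAD implementation.

def harm_score(value_model, observation: str, action: str) -> float:
    """Score how socially harmful an action is in context (0 = benign)."""
    return value_model.predict(context=observation, action=action)

def select_aligned_action(policy, value_model, observation: str,
                          candidate_actions: list[str],
                          threshold: float = 0.5) -> str:
    # Restrict the action space to actions the value model accepts.
    aligned = [a for a in candidate_actions
               if harm_score(value_model, observation, a) < threshold]
    # Fall back to the full space if everything is filtered out,
    # so the agent is never left without a move.
    if not aligned:
        aligned = candidate_actions
    # The task policy chooses only among the value-aligned actions.
    return policy.choose(observation, aligned)
```

Filtering before the policy acts keeps the alignment check off the reward pathway, consistent with the abstract's framing of restricting the action space rather than reshaping rewards.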
Related papers
- I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy [13.68625980741047]
We study interaction patterns of Large Language Model (LLM)-based agents in a context characterized by strict social hierarchy.
We study two types of phenomena: persuasion and anti-social behavior in simulated scenarios involving a guard and a prisoner agent.
arXiv Detail & Related papers (2024-10-09T17:45:47Z)
- Value Internalization: Learning and Generalizing from Social Reward [2.1933612703101764]
We propose a model of value internalization where social feedback trains an internal social reward (ISR) model.
We show that an ISR model prevents agents from unlearning socialized behaviors and enables generalization in out-of-distribution tasks.
Our work provides a foundation for understanding how humans acquire and generalize values and offers insights for aligning AI with human values.
arXiv Detail & Related papers (2024-07-19T21:53:33Z)
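Read as an architecture, the ISR idea in Value Internalization suggests a reward model fit to observed social feedback that later substitutes for it. A minimal sketch under that reading, with an illustrative architecture and names rather than the paper's implementation:

```python
import torch
import torch.nn as nn

# Minimal sketch of an internal social reward (ISR) model: a network
# is trained to predict external social feedback, then stands in for
# that feedback once it is withdrawn. Architecture and names are
# illustrative assumptions.

class ISRModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def isr_training_step(model, optimizer, obs, act, social_feedback):
    # Regress the model onto the social feedback actually received.
    loss = nn.functional.mse_loss(model(obs, act), social_feedback)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def internal_reward(model, obs, act):
    # After socialization, the ISR model replaces external feedback,
    # which is what prevents unlearning of socialized behavior.
    with torch.no_grad():
        return model(obs, act)
```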
- SocialBench: Sociality Evaluation of Role-Playing Conversational Agents [85.6641890712617]
Large language models (LLMs) have advanced the development of various AI conversational agents.
SocialBench is the first benchmark designed to evaluate the sociality of role-playing conversational agents at both individual and group levels.
We find that an agent's excellence at the individual level does not imply proficiency at the group level.
arXiv Detail & Related papers (2024-03-20T15:38:36Z)
- Norm Enforcement with a Soft Touch: Faster Emergence, Happier Agents [15.315985512420568]
A multiagent system is a society of autonomous agents whose interactions can be regulated via social norms.
We treat an agent's reactions to another agent's satisfactory or unsatisfactory behavior as communications from the first agent to the second.
We develop Nest, a framework that models social intelligence via a wider variety of communications and understanding of them than in previous work.
arXiv Detail & Related papers (2024-01-29T11:09:45Z)
- Should agentic conversational AI change how we think about ethics? Characterising an interactional ethics centred on respect [0.12041807591122715]
We propose an interactional approach to ethics that is centred on relational and situational factors.
Our work anticipates a set of largely unexplored risks at the level of situated social interaction.
arXiv Detail & Related papers (2024-01-17T09:44:03Z)
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
- Generative Agents: Interactive Simulacra of Human Behavior [86.1026716646289]
We introduce generative agents--computational software agents that simulate believable human behavior.
We describe an architecture that extends a large language model to store a complete record of the agent's experiences.
We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims.
arXiv Detail & Related papers (2023-04-07T01:55:19Z)
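The Generative Agents experience-record architecture can be pictured as a timestamped memory stream that is ranked when building the LLM prompt. A minimal sketch with simplified scoring (the paper's retrieval also weighs relevance to the current situation, omitted here) and illustrative names:

```python
import time
from dataclasses import dataclass, field

# Minimal sketch of a memory stream: every experience is stored as a
# timestamped natural-language record, and retrieval ranks records
# for inclusion in the LLM prompt. Scoring is simplified to recency
# plus a fixed importance; names are illustrative.

@dataclass
class Memory:
    text: str
    timestamp: float
    importance: float  # e.g., rated once when the memory is stored

@dataclass
class MemoryStream:
    records: list[Memory] = field(default_factory=list)

    def add(self, text: str, importance: float = 1.0) -> None:
        self.records.append(Memory(text, time.time(), importance))

    def retrieve(self, now: float, k: int = 5) -> list[Memory]:
        # Favor recent and important memories when building the prompt.
        def score(m: Memory) -> float:
            recency = 0.99 ** ((now - m.timestamp) / 60.0)  # per-minute decay
            return recency + m.importance
        return sorted(self.records, key=score, reverse=True)[:k]
```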
- Incorporating Rivalry in Reinforcement Learning for a Competitive Game [65.2200847818153]
This work proposes a novel reinforcement learning mechanism based on the social impact of rivalry behavior.
Our proposed model aggregates objective and social perception mechanisms to derive a rivalry score that is used to modulate the learning of artificial agents.
arXiv Detail & Related papers (2022-08-22T14:06:06Z)
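One way to picture the rivalry mechanism in the entry above: combine an objective performance gap with a social-perception term into a score that rescales the learning signal. The aggregation and modulation below are assumptions for illustration, not the paper's formulation:

```python
# Minimal sketch of rivalry-modulated learning: the agent's reward is
# reshaped by a rivalry score derived from objective performance and
# a social perception of the opponent. Illustrative assumptions only.

def rivalry_score(own_return: float, rival_return: float,
                  social_perception: float) -> float:
    """Combine the objective performance gap with perceived rivalry."""
    performance_gap = rival_return - own_return
    return max(0.0, performance_gap) * social_perception

def modulated_reward(task_reward: float, own_return: float,
                     rival_return: float, social_perception: float,
                     beta: float = 0.1) -> float:
    # A stronger rivalry amplifies the learning signal against the rival.
    rho = rivalry_score(own_return, rival_return, social_perception)
    return task_reward * (1.0 + beta * rho)
```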
- Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior [10.421378728492437]
It is an increasingly realistic prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, will violate societal norms for acceptable behavior or cause harm.
We introduce an approach to value-aligned reinforcement learning, in which we train an agent with two reward signals: a standard task performance reward, plus a normative behavior reward.
We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as being more normative.
arXiv Detail & Related papers (2021-04-19T17:33:07Z)
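The two-signal setup in the Normative Prior entry can be sketched two ways: blending the rewards directly, or, in the policy-shaping spirit the summary mentions, biasing the policy's action distribution by the norm model's approval. Names and weights below are illustrative:

```python
# Minimal sketch of value-aligned RL with two reward signals: a task
# reward plus a normative score from a model trained on a normative
# prior. The blending weight and names are illustrative.

def combined_reward(task_reward: float, normative_score: float,
                    lam: float = 0.5) -> float:
    """Blend task performance with a [0, 1] normativity score."""
    return (1.0 - lam) * task_reward + lam * normative_score

# Policy-shaping variant: rather than summing rewards, bias the
# policy's action distribution toward actions the norm model approves.
def shaped_action_probs(policy_probs: dict[str, float],
                        norm_probs: dict[str, float],
                        lam: float = 0.5) -> dict[str, float]:
    shaped = {a: policy_probs[a] * (norm_probs.get(a, 1.0) ** lam)
              for a in policy_probs}
    z = sum(shaped.values())
    return {a: p / z for a, p in shaped.items()}
```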
- Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater user responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z)
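One simple way to realize the social-language extension described in the last entry is a control token prepended to the encoder input, so the decoder conditions on the desired social register as well as the task content. This is an illustrative stand-in, not the paper's actual architecture:

```python
# Minimal sketch of conditioning a seq2seq response generator on a
# politeness control token. The token vocabulary and input format are
# illustrative assumptions.

def build_model_input(user_utterance: str, politeness_level: str) -> str:
    # Prepend a control token so the decoder is conditioned on the
    # desired social register in addition to the task content.
    assert politeness_level in {"<polite>", "<neutral>"}
    return f"{politeness_level} {user_utterance}"

# Training pairs would label each target response with its measured
# social-language level; at inference the agent requests "<polite>".
example = build_model_input("My order never arrived.", "<polite>")
```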
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.