Training Socially Aligned Language Models on Simulated Social
Interactions
- URL: http://arxiv.org/abs/2305.16960v3
- Date: Sat, 28 Oct 2023 09:02:39 GMT
- Title: Training Socially Aligned Language Models on Simulated Social
Interactions
- Authors: Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M.
Dai, Diyi Yang, Soroush Vosoughi
- Abstract summary: Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
- Score: 99.39979111807388
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social alignment in AI systems aims to ensure that these models behave
according to established societal values. However, unlike humans, who derive
consensus on value judgments through social interaction, current language
models (LMs) are trained to rigidly replicate their training corpus in
isolation, leading to subpar generalization in unfamiliar scenarios and
vulnerability to adversarial attacks. This work presents a novel training
paradigm that permits LMs to learn from simulated social interactions. In
comparison to existing methodologies, our approach is considerably more
scalable and efficient, demonstrating superior performance in alignment
benchmarks and human evaluations. This paradigm shift in the training of LMs
brings us a step closer to developing AI systems that can robustly and
accurately reflect societal norms and values.
Related papers
- Fusing Dynamics Equation: A Social Opinions Prediction Algorithm with LLM-based Agents [6.1923703280119105]
This paper proposes an innovative simulation method for the dynamics of social media user opinions.
The FDE-LLM algorithm incorporates opinion dynamics and epidemic model.
It categorizes users into opinion leaders and followers.
arXiv Detail & Related papers (2024-09-13T11:02:28Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents [73.35393511272791]
We propose an interactive learning method, SOTOPIA-$pi$, improving the social intelligence of language agents.
This method leverages behavior cloning and self-reinforcement training on filtered social interaction data according to large language model (LLM) ratings.
arXiv Detail & Related papers (2024-03-13T17:17:48Z) - Shall We Team Up: Exploring Spontaneous Cooperation of Competing LLM Agents [18.961470450132637]
This paper emphasizes the importance of spontaneous phenomena, wherein agents deeply engage in contexts and make adaptive decisions without explicit directions.
We explored spontaneous cooperation across three competitive scenarios and successfully simulated the gradual emergence of cooperation.
arXiv Detail & Related papers (2024-02-19T18:00:53Z) - Learning Human-like Representations to Enable Learning Human Values [12.628307026004656]
We argue that representational alignment between humans and AI agents facilitates value alignment.
We focus on ethics as one aspect of value alignment and train ML agents using a variety of methods.
arXiv Detail & Related papers (2023-12-21T18:31:33Z) - SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z) - Survey of Social Bias in Vision-Language Models [65.44579542312489]
Survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL.
The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and non-biased AI models.
arXiv Detail & Related papers (2023-09-24T15:34:56Z) - Rethinking Model Evaluation as Narrowing the Socio-Technical Gap [34.08410116336628]
We argue that model evaluation practices must take on a critical task to cope with the challenges and responsibilities brought by this homogenization.
We urge the community to develop evaluation methods based on real-world socio-requirements.
arXiv Detail & Related papers (2023-06-01T00:01:43Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
Large Language Models (LLMs) have made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.