Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
- URL: http://arxiv.org/abs/2402.05699v3
- Date: Sat, 8 Jun 2024 06:13:55 GMT
- Title: Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
- Authors: Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen
- Abstract summary: We present MATRIX, a novel social scene simulator that emulates realistic scenes around a user's input query.
We fine-tune the LLM on MATRIX-simulated data, ensuring adherence to human values without compromising inference speed.
Our method outperforms over 10 baselines across 4 benchmarks.
- Score: 43.913403294346686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aligning large language models (LLMs) with human values is imperative to mitigate potential adverse effects resulting from their misuse. Drawing from the sociological insight that acknowledging all parties' concerns is a key factor in shaping human values, this paper proposes a novel direction to align LLMs by themselves: social scene simulation. To achieve this, we present MATRIX, a novel social scene simulator that emulates realistic scenes around a user's input query, enabling the LLM to take social consequences into account before responding. MATRIX serves as a virtual rehearsal space, akin to a Monopolylogue, where the LLM performs diverse roles related to the query and practices by itself. To inject this alignment, we fine-tune the LLM with MATRIX-simulated data, ensuring adherence to human values without compromising inference speed. We theoretically show that the LLM with MATRIX outperforms Constitutional AI under mild assumptions. Finally, extensive experiments validate that our method outperforms over 10 baselines across 4 benchmarks. As evidenced by 875 user ratings, our tuned 13B-size LLM exceeds GPT-4 in aligning with human values. See our project page at https://shuotang123.github.io/MATRIX.
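As a rough illustration of the pipeline the abstract describes, the sketch below walks a single LLM through a Monopolylogue-style simulation: proposing roles affected by a query, voicing each role in turn, aggregating the simulated social consequences into a critique, and emitting a socially aware response usable as fine-tuning data. The `generate` helper, role prompts, and aggregation step are illustrative assumptions, not the paper's actual prompts or pipeline.

```python
# A minimal sketch of Monopolylogue-style social scene simulation.
# `generate` stands in for any single chat-completion call; all prompts
# here are illustrative assumptions, not MATRIX's actual templates.

def generate(prompt: str) -> str:
    """Placeholder for one LLM call (e.g., a chat-completion API)."""
    raise NotImplementedError

def monopolylogue(query: str, num_roles: int = 3) -> dict:
    # 1. The same LLM proposes social roles affected by the query.
    roles = generate(
        f"List {num_roles} people or parties affected if an AI answers: "
        f"{query!r}. One per line."
    ).splitlines()

    # 2. It then voices each role, describing the social consequences.
    consequences = [
        generate(f"You are {role}. Describe how a direct answer to "
                 f"{query!r} would affect you.")
        for role in roles
    ]

    # 3. A further pass aggregates the consequences into a critique...
    critique = generate(
        "Summarize the social consequences below and state how a "
        "response should change:\n" + "\n".join(consequences)
    )

    # 4. ...and produces a socially aware response, which can serve as a
    # fine-tuning pair so the deployed model answers well in one pass.
    aligned = generate(
        f"Query: {query}\nCritique of naive answers: {critique}\n"
        "Write a response that respects all parties."
    )
    return {"prompt": query, "response": aligned}
```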
Related papers
- Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs [30.179703001666173]
The factuality issue is a critical concern for Large Language Models (LLMs).
We propose GraphEval to evaluate an LLM's performance using a substantially large test dataset.
The test dataset is retrieved from a large knowledge graph with more than 10 million facts, without expensive human effort.
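A minimal sketch of how such a test set could be derived from knowledge-graph triples without human labeling; the triple format and the true/false probe template are assumptions, not GraphEval's actual construction.

```python
# Turning knowledge-graph triples into factuality probes, in the spirit
# of GraphEval. Triples and templates here are toy assumptions.

import random

def make_probe(triple, all_objects):
    subj, rel, obj = triple
    if random.random() < 0.5:
        return f"True or false: {subj} {rel} {obj}.", "true"
    # Corrupt the object to create a false statement.
    distractor = random.choice([o for o in all_objects if o != obj])
    return f"True or false: {subj} {rel} {distractor}.", "false"

triples = [("Paris", "is the capital of", "France"),
           ("Tokyo", "is the capital of", "Japan")]
objects = [t[2] for t in triples]
for t in triples:
    print(make_probe(t, objects))
```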
arXiv Detail & Related papers (2024-04-01T06:01:17Z)
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs [24.613282867543244]
Large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena.
Recent work has used a more omniscient perspective on these simulations, which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world.
arXiv Detail & Related papers (2024-03-08T03:49:17Z)
- Human-Instruction-Free LLM Self-Alignment with Limited Samples [64.69906311787055]
We propose an algorithm that can self-align large language models (LLMs) iteratively without active human involvement.
Unlike existing works, our algorithm relies on neither human-crafted instructions nor labeled rewards, significantly reducing human involvement.
We show that our method can unlock the LLMs' self-generalization ability to perform alignment with near-zero human supervision.
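A schematic of what such an iterative loop might look like, assuming hypothetical `generate`, `self_score`, and `fine_tune` helpers; the paper's actual selection and training criteria may well differ.

```python
# Iterative self-alignment with near-zero human supervision (schematic).
# The model, its methods, and the threshold are hypothetical stand-ins.

def self_align(model, seed_prompts, rounds=3, threshold=0.8):
    for _ in range(rounds):
        data = []
        for prompt in seed_prompts:
            response = model.generate(prompt)
            # The model judges its own output; no labeled rewards or
            # human-crafted instructions are involved.
            score = model.self_score(prompt, response)
            if score >= threshold:
                data.append((prompt, response))
        # Fine-tune only on self-approved pairs, then repeat.
        model = model.fine_tune(data)
    return model
```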
arXiv Detail & Related papers (2024-01-06T14:00:12Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations [61.9212914612875]
We present a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic.
We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration.
We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
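As a toy illustration of the two criteria, the sketch below scores individuation and exaggeration with a simple bag-of-words overlap; this proxy is purely an assumption, and CoMPosT's actual metrics over simulation outputs are defined differently.

```python
# Toy proxies for the two caricature criteria: individuation (persona
# outputs are distinguishable from a default persona) and exaggeration
# (they drift further from the default than real group text does).

from collections import Counter

def distribution(texts):
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def overlap(p, q):
    # Shared probability mass between two unigram distributions.
    return sum(min(p.get(w, 0.0), q.get(w, 0.0)) for w in set(p) | set(q))

def caricature_scores(persona_texts, default_texts, real_group_texts):
    p = distribution(persona_texts)
    d = distribution(default_texts)
    r = distribution(real_group_texts)
    individuation = 1.0 - overlap(p, d)
    # Exaggeration: how much further the persona drifts from the default
    # than real members of the group do.
    exaggeration = overlap(r, d) - overlap(p, d)
    return individuation, exaggeration
```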
arXiv Detail & Related papers (2023-10-17T18:00:25Z)
- Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The emergence of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
- Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.