Related papers: Measuring Social Norms of Large Language Models

Measuring Social Norms of Large Language Models

URL: http://arxiv.org/abs/2404.02491v4
Date: Wed, 22 May 2024 05:23:45 GMT
Title: Measuring Social Norms of Large Language Models
Authors: Ye Yuan, Kexin Tang, Jianhao Shen, Ming Zhang, Chenguang Wang,
Abstract summary: We present a new challenge to examine whether large language models understand social norms. Our dataset features the largest set of social norm skills, consisting of 402 skills and 12,383 questions. We propose a multi-agent framework based on large language models to improve the models' ability to understand social norms.
Score: 13.648679166997693
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a new challenge to examine whether large language models understand social norms. In contrast to existing datasets, our dataset requires a fundamental understanding of social norms to solve. Our dataset features the largest set of social norm skills, consisting of 402 skills and 12,383 questions covering a wide set of social norms ranging from opinions and arguments to culture and laws. We design our dataset according to the K-12 curriculum. This enables the direct comparison of the social understanding of large language models to humans, more specifically, elementary students. While prior work generates nearly random accuracy on our benchmark, recent large language models such as GPT3.5-Turbo and LLaMA2-Chat are able to improve the performance significantly, only slightly below human performance. We then propose a multi-agent framework based on large language models to improve the models' ability to understand social norms. This method further improves large language models to be on par with humans. Given the increasing adoption of large language models in real-world applications, our finding is particularly important and presents a unique direction for future improvements.

Related papers

HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals.<n>We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon.<n>We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z)
The Sociolinguistic Foundations of Language Modeling [34.02231580843069]
We argue that large language models are inherently models of varieties of language. We discuss how this perspective can help address five basic challenges in language modeling.
arXiv Detail & Related papers (2024-07-12T13:12:55Z)
Generative Language Models Exhibit Social Identity Biases [17.307292780517653]
We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases, are present in 56 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences. Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans.
arXiv Detail & Related papers (2023-10-24T13:17:40Z)
Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities. After comparing large language models against state-of-the-start finetuned smaller models, we present a spectrum showing large language models falling behind, are comparable, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z)
When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities [60.5609416496429]
The capability of large language models has been dramatically improved. Such a major leap-forward in general AI capacity will change the pattern of how personalization is conducted. By leveraging large language models as general-purpose interface, personalization systems may compile user requests into plans.
arXiv Detail & Related papers (2023-07-31T02:48:56Z)
A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity. We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model. By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter [31.698196219228024]
We present TwHIN-BERT, a multilingual language model productionized at Twitter. Our model is trained on 7 billion tweets covering over 100 distinct languages. We evaluate our model on various multilingual social recommendation and semantic understanding tasks.
arXiv Detail & Related papers (2022-09-15T19:01:21Z)
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Big-bench consists of 204 tasks, contributed by 450 authors across 132 institutions. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)
Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social Commonsense [6.335245542129822]
We focus on the Social IQA dataset, a task requiring social and emotional commonsense reasoning. We propose several architecture variations and extensions, as well as leveraging external commonsense corpora. Our proposed system achieves competitive results as those top-ranking models on the leaderboard.
arXiv Detail & Related papers (2021-05-12T19:18:02Z)
Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict. This work shows a comparison of a neural model and character language models with varying amounts on target language data. Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.