Related papers: SS-Bench: A Benchmark for Social Story Generation and Evaluation

SS-Bench: A Benchmark for Social Story Generation and Evaluation

URL: http://arxiv.org/abs/2406.15695v1
Date: Sat, 22 Jun 2024 00:14:48 GMT
Title: SS-Bench: A Benchmark for Social Story Generation and Evaluation
Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu,
Abstract summary: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Social Stories are costly in creation and often limited in diversity and timeliness. We propose textbfSS-Bench, a textbfSocial textbfStory textbfBenchmark for generating and evaluating Social Stories.
Score: 53.39177041545863
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timeliness. As Large Language Models (LLMs) become increasingly powerful, there is a growing need for more automated, affordable, and accessible methods to generate Social Stories in real-time with broad coverage. Adapting LLMs to meet the unique and strict constraints of Social Stories is a challenging issue. To this end, we propose \textbf{SS-Bench}, a \textbf{S}ocial \textbf{S}tory \textbf{Bench}mark for generating and evaluating Social Stories. Specifically, we develop a constraint-driven strategy named \textbf{\textsc{StarSow}} to hierarchically prompt LLMs to generate Social Stories and build a benchmark, which has been validated through experiments to fine-tune smaller models for generating qualified Social Stories. Additionally, we introduce \textbf{Quality Assessment Criteria}, employed in human and GPT evaluations, to verify the effectiveness of the generated stories. We hope this work benefits the autism community and catalyzes future research focusing on particular groups.

Related papers

SocialEval: Evaluating Social Intelligence of Large Language Models [70.90981021629021]
Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals.<n>This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation.<n>We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
arXiv Detail & Related papers (2025-06-01T08:36:51Z)
SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models [41.68365456601248]
We introduce SocialMaze, a new benchmark specifically designed to evaluate social reasoning.<n>SocialMaze systematically incorporates three core challenges: deep reasoning, dynamic interaction, and information uncertainty.<n>It provides six diverse tasks across three key settings: social reasoning games, daily-life interactions, and digital community platforms.
arXiv Detail & Related papers (2025-05-29T17:47:36Z)
R^3-VQA: "Read the Room" by Video Social Reasoning [26.694917467429207]
"Read the room" is a significant social reasoning capability in human daily life.<n>We contribute a valuable, high-quality, and comprehensive video dataset named R3-VQA.
arXiv Detail & Related papers (2025-05-07T05:55:45Z)
How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks [6.487500253901779]
Large language models (LLMs) play roles in multi-user, multi-turn social agent tasks.<n>We propose a novel benchmark, How Social Is It (we call it HSII below), designed to assess LLM's social capabilities.<n>HSII comprises four stages: format parsing, target selection, target switching conversation, and stable conversation, which collectively evaluate the communication and task completion capabilities of LLMs.
arXiv Detail & Related papers (2025-04-04T08:59:01Z)
Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long form content. Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning. Our experiments demonstrate on two datasets from different domains, that LLMs fine-tuned with the auxiliary task generate higher quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
Agentic Society: Merging skeleton from real world and texture from Large Language Model [4.740886789811429]
This paper explores a novel framework that leverages census data and large language models to generate virtual populations. We show that our method produces personas with variability essential for simulating diverse human behaviors in social science experiments. But the evaluation result shows that only weak sign of statistical truthfulness can be produced due to limited capability of current LLMs.
arXiv Detail & Related papers (2024-09-02T08:28:19Z)
Open Artificial Knowledge [3.187724439601991]
We introduce the Open Artificial Knowledge (OAK) dataset, a large-scale resource of over 500 million tokens. The OAK dataset aims to foster the development of more capable and aligned language models.
arXiv Detail & Related papers (2024-07-19T15:01:24Z)
The Call for Socially Aware Language Technologies [94.6762219597438]
We argue that many of these issues share a common core: a lack of awareness of the factors, context, and implications of the social environment in which NLP operates.<n>We argue that substantial challenges remain for NLP to develop social awareness and that we are just at the beginning of a new era for the field.
arXiv Detail & Related papers (2024-05-03T18:12:39Z)
Social Skill Training with Large Language Models [65.40795606463101]
People rely on social skills like conflict resolution to communicate effectively and to thrive in both work and personal life. This perspective paper identifies social skill barriers to enter specialized fields. We present a solution that leverages large language models for social skill training via a generic framework.
arXiv Detail & Related papers (2024-04-05T16:29:58Z)
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents [73.35393511272791]
We propose an interactive learning method, SOTOPIA-$pi$, improving the social intelligence of language agents. This method leverages behavior cloning and self-reinforcement training on filtered social interaction data according to large language model (LLM) ratings.
arXiv Detail & Related papers (2024-03-13T17:17:48Z)
SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks [13.152622137022881]
We introduce Socialite-Llama -- an open-source, instruction-tuned Llama. On a suite of 20 social science tasks, Socialite-Llama improves upon the performance of Llama as well as matches or improves upon the performance of a state-of-the-art, multi-task finetuned model.
arXiv Detail & Related papers (2024-02-03T01:33:16Z)
Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models [49.74036826946397]
This study investigates constrained text generation for large language models (LLMs) Our research mainly focuses on mainstream open-source LLMs, categorizing constraints into lexical, structural, and relation-based types. Results illuminate LLMs' capacity and deficiency to incorporate constraints and provide insights for future developments in constrained text generation.
arXiv Detail & Related papers (2023-10-25T03:58:49Z)
MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative. This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm. Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans. In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals. We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
Question Answering as Programming for Solving Time-Sensitive Questions [84.07553016489769]
Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world. Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering. This can be attributed to the LLMs' inability to perform rigorous reasoning based on surface-level text semantics. We propose a novel approach where we reframe the $textbfQ$uestion $textbfA$rogrogering task.
arXiv Detail & Related papers (2023-05-23T16:35:16Z)
Tractable Control for Autoregressive Language Generation [82.79160918147852]
We propose to use tractable probabilistic models (TPMs) to impose lexical constraints in autoregressive text generation models. We show that GeLaTo achieves state-of-the-art performance on challenging benchmarks for constrained text generation. Our work opens up new avenues for controlling large language models and also motivates the development of more expressive TPMs.
arXiv Detail & Related papers (2023-04-15T00:19:44Z)
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents [23.719833581321033]
Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. We argue that aiming towards human-level AI requires a broader set of key social skills. We present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents.
arXiv Detail & Related papers (2021-07-02T10:39:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.