SS-Bench: A Benchmark for Social Story Generation and Evaluation
- URL: http://arxiv.org/abs/2406.15695v1
- Date: Sat, 22 Jun 2024 00:14:48 GMT
- Title: SS-Bench: A Benchmark for Social Story Generation and Evaluation
- Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu,
- Abstract summary: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines.
Social Stories are costly in creation and often limited in diversity and timeliness.
We propose textbfSS-Bench, a textbfSocial textbfStory textbfBenchmark for generating and evaluating Social Stories.
- Score: 53.39177041545863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timeliness. As Large Language Models (LLMs) become increasingly powerful, there is a growing need for more automated, affordable, and accessible methods to generate Social Stories in real-time with broad coverage. Adapting LLMs to meet the unique and strict constraints of Social Stories is a challenging issue. To this end, we propose \textbf{SS-Bench}, a \textbf{S}ocial \textbf{S}tory \textbf{Bench}mark for generating and evaluating Social Stories. Specifically, we develop a constraint-driven strategy named \textbf{\textsc{StarSow}} to hierarchically prompt LLMs to generate Social Stories and build a benchmark, which has been validated through experiments to fine-tune smaller models for generating qualified Social Stories. Additionally, we introduce \textbf{Quality Assessment Criteria}, employed in human and GPT evaluations, to verify the effectiveness of the generated stories. We hope this work benefits the autism community and catalyzes future research focusing on particular groups.
Related papers
- From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition [59.57095498284501]
We propose a novel approach that recognizes textbfContextual textbfSocial textbfRelationships (textbfConSoR) from a social cognitive perspective.
We construct social-aware descriptive language prompts with social relationships for each image.
Impressively, ConSoR outperforms previous methods with a 12.2% gain on the People-in-Social-Context (PISC) dataset and a 9.8% increase on the People-in-Photo-Album (PIPA) benchmark.
arXiv Detail & Related papers (2024-06-12T16:02:28Z) - The Call for Socially Aware Language Technologies [94.6762219597438]
We argue that many of these issues share a common core: a lack of awareness of the factors, context, and implications of the social environment in which NLP operates.
We argue that substantial challenges remain for NLP to develop social awareness and that we are just at the beginning of a new era for the field.
arXiv Detail & Related papers (2024-05-03T18:12:39Z) - SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks [13.152622137022881]
We introduce Socialite-Llama -- an open-source, instruction-tuned Llama.
On a suite of 20 social science tasks, Socialite-Llama improves upon the performance of Llama as well as matches or improves upon the performance of a state-of-the-art, multi-task finetuned model.
arXiv Detail & Related papers (2024-02-03T01:33:16Z) - SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in
Generative Language Models [8.211129045180636]
We introduce a benchmark meant to capture the amplification of social bias, via stigmas, in generative language models.
Our benchmark, SocialStigmaQA, contains roughly 10K prompts, with a variety of prompt styles, carefully constructed to test for both social bias and model robustness.
We find that the proportion of socially biased output ranges from 45% to 59% across a variety of decoding strategies and prompting styles.
arXiv Detail & Related papers (2023-12-12T18:27:44Z) - DeSIQ: Towards an Unbiased, Challenging Benchmark for Social
Intelligence Understanding [60.84356161106069]
We study the soundness of Social-IQ, a dataset of multiple-choice questions on videos of complex social interactions.
Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model.
We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ.
arXiv Detail & Related papers (2023-10-24T06:21:34Z) - Fictional Worlds, Real Connections: Developing Community Storytelling
Social Chatbots through LLMs [11.81497469617973]
We introduce Social Storytellings (SSCs) to transform fictional game characters into "live" social entities within player communities.
Our story engineering process includes three steps: Character and story creation, defining the SC's personality and worldview, and presenting live stories to the community.
Our mixed-method analysis, based on questionnaires and interviews with community members, reveals that storytelling significantly enhances the engagement and believability of SCs in community settings.
arXiv Detail & Related papers (2023-09-20T17:23:05Z) - BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models [73.29106813131818]
bias testing is currently cumbersome since the test sentences are generated from a limited set of manual templates or need expensive crowd-sourcing.
We propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes.
We present an open-source comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing.
arXiv Detail & Related papers (2023-02-14T22:07:57Z) - SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement
Learning Agents [23.719833581321033]
Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI.
We argue that aiming towards human-level AI requires a broader set of key social skills.
We present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents.
arXiv Detail & Related papers (2021-07-02T10:39:18Z) - PHASE: PHysically-grounded Abstract Social Events for Machine Social
Perception [50.551003004553806]
We create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions.
Phase is validated with human experiments demonstrating that humans perceive rich interactions in the social events.
As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE, which outperforms state-of-the-art feed-forward neural networks.
arXiv Detail & Related papers (2021-03-02T18:44:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.