SocialNLI: A Dialogue-Centric Social Inference Dataset
- URL: http://arxiv.org/abs/2510.05458v1
- Date: Mon, 06 Oct 2025 23:42:01 GMT
- Title: SocialNLI: A Dialogue-Centric Social Inference Dataset
- Authors: Akhil Deo, Kate Sanders, Benjamin Van Durme
- Abstract summary: We introduce SocialNLI -- the first social dialogue inference dataset. SocialNLI consists of a collection of dialogue transcripts hand-picked to center complex social nuances. We evaluate reasoning models' theory-of-mind ability through multi-step counterfactual reasoning.
- Score: 49.60157928163403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Making theory-of-mind inferences from human dialogue is a strong indicator of a model's underlying social abilities, which are fundamental for adept AI assistants. However, large language and reasoning models struggle to understand sophisticated social phenomena in transcript data, such as sarcasm and irony. To assess the weaknesses of current models and to identify their solutions, we introduce SocialNLI (SoNLI) -- the first social dialogue inference dataset. SoNLI consists of a collection of dialogue transcripts hand-picked to center complex social nuances like irony and sarcasm, paired with inferences, corresponding likelihood scores, and human-written explanations. We explore social inference analysis as a facet of theory-of-mind, and evaluate LLM and reasoning model theory-of-mind ability through multi-step counterfactual reasoning.
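The abstract describes each SoNLI item as a dialogue transcript paired with a candidate inference, a likelihood score, and a human-written explanation. A minimal sketch of how such an item might be represented is below; the field names, the score range, and the NLI-style thresholding are illustrative assumptions, not the released dataset schema.

```python
from dataclasses import dataclass


@dataclass
class SoNLIExample:
    """Hypothetical representation of one SocialNLI item (field names assumed)."""
    dialogue: list          # transcript turns, e.g. "A: ..." strings
    inference: str          # candidate social inference about a speaker
    likelihood: float       # annotated likelihood score, assumed in [0, 1]
    explanation: str        # human-written rationale for the score

    def label(self, entail_t: float = 0.66, contra_t: float = 0.33) -> str:
        """Map the likelihood score onto NLI-style classes (thresholds assumed)."""
        if self.likelihood >= entail_t:
            return "entailment"
        if self.likelihood <= contra_t:
            return "contradiction"
        return "neutral"


example = SoNLIExample(
    dialogue=["A: Nice job breaking the build.", "B: Happy to help."],
    inference="Speaker A is being sarcastic rather than sincere.",
    likelihood=0.9,
    explanation="Praising a broken build is ironic; A is criticizing B.",
)
print(example.label())  # → entailment
```

Keeping the raw likelihood score alongside a discretized label would let an evaluation report both calibration-style metrics and standard NLI accuracy.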
Related papers
- Social Caption: Evaluating Social Understanding in Multimodal Models [23.008965893705767]
Social understanding abilities are crucial for multimodal large language models (MLLMs) to interpret human social interactions. We introduce Social Caption, a framework grounded in interaction theory to evaluate the social understanding abilities of MLLMs. We analyze factors influencing model performance in social understanding, such as scale, architectural design, and spoken context.
arXiv Detail & Related papers (2026-01-21T01:10:42Z) - Social Simulations with Large Language Model Risk Utopian Illusion [61.358959720048354]
We introduce a systematic framework for analyzing large language models' behavior in social simulation. Our approach simulates multi-agent interactions through chatroom-style conversations and analyzes them across five linguistic dimensions. Our findings reveal that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it.
arXiv Detail & Related papers (2025-10-24T06:08:41Z) - SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning [53.16179295245888]
We introduce SIV-Bench, a novel video benchmark for evaluating the capabilities of Multimodal Large Language Models (MLLMs) across Social Scene Understanding (SSU), Social State Reasoning (SSR), and Social Dynamics Prediction (SDP). SIV-Bench features 2,792 video clips and 8,792 meticulously generated question-answer pairs derived from a human-LLM collaborative pipeline. It also includes a dedicated setup for analyzing the impact of different textual cues: original on-screen text, added dialogue, or no text.
arXiv Detail & Related papers (2025-06-05T05:51:35Z) - SocialEval: Evaluating Social Intelligence of Large Language Models [70.90981021629021]
Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This motivates a dual evaluation paradigm: outcome-oriented goal-achievement evaluation and process-oriented interpersonal-ability evaluation. We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
arXiv Detail & Related papers (2025-06-01T08:36:51Z) - SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models [41.68365456601248]
We introduce SocialMaze, a new benchmark specifically designed to evaluate social reasoning. SocialMaze systematically incorporates three core challenges: deep reasoning, dynamic interaction, and information uncertainty. It provides six diverse tasks across three key settings: social reasoning games, daily-life interactions, and digital community platforms.
arXiv Detail & Related papers (2025-05-29T17:47:36Z) - Social Genome: Grounded Social Reasoning Abilities of Multimodal Models [61.88413918026431]
Social reasoning abilities are crucial for AI systems to interpret and respond to multimodal human communication and interaction within social contexts. We introduce SOCIAL GENOME, the first benchmark for fine-grained, grounded social reasoning abilities of multimodal models.
arXiv Detail & Related papers (2025-02-21T00:05:40Z) - Social Orientation: A New Feature for Dialogue Analysis [15.192659799728181]
We introduce a new dataset of dialogue utterances machine-labeled with social orientation tags.
We show that social orientation tags improve task performance, especially in low-resource settings.
We also demonstrate how social orientation tags help explain the outcomes of social interactions when used in neural models.
arXiv Detail & Related papers (2024-02-26T01:55:45Z) - SocialDial: A Benchmark for Socially-Aware Dialogue Systems [45.3266270265532]
We present the first socially-aware dialogue corpus - SocialDial, based on Chinese social culture.
SocialDial consists of two parts: 1,563 multi-turn dialogues between two human speakers with fine-grained labels, and 4,870 synthetic conversations generated by ChatGPT.
The human corpus covers five categories of social norms, which have 14 sub-categories in total.
arXiv Detail & Related papers (2023-04-24T11:55:22Z) - Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out of the box.
We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
arXiv Detail & Related papers (2022-10-24T14:58:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.