SVBench: Evaluation of Video Generation Models on Social Reasoning
- URL: http://arxiv.org/abs/2512.21507v1
- Date: Thu, 25 Dec 2025 04:44:59 GMT
- Title: SVBench: Evaluation of Video Generation Models on Social Reasoning
- Authors: Wenshuo Peng, Gongxuan Wang, Tianmeng Yang, Chuanhao Li, Xiaojie Xu, Hui He, Kaipeng Zhang,
- Abstract summary: We introduce the first benchmark for social reasoning in video generation. We develop a fully training-free agent-based pipeline that distills the reasoning mechanism of each experiment. We conduct the first large-scale study across seven state-of-the-art video generation systems.
- Score: 35.06131184286366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent text-to-video generation models exhibit remarkable progress in visual realism, motion fidelity, and text-video alignment, yet they remain fundamentally limited in their ability to generate socially coherent behavior. Unlike humans, who effortlessly infer intentions, beliefs, emotions, and social norms from brief visual cues, current models tend to render literal scenes without capturing the underlying causal or psychological logic. To systematically evaluate this gap, we introduce the first benchmark for social reasoning in video generation. Grounded in findings from developmental and social psychology, our benchmark organizes thirty classic social cognition paradigms into seven core dimensions, including mental-state inference, goal-directed action, joint attention, social coordination, prosocial behavior, social norms, and multi-agent strategy. To operationalize these paradigms, we develop a fully training-free agent-based pipeline that (i) distills the reasoning mechanism of each experiment, (ii) synthesizes diverse video-ready scenarios, (iii) enforces conceptual neutrality and difficulty control through cue-based critique, and (iv) evaluates generated videos using a high-capacity VLM judge across five interpretable dimensions of social reasoning. Using this framework, we conduct the first large-scale study across seven state-of-the-art video generation systems. Our results reveal substantial performance gaps: while modern models excel in surface-level plausibility, they systematically fail in intention recognition, belief reasoning, joint attention, and prosocial inference.
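The evaluation step (iv) produces per-video judge scores, and those scores then have to be rolled up into per-model results for each of the seven core dimensions. The Python sketch below shows one way such an aggregation could work. It is not the authors' code. The record format (the hypothetical JudgeRecord class), the criterion names (c1 to c5 stand in for the five judge dimensions, which the abstract does not name), the [0, 1] score scale, and the choice to average criteria first and videos second are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The seven core dimensions named in the SVBench abstract.
CORE_DIMENSIONS = (
    "mental-state inference",
    "goal-directed action",
    "joint attention",
    "social coordination",
    "prosocial behavior",
    "social norms",
    "multi-agent strategy",
)


@dataclass
class JudgeRecord:
    """Hypothetical format for one VLM-judge verdict on one generated video."""
    model: str      # video generation system that produced the clip
    dimension: str  # one of CORE_DIMENSIONS
    criteria: dict  # hypothetical: criterion name -> score, assumed in [0, 1]


def aggregate(records):
    """Hypothetical roll-up: mean judge score per (model, core dimension).

    Each record is first reduced to the mean of its criterion scores; those
    per-video means are then averaged within each (model, dimension) cell.
    """
    cells = defaultdict(list)
    for r in records:
        if r.dimension not in CORE_DIMENSIONS:
            raise ValueError(f"unknown dimension: {r.dimension}")
        cells[(r.model, r.dimension)].append(mean(r.criteria.values()))
    return {key: mean(scores) for key, scores in cells.items()}
```

Averaging within each video before averaging across videos gives every generated clip equal weight. Keeping the five criterion scores separate instead would preserve the per-criterion breakdown that the abstract's "interpretable dimensions" framing suggests.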
Related papers
- SocialFusion: Addressing Social Degradation in Pre-trained Vision-Language Models [34.928133808112925]
We show that pre-trained vision-language models (VLMs) struggle to unify and learn multiple social perception tasks simultaneously. We propose SocialFusion, a unified framework that learns a minimal connection between a frozen visual encoder and a language model. Our findings suggest that current VLM pre-training strategies may be detrimental to acquiring general social competence.
Date: 2025-11-30T23:54:54Z
- Social Simulations with Large Language Model Risk Utopian Illusion [61.358959720048354]
We introduce a systematic framework for analyzing large language models' behavior in social simulation. Our approach simulates multi-agent interactions through chatroom-style conversations and analyzes them across five linguistic dimensions. Our findings reveal that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it.
Date: 2025-10-24T06:08:41Z
- SocialNLI: A Dialogue-Centric Social Inference Dataset [49.60157928163403]
We introduce SocialNLI, the first social dialogue inference dataset. SocialNLI consists of a collection of dialogue transcripts hand-picked to center on complex social nuances. We evaluate reasoning models' theory-of-mind ability through multi-step counterfactual reasoning.
Date: 2025-10-06T23:42:01Z
- Social World Models [35.672466808871945]
We introduce a novel structured social world representation formalism (S3AP). S3AP represents social interactions as structured tuples, such as state, observation, agent actions, and mental states. We show S3AP can help LLMs better understand social narratives across 5 social reasoning tasks. We then induce social world models from these structured representations, demonstrating their ability to predict future social dynamics.
Date: 2025-08-30T16:52:58Z
- Simulating Generative Social Agents via Theory-Informed Workflow Design [11.992123170134185]
We propose a theory-informed framework that provides a systematic design process for social agents. Our framework is grounded in principles from Social Cognition Theory and introduces three key modules: motivation, action planning, and learning. Experiments demonstrate that our theory-driven agents reproduce realistic human behavior patterns under complex conditions.
Date: 2025-08-12T08:14:48Z
- SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models [41.68365456601248]
We introduce SocialMaze, a new benchmark specifically designed to evaluate social reasoning. SocialMaze systematically incorporates three core challenges: deep reasoning, dynamic interaction, and information uncertainty. It provides six diverse tasks across three key settings: social reasoning games, daily-life interactions, and digital community platforms.
Date: 2025-05-29T17:47:36Z
- Social Genome: Grounded Social Reasoning Abilities of Multimodal Models [61.88413918026431]
Social reasoning abilities are crucial for AI systems to interpret and respond to multimodal human communication and interaction within social contexts. We introduce SOCIAL GENOME, the first benchmark for fine-grained, grounded social reasoning abilities of multimodal models.
Date: 2025-02-21T00:05:40Z
- SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM).
SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation.
Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
Date: 2024-02-20T14:02:45Z
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
Date: 2023-05-26T14:17:36Z
- Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out of the box.
We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
Date: 2022-10-24T14:58:58Z