R^3-VQA: "Read the Room" by Video Social Reasoning
- URL: http://arxiv.org/abs/2505.04147v1
- Date: Wed, 07 May 2025 05:55:45 GMT
- Title: R^3-VQA: "Read the Room" by Video Social Reasoning
- Authors: Lixing Niu, Jiapeng Li, Xingping Yu, Shu Wang, Ruining Feng, Bo Wu, Ping Wei, Yisen Wang, Lifeng Fan
- Abstract summary: "Read the room" is a significant social reasoning capability in human daily life. We contribute a valuable, high-quality, and comprehensive video dataset named R^3-VQA.
- Score: 26.694917467429207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Read the room" is a significant social reasoning capability in human daily life. Humans can infer others' mental states from subtle social cues. Previous social reasoning tasks and datasets lack complexity (e.g., simple scenes, basic interactions, incomplete mental state variables, single-step reasoning, etc.) and fall far short of the challenges present in real-life social interactions. In this paper, we contribute a valuable, high-quality, and comprehensive video dataset named R^3-VQA with precise and fine-grained annotations of social events and mental states (i.e., belief, intent, desire, and emotion) as well as corresponding social causal chains in complex social scenarios. Moreover, we include human-annotated and model-generated QAs. Our task R^3-VQA includes three aspects: Social Event Understanding, Mental State Estimation, and Social Causal Reasoning. As a benchmark, we comprehensively evaluate the social reasoning capabilities and consistencies of current state-of-the-art large vision-language models (LVLMs). Comprehensive experiments show that (i) LVLMs are still far from human-level consistent social reasoning in complex social scenarios; (ii) Theory of Mind (ToM) prompting can help LVLMs perform better on social reasoning tasks. We provide some of our dataset and codes in supplementary material and will release our full dataset and codes upon acceptance.
Related papers
- SocialSim: Towards Socialized Simulation of Emotional Support Conversation [68.5026443005566]
We introduce SocialSim, a novel framework that simulates emotional support conversations. SocialSim integrates key aspects of social interactions: social disclosure and social awareness. We construct SSConv, a large-scale synthetic ESC corpus whose quality can even surpass crowdsourced ESC data.
arXiv Detail & Related papers (2025-06-20T05:24:40Z) - SocialEval: Evaluating Social Intelligence of Large Language Models [70.90981021629021]
Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation. We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
arXiv Detail & Related papers (2025-06-01T08:36:51Z) - SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models [41.68365456601248]
We introduce SocialMaze, a new benchmark specifically designed to evaluate social reasoning. SocialMaze systematically incorporates three core challenges: deep reasoning, dynamic interaction, and information uncertainty. It provides six diverse tasks across three key settings: social reasoning games, daily-life interactions, and digital community platforms.
arXiv Detail & Related papers (2025-05-29T17:47:36Z) - EgoSocialArena: Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective [22.30892836263764]
Social intelligence is built upon three pillars: cognitive intelligence, situational intelligence, and behavioral intelligence. EgoSocialArena aims to systematically evaluate the social intelligence of large language models from a first-person perspective.
arXiv Detail & Related papers (2024-10-08T16:55:51Z) - From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition [59.57095498284501]
We propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective.
We construct social-aware descriptive language prompts with social relationships for each image.
Impressively, ConSoR outperforms previous methods with a 12.2% gain on the People-in-Social-Context (PISC) dataset and a 9.8% increase on the People-in-Photo-Album (PIPA) benchmark.
arXiv Detail & Related papers (2024-06-12T16:02:28Z) - DeSIQ: Towards an Unbiased, Challenging Benchmark for Social
Intelligence Understanding [60.84356161106069]
We study the soundness of Social-IQ, a dataset of multiple-choice questions on videos of complex social interactions.
Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model.
We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ.
arXiv Detail & Related papers (2023-10-24T06:21:34Z) - SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z) - Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out of the box.
We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
arXiv Detail & Related papers (2022-10-24T14:58:58Z) - SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement
Learning Agents [23.719833581321033]
Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI.
We argue that aiming towards human-level AI requires a broader set of key social skills.
We present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents.
arXiv Detail & Related papers (2021-07-02T10:39:18Z) - PHASE: PHysically-grounded Abstract Social Events for Machine Social
Perception [50.551003004553806]
We create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions.
PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events.
As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE, which outperforms state-of-the-art feed-forward neural networks.
arXiv Detail & Related papers (2021-03-02T18:44:57Z) - Characterizing Datasets for Social Visual Question Answering, and the
New TinySocial Dataset [0.7313653675718068]
Social intelligence includes the ability to watch videos and answer questions about social and theory-of-mind-related content.
Social visual question answering (social VQA) is emerging as a valuable methodology for studying social reasoning in both humans and AI agents.
We discuss methods for creating and characterizing social VQA datasets.
arXiv Detail & Related papers (2020-10-08T03:20:23Z)