S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test
- URL: http://arxiv.org/abs/2512.19992v1
- Date: Tue, 23 Dec 2025 02:36:56 GMT
- Title: S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test
- Authors: Zhe Sun, Xueyuan Yang, Yujie Lu, Zhenliang Zhang
- Abstract summary: We introduce the Spatially Situated Social Intelligence Test (S$^{3}$IT), a benchmark specifically designed to evaluate embodied social intelligence. It is centered on a novel and challenging seat-ordering task, requiring an agent to arrange seating in a 3D environment for a group of large language model-driven NPCs. Our framework generates a vast and diverse scenario space with controllable difficulty, compelling the agent to acquire preferences through active dialogue, perceive the environment via autonomous exploration, and perform multi-objective optimization within a complex constraint network.
- Score: 26.79990069295221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of embodied agents into human environments demands embodied social intelligence: reasoning over both social norms and physical constraints. However, existing evaluations fail to address this integration, as they are limited to either disembodied social reasoning (e.g., in text) or socially-agnostic physical tasks. Both approaches fail to assess an agent's ability to integrate and trade off physical and social constraints within a realistic, embodied context. To address this challenge, we introduce the Spatially Situated Social Intelligence Test (S$^{3}$IT), a benchmark specifically designed to evaluate embodied social intelligence. It is centered on a novel and challenging seat-ordering task, requiring an agent to arrange seating in a 3D environment for a group of large language model-driven (LLM-driven) NPCs with diverse identities, preferences, and intricate interpersonal relationships. Our procedurally extensible framework generates a vast and diverse scenario space with controllable difficulty, compelling the agent to acquire preferences through active dialogue, perceive the environment via autonomous exploration, and perform multi-objective optimization within a complex constraint network. We evaluate state-of-the-art LLMs on S$^{3}$IT and find that they still struggle with this problem, showing a clear gap from the human baseline. The results imply that LLMs have deficiencies in spatial intelligence, yet simultaneously demonstrate near human-level competence in resolving conflicts that carry explicit textual cues.
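The seat-ordering task described above can be viewed as a small multi-objective constraint problem: each NPC contributes individual seat preferences and pairwise relationship constraints, and the agent must find an arrangement that trades these off. A minimal sketch of that formulation (a hypothetical toy, not the S$^{3}$IT implementation; all names and weights here are illustrative assumptions):

```python
from itertools import permutations

# Hypothetical toy instance: three guests, three seats.
guests = ["A", "B", "C"]

# Individual (social) preference: does the guest want a window seat?
prefers_window = {"A": True, "B": False, "C": True}

# Pairwise affinity: positive values reward seating the pair adjacently,
# negative values penalize it (modeling interpersonal relationships).
affinity = {("A", "B"): 1, ("B", "C"): -1, ("A", "C"): 0}

# Spatial attribute of each seat, in left-to-right order.
seats = [{"window": True}, {"window": False}, {"window": True}]

def score(arrangement):
    """Sum preference satisfaction plus adjacency affinity for one seating order."""
    s = 0
    # Reward each guest placed in a seat matching their window preference.
    for guest, seat in zip(arrangement, seats):
        if prefers_window[guest] == seat["window"]:
            s += 1
    # Add affinity for every adjacent pair in the row.
    for i in range(len(arrangement) - 1):
        pair = tuple(sorted((arrangement[i], arrangement[i + 1])))
        s += affinity.get(pair, 0)
    return s

# Brute-force search over all seatings; real scenarios would need dialogue
# to elicit the preferences and exploration to discover the seat attributes.
best = max(permutations(guests), key=score)
```

At this scale exhaustive search suffices; the benchmark's difficulty comes from the preferences and spatial layout being hidden behind dialogue and exploration rather than given upfront, and from the constraint network growing with the number of NPCs.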
Related papers
- Interpretable Debiasing of Vision-Language Models for Social Fairness [55.85977929985967]
We introduce an interpretable, model-agnostic bias mitigation framework, DeBiasLens, that localizes social attribute neurons in Vision-Language Models. We train SAEs on facial image or caption datasets without corresponding social attribute labels to uncover neurons highly responsive to specific demographics. Our research lays the groundwork for future auditing tools, prioritizing social fairness in emerging real-world AI systems.
arXiv Detail & Related papers (2026-02-27T13:37:11Z) - From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection [57.74400052368147]
This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then leverages a fine-tuned vision-language model (VLM) to evaluate these paths. Experiments in four social navigation contexts demonstrate that our method achieves the best overall performance, with the lowest personal space violation duration, the minimal pedestrian-facing time, and no social zone intrusions.
arXiv Detail & Related papers (2026-02-09T18:46:12Z) - SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers [30.204481123869186]
SocialVeil is a social learning environment that can simulate social interaction under cognitive-difference-induced communication barriers. We show that barriers consistently impair performance, with mutual understanding reduced by over 45% on average, and confusion elevated by nearly 50%.
arXiv Detail & Related papers (2026-02-04T23:04:25Z) - How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective [103.44502230776352]
We present a systematic investigation of Visual Spatial Reasoning (VSR) in Vision-Language Models (VLMs). We categorize spatial intelligence into three levels of capability, i.e., basic perception, spatial understanding, and spatial planning, and curate SIBench, a spatial intelligence benchmark encompassing nearly 20 open-source datasets across 23 task settings.
arXiv Detail & Related papers (2025-09-23T12:00:14Z) - The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process [13.959658276224266]
AI alignment with human values and preferences remains a core challenge. As agents engage with one another, they must coordinate to accomplish both individual and collective goals. Social structure can deter or shatter group and individual values. We call on the AI community to treat human, preferential, and objective alignment as interdependent concepts.
arXiv Detail & Related papers (2025-06-01T16:39:43Z) - SocialEval: Evaluating Social Intelligence of Large Language Models [70.90981021629021]
Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation. We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
arXiv Detail & Related papers (2025-06-01T08:36:51Z) - Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge [47.74313897705183]
CHAIC is an inclusive embodied social intelligence challenge designed to test social perception and cooperation in embodied agents. In CHAIC, the goal is for an embodied agent equipped with egocentric observations to assist a human who may be operating under physical constraints.
arXiv Detail & Related papers (2024-11-04T04:41:12Z) - AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios [38.878966229688054]
We introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios.
Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts.
We analyze goals using ERG theory and conduct comprehensive experiments.
Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning.
arXiv Detail & Related papers (2024-10-25T07:04:16Z) - SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.