DeSIQ: Towards an Unbiased, Challenging Benchmark for Social
Intelligence Understanding
- URL: http://arxiv.org/abs/2310.18359v1
- Date: Tue, 24 Oct 2023 06:21:34 GMT
- Title: DeSIQ: Towards an Unbiased, Challenging Benchmark for Social
Intelligence Understanding
- Authors: Xiao-Yu Guo and Yuan-Fang Li and Gholamreza Haffari
- Abstract summary: We study the soundness of Social-IQ, a dataset of multiple-choice questions on videos of complex social interactions.
Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model.
We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ.
- Score: 60.84356161106069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social intelligence is essential for understanding and reasoning about human
expressions, intents and interactions. One representative benchmark for its
study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice
questions on videos of complex social interactions. We define a comprehensive
methodology to study the soundness of Social-IQ, as the soundness of such
benchmark datasets is crucial to the investigation of the underlying research
problem. Our analysis reveals that Social-IQ contains substantial biases, which
can be exploited by a moderately strong language model to learn spurious
correlations to achieve perfect performance without being given the context or
even the question. We introduce DeSIQ, a new challenging dataset, constructed
by applying simple perturbations to Social-IQ. Our empirical analysis shows
DeSIQ significantly reduces the biases in the original Social-IQ dataset.
Furthermore, we examine and shed light on the effect of model size, model
style, learning settings, commonsense knowledge, and multi-modality on the new
benchmark performance. Our new dataset, observations and findings open up
important research questions for the study of social intelligence.
Related papers
- Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems [62.252355444948904]
This paper presents the findings of a literature review on the integration of edge intelligence (EI) and socialized learning (SL)
SL is a learning paradigm predicated on social principles and behaviors, aimed at amplifying the collaborative capacity and collective intelligence of agents.
We elaborate on three integrated components: socialized architecture, socialized training, and socialized inference, analyzing their strengths and weaknesses.
arXiv Detail & Related papers (2024-04-20T11:07:29Z) - Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions [67.60397632819202]
Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal.
We identify a set of underlying technical challenges and open questions for researchers across computing communities to advance Social-AI.
arXiv Detail & Related papers (2024-04-17T02:57:42Z) - Academically intelligent LLMs are not necessarily socially intelligent [56.452845189961444]
The academic intelligence of large language models (LLMs) has made remarkable progress in recent times, but their social intelligence performance remains unclear.
Inspired by established human social intelligence frameworks, we have developed a standardized social intelligence test based on real-world social scenarios.
arXiv Detail & Related papers (2024-03-11T10:35:53Z) - Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future [59.78608958395464]
We build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets.
Our infrastructure allows us to analyze existing dataset efforts, and also evaluate language models' performance in different social intelligence aspects.
We show there is a need for multifaceted datasets, increased diversity in language and culture, more long-tailed social situations, and more interactive data in future social intelligence data efforts.
arXiv Detail & Related papers (2024-02-28T00:22:42Z) - SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z) - Semantic Categorization of Social Knowledge for Commonsense Question
Answering [13.343786884695323]
We propose to categorize the semantics needed for commonsense question answering tasks using the SocialIQA as an example.
Unlike previous work, we observe our models with semantic categorizations of social knowledge can achieve comparable performance with a relatively simple model.
arXiv Detail & Related papers (2021-09-11T02:56:14Z) - Characterizing Datasets for Social Visual Question Answering, and the
New TinySocial Dataset [0.7313653675718068]
Social intelligence includes the ability to watch videos and answer questions about social and theory-of-mind-related content.
Social visual question answering (social VQA) is emerging as a valuable methodology for studying social reasoning in both humans and AI agents.
We discuss methods for creating and characterizing social VQA datasets.
arXiv Detail & Related papers (2020-10-08T03:20:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.