Learning Triadic Belief Dynamics in Nonverbal Communication from Videos
- URL: http://arxiv.org/abs/2104.02841v1
- Date: Wed, 7 Apr 2021 00:52:04 GMT
- Title: Learning Triadic Belief Dynamics in Nonverbal Communication from Videos
- Authors: Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin
Zhu
- Abstract summary: Nonverbal communication can convey rich social information among agents.
In this paper, we incorporate different nonverbal communication cues to represent, model, learn, and infer agents' mental states.
- Score: 81.42305032083716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans possess a unique social cognition capability; nonverbal communication
can convey rich social information among agents. In contrast, such crucial
social characteristics are mostly missing in the existing scene understanding
literature. In this paper, we incorporate different nonverbal communication
cues (e.g., gaze, human poses, and gestures) to represent, model, learn, and
infer agents' mental states from purely visual inputs. Crucially, such a mental
representation takes each agent's belief into account: it captures the true
world state and also infers each agent's beliefs, which may differ from that
true state. By aggregating the different beliefs
and true world states, our model essentially forms "five minds" during the
interactions between two agents. This "five minds" model differs from prior
works that infer beliefs in an infinite recursion; instead, agents' beliefs
converge into a "common mind". Based on this representation, we further devise
a hierarchical energy-based model that jointly tracks and predicts all five
minds. From this new perspective, a social event is interpreted as a series of
nonverbal communication cues and belief dynamics, which goes beyond the classic
keyframe video summary. In the experiments, we demonstrate that such a social
account yields better summaries of videos with rich social interactions than
state-of-the-art keyframe-based video summarization methods.
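To make the "five minds" idea concrete, below is a minimal, hypothetical Python sketch of such a belief structure together with a toy consistency energy. The names (FiveMinds, belief_consistency_energy), the dictionary world-state encoding, and the agreement-based common-mind update are illustrative assumptions, not the paper's actual representation or learned energy-based model.

```python
# Hypothetical sketch (not the paper's code): a dictionary world state and the
# five-minds belief structure for a two-agent interaction.
from dataclasses import dataclass, field
from typing import Any, Dict

WorldState = Dict[str, Any]  # e.g. {"cup": "on_table", "gaze_target": "cup"}


@dataclass
class FiveMinds:
    world: WorldState                                  # true world state
    mind_1: WorldState = field(default_factory=dict)   # agent 1's belief
    mind_2: WorldState = field(default_factory=dict)   # agent 2's belief
    mind_12: WorldState = field(default_factory=dict)  # agent 1's belief about agent 2's mind
    mind_21: WorldState = field(default_factory=dict)  # agent 2's belief about agent 1's mind
    common: WorldState = field(default_factory=dict)   # converged "common mind"

    def update_common_mind(self) -> None:
        # Instead of recursing infinitely (I think that you think that I think ...),
        # facts on which all four belief views agree converge into the common mind.
        self.common = {
            k: v
            for k, v in self.mind_1.items()
            if self.mind_2.get(k) == v
            and self.mind_12.get(k) == v
            and self.mind_21.get(k) == v
        }


def belief_consistency_energy(m: FiveMinds) -> float:
    """Toy energy: counts disagreements between each mind and the true world
    state; a stand-in for the paper's hierarchical energy-based model, whose
    potentials are learned from gaze, pose, and gesture cues in video."""
    views = [m.mind_1, m.mind_2, m.mind_12, m.mind_21, m.common]
    return float(sum(v.get(k) != w for k, w in m.world.items() for v in views))
```

In this toy reading, a false-belief moment in a video shows up as, say, mind_2 disagreeing with world on some key, raising the energy until a communicative cue (e.g., a pointing gesture) updates mind_2 and the common mind.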
Related papers
- Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities [0.0]
We study the emergence of agency from scratch by using Large Language Model (LLM)-based agents.
By analyzing this multi-agent simulation, we report valuable new insights into how social norms, cooperation, and personality traits can emerge spontaneously.
arXiv Detail & Related papers (2024-11-05T16:49:33Z)
- MuMA-ToM: Multi-modal Multi-Agent Theory of Mind [10.079620078670589]
We introduce MuMA-ToM, a Multi-modal Multi-Agent Theory of Mind benchmark.
We provide video and text descriptions of people's multi-modal behavior in realistic household environments.
We then ask questions about people's goals, beliefs, and beliefs about others' goals.
arXiv Detail & Related papers (2024-08-22T17:41:45Z)
- Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind [0.35154948148425685]
Theory of Mind (ToM) is the ability to attribute beliefs, intentions, or mental states to others.
We show a developmental synergy between learning to predict low-level mental states and attributing high-level ones.
We propose that our computational approach can inform the understanding of human social cognitive development.
arXiv Detail & Related papers (2024-07-25T13:15:25Z)
- Nonverbal Interaction Detection [83.40522919429337]
This work addresses a new challenge of understanding human nonverbal interaction in social contexts.
We contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and corresponding social groups.
Second, we establish a new task, NVI-DET, for nonverbal interaction detection, formalized as identifying triplets of the form <individual, group, interaction> from images.
Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs.
arXiv Detail & Related papers (2024-07-11T02:14:06Z)
- From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition [59.57095498284501]
We propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective.
We construct social-aware descriptive language prompts with social relationships for each image.
Impressively, ConSoR outperforms previous methods with a 12.2% gain on the People-in-Social-Context (PISC) dataset and a 9.8% increase on the People-in-Photo-Album (PIPA) benchmark.
arXiv Detail & Related papers (2024-06-12T16:02:28Z)
- SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM)
SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation.
Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
arXiv Detail & Related papers (2024-02-20T14:02:45Z)
- Digital Life Project: Autonomous 3D Characters with Social Intelligence [86.2845109451914]
Digital Life Project is a framework utilizing language as the universal medium to build autonomous 3D characters.
Our framework comprises two primary components: SocioMind and MoMat-MoGen.
arXiv Detail & Related papers (2023-12-07T18:58:59Z)
- Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out of the box.
We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
arXiv Detail & Related papers (2022-10-24T14:58:58Z)
- How social feedback processing in the brain shapes collective opinion processes in the era of social media [0.0]
Drawing on recent neuroscientific insights into the processing of social feedback, we develop a theoretical model that allows us to address these questions.
Even strong majorities can be forced into silence if a minority acts as a cohesive whole.
The proposed framework of social feedback theory highlights the need for sociological theorising to understand the societal-level implications of findings in social and cognitive neuroscience.
arXiv Detail & Related papers (2020-03-18T11:06:34Z)