Human Semantic Representations of Social Interactions from Moving Shapes
- URL: http://arxiv.org/abs/2509.20673v1
- Date: Thu, 25 Sep 2025 02:15:50 GMT
- Title: Human Semantic Representations of Social Interactions from Moving Shapes
- Authors: Yiling Yun, Hongjing Lu
- Abstract summary: We examine what semantic representations humans employ to complement visual features. We measured the representational geometry of 27 social interactions through human similarity judgments. Among the semantic models, verb-based embeddings extracted from descriptions accounted for human similarity judgments best.
- Score: 0.3007949058551534
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Humans are social creatures who readily recognize various social interactions from simple displays of moving shapes. While previous research has often focused on visual features, we examine what semantic representations humans employ to complement those features. In Study 1, we directly asked human participants to label the animations based on their impression of the moving shapes, and found that responses were distributed across a variety of labels. In Study 2, we measured the representational geometry of 27 social interactions through human similarity judgments and compared it with model predictions based on visual features, labels, and semantic embeddings from animation descriptions. Semantic models provided information complementary to visual features in explaining human judgments, and among them, verb-based embeddings extracted from descriptions accounted for human similarity judgments best. These results suggest that social perception in simple displays reflects the semantic structure of social interactions, bridging visual and abstract representations.
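The abstract's model comparison can be pictured with a short representational-similarity sketch in Python. Everything below is a placeholder: the random verb_embeddings and human_similarity arrays merely stand in for the paper's actual verb embeddings and averaged similarity judgments for the 27 interactions.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 27 interactions x 300-d verb embeddings, and a
# symmetric human similarity matrix averaged over participants.
verb_embeddings = rng.standard_normal((27, 300))
human_similarity = rng.uniform(0, 1, (27, 27))
human_similarity = (human_similarity + human_similarity.T) / 2

# Model-predicted dissimilarities: cosine distance between embeddings,
# returned in condensed (upper-triangle, row-major) order.
model_dissim = pdist(verb_embeddings, metric="cosine")

# Human dissimilarities: matching upper triangle of (1 - similarity).
iu = np.triu_indices(27, k=1)
human_dissim = 1.0 - human_similarity[iu]

# Representational alignment: rank-correlate the two geometries.
rho, p = spearmanr(model_dissim, human_dissim)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
```

Rank correlation is the common choice in this kind of analysis because it assumes only a monotonic, not linear, relation between the two geometries.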
Related papers
- Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection [82.70752567211251]
We propose a part-aware bottom-up group reasoning framework for fine-grained social interaction detection. The proposed method infers social groups and their interactions using body part features and their interpersonal relations. Our model first detects individuals and enhances their features using part-aware cues, and then infers group configuration by associating individuals via similarity-based reasoning.
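The abstract does not spell out the association rule, so the following is only a generic illustration of similarity-based grouping (cosine-similarity links followed by connected components), not the paper's actual inference procedure; the feature shapes and threshold are assumptions.

```python
import numpy as np

def group_by_similarity(features: np.ndarray, threshold: float = 0.8) -> list[set[int]]:
    """Link pairs of individuals whose feature cosine similarity exceeds
    the threshold, then return the connected components as groups."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    n = len(features)
    parent = list(range(n))  # union-find over thresholded similarity links

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

# Toy usage: the first two individuals have near-parallel features.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(group_by_similarity(feats))  # [{0, 1}, {2}]
```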
arXiv Detail & Related papers (2025-11-05T17:33:03Z) - Social 3D Scene Graphs: Modeling Human Actions and Relations for Interactive Service Robots [5.8503433899583905]
We introduce Social 3D Scene Graphs, an augmented 3D Scene Graph representation that captures humans, their attributes, activities and relationships in the environment, both local and remote. Our representation improves human activity prediction and reasoning about human-environment relations, paving the way toward socially intelligent robots.
arXiv Detail & Related papers (2025-09-29T16:00:40Z) - Human-like conceptual representations emerge from language prediction [72.5875173689788]
Large language models (LLMs) trained exclusively through next-token prediction over language data exhibit remarkably human-like behaviors. Are these models developing concepts akin to humans, and if so, how are such concepts represented and organized? Our results demonstrate that LLMs can flexibly derive concepts from linguistic descriptions in relation to contextual cues about other concepts. These findings establish that structured, human-like conceptual representations can naturally emerge from language prediction without real-world grounding.
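As a toy illustration of deriving concept representations from linguistic descriptions, the sketch below embeds three short descriptions with an off-the-shelf sentence encoder and rank-correlates the resulting similarities with made-up human ratings. The model name, descriptions, and ratings are all placeholders, not the paper's materials, and the sentence-transformers package is assumed to be installed.

```python
from sentence_transformers import SentenceTransformer
from scipy.stats import spearmanr

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

descriptions = {
    "chase":  "One agent pursues another agent that is trying to get away.",
    "follow": "One agent moves along the same path behind another agent.",
    "fight":  "Two agents repeatedly move against each other with force.",
}
emb = model.encode(list(descriptions.values()), normalize_embeddings=True)

# With normalized embeddings, cosine similarity is a plain dot product.
idx = {name: i for i, name in enumerate(descriptions)}
pairs = [("chase", "follow"), ("chase", "fight"), ("follow", "fight")]
model_sims = [float(emb[idx[a]] @ emb[idx[b]]) for a, b in pairs]

human_sims = [0.8, 0.4, 0.3]  # invented ratings for the same pairs
rho, _ = spearmanr(model_sims, human_sims)
print(model_sims, f"rho vs. human = {rho:.2f}")
```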
arXiv Detail & Related papers (2025-01-21T23:54:17Z) - Probing the contents of semantic representations from text, behavior, and brain data using the psychNorms metabase [0.0]
We evaluate the similarities and differences between semantic representations derived from text, behavior, and brain data. We establish behavior as an important complement to text for capturing human semantic representations.
arXiv Detail & Related papers (2024-12-06T10:44:20Z) - When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
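One common recipe for injecting such a bias is a hinge loss over human two-alternative similarity judgments; the sketch below is a generic PyTorch rendering of that idea under assumed tensor shapes, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def perceptual_alignment_loss(emb_ref, emb_a, emb_b, human_choice, margin=0.05):
    """Push a backbone to agree with human triplet judgments.

    emb_ref, emb_a, emb_b: (batch, dim) embeddings of reference and candidates.
    human_choice: (batch,) with 0 if annotators judged A closer to the
    reference, 1 if B. Penalizes triplets where the model disagrees."""
    d_a = 1 - F.cosine_similarity(emb_ref, emb_a)  # distance ref-A
    d_b = 1 - F.cosine_similarity(emb_ref, emb_b)  # distance ref-B
    # Positive gap means the model ranks the pair opposite to the human.
    gap = torch.where(human_choice == 0, d_a - d_b, d_b - d_a)
    return F.relu(gap + margin).mean()

# Toy usage with random embeddings and judgments.
ref, a, b = (torch.randn(8, 128) for _ in range(3))
choice = torch.randint(0, 2, (8,))
print(perceptual_alignment_loss(ref, a, b, choice))
```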
arXiv Detail & Related papers (2024-10-14T17:59:58Z) - How Do You Perceive My Face? Recognizing Facial Expressions in Multi-Modal Context by Modeling Mental Representations [5.895694050664867]
We introduce a novel approach for facial expression classification that goes beyond simple classification tasks.
Our model accurately classifies a perceived face and synthesizes the corresponding mental representation perceived by a human when observing a face in context.
We evaluate synthesized expressions in a human study, showing that our model effectively produces approximations of human mental representations.
arXiv Detail & Related papers (2024-09-04T09:32:40Z) - Identifying and interpreting non-aligned human conceptual representations using language modeling [0.0]
We show that congenital blindness induces conceptual reorganization in both a-modal and sensory-related verbal domains.
We find that blind individuals more strongly associate social and cognitive meanings to verbs related to motion.
For some verbs, representations of blind and sighted individuals are highly similar.
arXiv Detail & Related papers (2024-03-10T13:02:27Z) - A natural language processing-based approach: mapping human perception by understanding deep semantic features in street view images [2.5880672192855414]
We propose a new framework based on a pre-trained natural language model to understand the relationship between human perception and a scene.
Our results show that human perception scoring by deep semantic features performed better than previous studies using machine learning methods with shallow features.
arXiv Detail & Related papers (2023-11-29T05:00:43Z) - Towards Explaining Subjective Ground of Individuals on Social Media [28.491401997248527]
This research proposes a neural model that learns subjective grounds of individuals and accounts for their judgments on other people's situations posted on social media.
Using simple attention modules as well as taking one's previous activities into consideration, we empirically show that our model provides human-readable explanations of an individual's subjective preference in judging social situations.
arXiv Detail & Related papers (2022-11-18T00:29:05Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are, respectively, speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z) - Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation [56.25878966006678]
We propose an approach called PRactical Inference in Social rElation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z) - Learning Triadic Belief Dynamics in Nonverbal Communication from Videos [81.42305032083716]
Nonverbal communication can convey rich social information among agents.
In this paper, we incorporate different nonverbal communication cues to represent, model, learn, and infer agents' mental states.
arXiv Detail & Related papers (2021-04-07T00:52:04Z) - What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn representations that improve on visual-only ones.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.