From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
- URL: http://arxiv.org/abs/2406.08358v1
- Date: Wed, 12 Jun 2024 16:02:28 GMT
- Title: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
- Authors: Shiwei Wu, Chao Zhang, Joya Chen, Tong Xu, Likang Wu, Yao Hu, Enhong Chen
- Abstract summary: We propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective.
We construct social-aware descriptive language prompts with social relationships for each image.
Impressively, ConSoR outperforms previous methods with a 12.2% gain on the People-in-Social-Context (PISC) dataset and a 9.8% increase on the People-in-Photo-Album (PIPA) benchmark.
- Score: 59.57095498284501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific relationships, e.g., wedding rings, roses, hugs, or holding hands. This brings unique challenges to recognizing social relationships, requiring understanding and capturing the essence of these contexts from visual appearances. However, current methods of social relationship understanding rely on the basic classification paradigm of detected persons and objects, which fails to understand the comprehensive context and often overlooks decisive social factors, especially subtle visual cues. To highlight the social-aware context and intricate details, we propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective. Specifically, to incorporate social-aware semantics, we build a lightweight adapter upon the frozen CLIP to learn social concepts via our novel multi-modal side adapter tuning mechanism. Further, we construct social-aware descriptive language prompts (e.g., scene, activity, objects, emotions) with social relationships for each image, and then compel ConSoR to concentrate more intensively on the decisive visual social factors via visual-linguistic contrasting. Impressively, ConSoR outperforms previous methods with a 12.2% gain on the People-in-Social-Context (PISC) dataset and a 9.8% increase on the People-in-Photo-Album (PIPA) benchmark. Furthermore, we observe that ConSoR excels at finding critical visual evidence to reveal social relationships.
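The abstract describes two mechanisms: a lightweight trainable adapter beside a frozen CLIP backbone, and visual-linguistic contrasting against per-image descriptive prompts. The following is a minimal sketch of that general recipe; all module names, dimensions, the prompt template, and the symmetric InfoNCE loss are assumptions for illustration, not the paper's actual implementation.

```python
# Sketch of (1) a trainable side adapter over frozen features and
# (2) visual-linguistic contrasting against descriptive prompts.
# Hypothetical stand-in, not ConSoR's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideAdapter(nn.Module):
    """Small trainable bottleneck applied on top of frozen backbone features."""
    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: frozen features pass through unchanged while the
        # adapter learns a social-aware correction on top of them.
        return x + self.up(F.relu(self.down(x)))

def contrastive_loss(img_emb, txt_emb, temperature: float = 0.07):
    """Symmetric InfoNCE: the i-th image should match the i-th prompt."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Stand-ins for frozen CLIP encoders; real CLIP would replace these linears.
frozen_image_encoder = nn.Linear(768, 512).requires_grad_(False)
frozen_text_encoder = nn.Linear(768, 512).requires_grad_(False)
adapter = SideAdapter(dim=512)  # the only trainable component

# A hypothetical social-aware prompt built from scene/activity/object/emotion
# cues, as the abstract suggests; text encoding is stubbed with random features.
prompt = "a wedding scene, two people holding hands, roses, joyful; couple"
img_feat, txt_feat = torch.randn(4, 768), torch.randn(4, 768)  # dummy batch

img_emb = adapter(frozen_image_encoder(img_feat))
txt_emb = frozen_text_encoder(txt_feat)
loss = contrastive_loss(img_emb, txt_emb)
loss.backward()  # gradients reach only the adapter; CLIP stays frozen
```

The point of this shape of design: because the backbone is frozen, gradients flow only into the small adapter, so social concepts can be learned cheaply while CLIP's pretrained vision-language alignment is preserved.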
Related papers
- Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation [56.25878966006678]
We propose an approach for PRactical Inference in Social rElation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z)
- SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues [112.94918467195637]
Inferring social relations from dialogues is vital for building emotionally intelligent robots.
We model the social network as an And-or Graph, named SocAoG, to enforce the consistency of relations among a group.
Empirical results on DialogRE and MovieGraph show that our model infers social relations more accurately than the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-02T08:07:42Z)
- SRA-LSTM: Social Relationship Attention LSTM for Human Trajectory Prediction [3.1703939581903864]
Social relationship among pedestrians is a key factor influencing pedestrian walking patterns.
We propose a Social Relationship Attention LSTM (SRA-LSTM) model to predict future trajectories.
arXiv Detail & Related papers (2021-03-31T12:56:39Z)
- PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception [50.551003004553806]
We create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions.
PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events.
As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE, which outperforms state-of-the-art feed-forward neural networks.
arXiv Detail & Related papers (2021-03-02T18:44:57Z)
- Towards a Better Understanding of Social Acceptability [28.727916976371265]
Social contexts play an important role in understanding acceptance and use of technology.
Current approaches to describing contextual influence do not capture it appropriately.
We suggest an approach based on Social Practice Theory.
arXiv Detail & Related papers (2021-03-02T10:59:17Z)
- "where is this relationship going?": Understanding Relationship Trajectories in Narrative Text [28.14874371042193]
Given a narrative describing a social interaction, systems make inferences about the underlying relationship trajectory.
We construct a new dataset, Social Narrative Tree, which consists of 1250 stories documenting a variety of daily social interactions.
arXiv Detail & Related papers (2020-10-29T02:07:05Z)
- Graph-Based Social Relation Reasoning [101.9402771161935]
We propose a graph relational reasoning network (GR2N) for social relation recognition.
Our method adopts the paradigm of jointly inferring the relations by constructing a social relation graph (see the toy sketch after this entry).
Experimental results show that our method generates a reasonable and consistent social relation graph.
arXiv Detail & Related papers (2020-07-15T03:01:11Z)
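The joint-inference idea in the GR2N summary above can be illustrated with a toy head that predicts every pairwise relation in one pass over a shared graph context. This is a rough stand-in, not GR2N's actual architecture; the layer choices and the six-class output (loosely matching PISC's six relation categories) are assumptions.

```python
# Toy joint relation inference over a person graph; hypothetical, not GR2N.
import torch
import torch.nn as nn

class RelationGraphHead(nn.Module):
    """Scores a relation for every ordered pair of people after one round of
    mean-aggregated message passing, so all predictions share group context."""
    def __init__(self, dim: int = 128, num_relations: int = 6):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.edge = nn.Linear(2 * dim, num_relations)

    def forward(self, persons: torch.Tensor) -> torch.Tensor:
        # persons: (N, dim) features for the N detected people in one image.
        context = torch.relu(self.msg(persons)).mean(dim=0, keepdim=True)
        nodes = persons + context  # every node sees the same group context
        n = nodes.size(0)
        pairs = torch.cat([nodes.unsqueeze(1).expand(n, n, -1),
                           nodes.unsqueeze(0).expand(n, n, -1)], dim=-1)
        return self.edge(pairs)  # (N, N, num_relations) joint relation logits

head = RelationGraphHead()
logits = head(torch.randn(3, 128))  # 3 people -> 3x3 grid of relation logits
```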
- Recursive Social Behavior Graph for Trajectory Prediction [49.005219590582676]
We formulate social representations supervised by group-based annotations into a social behavior graph, called the Recursive Social Behavior Graph.
Guided by the Recursive Social Behavior Graph, we surpass the state-of-the-art method on the ETH and UCY datasets by 11.1% in ADE and 10.8% in FDE (both metrics are sketched below).
arXiv Detail & Related papers (2020-04-22T06:01:48Z)
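For reference, ADE and FDE in the entry above are the standard trajectory-prediction metrics: the average and final-timestep L2 displacement between predicted and ground-truth paths. A minimal sketch follows; the function name and tensor layout are my own.

```python
# ADE/FDE: standard trajectory-prediction error metrics.
import torch

def ade_fde(pred: torch.Tensor, gt: torch.Tensor):
    """pred, gt: (T, 2) predicted and ground-truth positions over T timesteps."""
    dist = torch.linalg.norm(pred - gt, dim=-1)  # per-timestep L2 error
    return dist.mean().item(), dist[-1].item()   # (ADE, FDE)

# Dummy 12-step trajectories; real use compares a model's output to ground truth.
ade, fde = ade_fde(torch.randn(12, 2), torch.randn(12, 2))
```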