Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection
- URL: http://arxiv.org/abs/2511.03666v1
- Date: Wed, 05 Nov 2025 17:33:03 GMT
- Title: Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection
- Authors: Dongkeun Kim, Minsu Cho, Suha Kwak
- Abstract summary: We propose a part-aware bottom-up group reasoning framework for fine-grained social interaction detection. The proposed method infers social groups and their interactions using body part features and their interpersonal relations. Our model first detects individuals and enhances their features using part-aware cues, and then infers group configuration by associating individuals via similarity-based reasoning.
- Score: 82.70752567211251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social interactions often emerge from subtle, fine-grained cues such as facial expressions, gaze, and gestures. However, existing methods for social interaction detection overlook such nuanced cues and primarily rely on holistic representations of individuals. Moreover, they directly detect social groups without explicitly modeling the underlying interactions between individuals. These drawbacks limit their ability to capture localized social signals and introduce ambiguity when group configurations should be inferred from social interactions grounded in nuanced cues. In this work, we propose a part-aware bottom-up group reasoning framework for fine-grained social interaction detection. The proposed method infers social groups and their interactions using body part features and their interpersonal relations. Our model first detects individuals and enhances their features using part-aware cues, and then infers group configuration by associating individuals via similarity-based reasoning, which considers not only spatial relations but also subtle social cues that signal interactions, leading to more accurate group inference. Experiments on the NVI dataset demonstrate that our method outperforms prior methods, achieving the new state of the art.
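The abstract describes a bottom-up pipeline: detect individuals, enhance their features with part-aware cues, then associate individuals into groups via similarity-based reasoning. The grouping step can be illustrated with a minimal sketch, assuming each detected individual is already represented by a part-enhanced feature vector; the cosine similarity, the threshold rule, and all function names are illustrative, not the authors' actual model.

```python
# Minimal sketch of similarity-based bottom-up group reasoning.
# Individuals whose pairwise feature similarity exceeds a threshold are
# linked, and connected components of the link graph form social groups.
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def infer_groups(features, threshold=0.8):
    """Associate individuals via union-find over thresholded pairwise
    similarities; each resulting component is one inferred group."""
    n = len(features)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two clearly similar individuals plus one outlier.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(infer_groups(feats))  # → [[0, 1], [2]]
```

In the paper's actual model the association reportedly weighs both spatial relations and subtle social cues; here a single similarity score stands in for that combined signal.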
Related papers
- Learning Human-Object Interaction as Groups [52.28258599873394]
GroupHOI is a framework that propagates contextual information in terms of geometric proximity and semantic similarity. It exhibits leading performance on the more challenging Nonverbal Interaction Detection task.
arXiv Detail & Related papers (2025-10-21T07:25:10Z) - MINGLE: VLMs for Semantically Complex Region Detection in Urban Scenes [49.89767522399176]
Group-level social interactions in public spaces are crucial for urban planning. We introduce a social group region detection task, which requires inferring and spatially grounding visual regions defined by interpersonal relations. We propose MINGLE, a modular three-stage pipeline that integrates human detection and depth estimation, VLM-based reasoning to classify pairwise social affiliation, and a lightweight spatial aggregation algorithm to localize socially connected groups. We present a new dataset of 100K urban street-view images annotated with bounding boxes and labels for both individuals and socially interacting groups.
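The final aggregation stage of a pipeline like this, turning pairwise affiliation decisions into spatially grounded group regions, can be sketched as follows. This is a hypothetical illustration, not MINGLE's published algorithm: the detector and VLM classifier are assumed to have already produced person boxes and affiliated pairs, and only the aggregation step is shown.

```python
# Sketch of the spatial aggregation step: merge individuals linked by
# pairwise affiliations into connected components, then return one
# enclosing bounding box (x1, y1, x2, y2) per social group.
from collections import deque

def aggregate_groups(boxes, affiliated_pairs):
    n = len(boxes)
    adj = [[] for _ in range(n)]
    for i, j in affiliated_pairs:
        adj[i].append(j)
        adj[j].append(i)

    seen = [False] * n
    regions = []
    for start in range(n):
        if seen[start]:
            continue
        # BFS over the affiliation graph collects one social group.
        comp, queue = [], deque([start])
        seen[start] = True
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        # Union of member boxes grounds the group as a single region.
        xs1, ys1, xs2, ys2 = zip(*(boxes[k] for k in comp))
        regions.append((min(xs1), min(ys1), max(xs2), max(ys2)))
    return regions

boxes = [(0, 0, 10, 20), (12, 0, 22, 20), (50, 0, 60, 20)]
print(aggregate_groups(boxes, [(0, 1)]))
# → [(0, 0, 22, 20), (50, 0, 60, 20)]
```

Isolated individuals fall out naturally as singleton components, so the same routine handles scenes with no social groups at all.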
arXiv Detail & Related papers (2025-09-16T19:31:40Z) - Diffusion-Based Imitation Learning for Social Pose Generation [0.0]
Intelligent agents, such as robots and virtual agents, must understand the dynamics of complex social interactions to interact with humans. We explore how a single modality, the pose behavior of multiple individuals in a social interaction, can be used to generate nonverbal social cues for the facilitator of that interaction.
arXiv Detail & Related papers (2025-01-18T20:31:55Z) - Social Processes: Probabilistic Meta-learning for Adaptive Multiparty Interaction Forecasting [3.9134031118910264]
We introduce Social Process (SP) models, which predict a distribution over future multimodal cues jointly for all group members. We also analyze the generalization capabilities of SP models in both their outputs and latent spaces through the use of realistic synthetic datasets.
arXiv Detail & Related papers (2025-01-03T17:34:53Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue is speaking activity, the most common computational method is support vector machines, the typical interaction environment is meetings of 3-4 persons, and the prevailing sensing approach relies on microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z) - Self-supervised Social Relation Representation for Human Group Detection [18.38523753680367]
We propose a new two-stage multi-head framework for human group detection.
In the first stage, we propose a human behavior simulator head to learn the social relation feature embedding.
In the second stage, based on the social relation embedding, we develop a self-attention-inspired network for human group detection.
arXiv Detail & Related papers (2022-03-08T04:26:07Z) - SSAGCN: Social Soft Attention Graph Convolution Network for Pedestrian Trajectory Prediction [59.064925464991056]
We propose a new prediction model named Social Soft Attention Graph Convolution Network (SSAGCN).
SSAGCN aims to simultaneously handle social interactions among pedestrians and scene interactions between pedestrians and environments.
Experiments on publicly available datasets demonstrate the effectiveness of SSAGCN, which achieves state-of-the-art results.
arXiv Detail & Related papers (2021-12-05T01:49:18Z) - PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception [50.551003004553806]
We create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions.
PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events.
As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE, which outperforms state-of-the-art feed-forward neural networks.
arXiv Detail & Related papers (2021-03-02T18:44:57Z) - Connections between Relational Event Model and Inverse Reinforcement Learning for Characterizing Group Interaction Sequences [0.18275108630751835]
We explore previously unidentified connections between the relational event model (REM) and inverse reinforcement learning (IRL).
REM is the conventional approach to this problem, whereas the application of IRL remains largely unexplored.
We demonstrate the special utility of IRL in characterizing group social interactions with an empirical experiment.
arXiv Detail & Related papers (2020-10-19T19:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.