Multimodal Quantitative Measures for Multiparty Behaviour Evaluation
- URL: http://arxiv.org/abs/2508.10916v1
- Date: Fri, 01 Aug 2025 13:46:12 GMT
- Title: Multimodal Quantitative Measures for Multiparty Behaviour Evaluation
- Authors: Ojas Shirekar, Wim Pouw, Chenxu Hao, Vrushank Phadnis, Thabo Beeler, Chirag Raman
- Abstract summary: We introduce a unified, intervention-driven framework for objective assessment of multiparty social behaviour in skeletal motion data. We validate metric sensitivity through three theory-driven perturbations. Mixed-effects analyses reveal predictable, joint-independent shifts.
- Score: 6.709251546882382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Digital humans are emerging as autonomous agents in multiparty interactions, yet existing evaluation metrics largely ignore contextual coordination dynamics. We introduce a unified, intervention-driven framework for objective assessment of multiparty social behaviour in skeletal motion data, spanning three complementary dimensions: (1) synchrony via Cross-Recurrence Quantification Analysis, (2) temporal alignment via Multiscale Empirical Mode Decomposition-based Beat Consistency, and (3) structural similarity via Soft Dynamic Time Warping. We validate metric sensitivity through three theory-driven perturbations -- gesture kinematic dampening, uniform speech-gesture delays, and prosodic pitch-variance reduction -- applied to $\approx 145$ 30-second thin slices of group interactions from the DnD dataset. Mixed-effects analyses reveal predictable, joint-independent shifts: dampening increases CRQA determinism and reduces beat consistency, delays weaken cross-participant coupling, and pitch flattening elevates F0 Soft-DTW costs. A complementary perception study ($N=27$) compares judgments of full-video and skeleton-only renderings to quantify representation effects. Our three measures deliver orthogonal insights into spatial structure, timing alignment, and behavioural variability, thereby forming a robust toolkit for evaluating and refining socially intelligent agents. Code available on \href{https://github.com/tapri-lab/gig-interveners}{GitHub}.
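The structural-similarity measure above is Soft Dynamic Time Warping, which replaces the hard minimum in the classic DTW recursion with a smooth soft-min so the alignment cost is differentiable. The following is a minimal NumPy sketch of that recursion for 1-D signals such as F0 contours; the function name `soft_dtw` and the `gamma` parameter are illustrative choices, not the authors' implementation.

```python
import numpy as np

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW cost between two 1-D sequences (e.g. F0 contours).

    Replaces the hard min of classic DTW with the soft-min
    softmin_gamma(a) = -gamma * log(sum_k exp(-a_k / gamma)),
    which recovers ordinary DTW as gamma -> 0.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Pairwise squared-distance cost matrix.
    D = (x[:, None] - y[None, :]) ** 2
    # R[i, j] = soft alignment cost of prefixes x[:i], y[:j].
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = np.array([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]])
            # Numerically stable soft-min via the log-sum-exp trick.
            cmin = c.min()
            softmin = cmin - gamma * np.log(np.exp(-(c - cmin) / gamma).sum())
            R[i, j] = D[i - 1, j - 1] + softmin
    return R[n, m]
```

Under this sketch, flattening the pitch variance of one signal while leaving the reference intact raises the pairwise costs on the diagonal, which is the mechanism by which the paper's pitch-flattening perturbation would elevate F0 Soft-DTW costs.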
Related papers
- Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems Over Extended Interactions [0.0]
Agent drift is the progressive degradation of agent behavior, decision quality, and inter-agent coherence over extended interaction sequences. We introduce the Agent Stability Index (ASI), a novel composite metric for quantifying drift across twelve dimensions. We show how unchecked agent drift can lead to substantial reductions in task completion accuracy and increased human intervention requirements.
arXiv Detail & Related papers (2026-01-07T18:37:26Z) - Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z) - Robustifying 3D Perception via Least-Squares Graphs for Multi-Agent Object Tracking [43.11267507022928]
This paper proposes a novel mitigation framework for 3D LiDAR scenes against adversarial noise. We employ the least-squares graph tool to reduce the induced positional error of each detection's centroid. An extensive evaluation study on the real-world V2V4Real dataset demonstrates that the proposed method significantly outperforms both single and multi-agent tracking frameworks.
arXiv Detail & Related papers (2025-07-07T08:41:08Z) - Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation [10.122882293302787]
Temporal segmentation of human actions is critical for intelligent robots in collaborative settings. We propose a Multi-Modal Graph Convolutional Network (MMGCN) that integrates low-frame-rate (e.g., 1 fps) visual data with high-frame-rate (e.g., 30 fps) motion data. Our approach outperforms state-of-the-art methods, especially in action segmentation accuracy.
arXiv Detail & Related papers (2025-07-01T13:55:57Z) - AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification [8.516886985159928]
Dyadic social relationships are shaped by shared spatial and temporal experiences. Current computational methods for modeling these relationships face three major challenges. We propose AsyReC, a multimodal graph-based framework for asymmetric dyadic relationship classification.
arXiv Detail & Related papers (2025-04-07T12:52:23Z) - Neural Interaction Energy for Multi-Agent Trajectory Prediction [55.098754835213995]
We introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE).
MATE assesses the interactive motion of agents by employing neural interaction energy.
To bolster temporal stability, we introduce two constraints: inter-agent interaction constraint and intra-agent motion constraint.
arXiv Detail & Related papers (2024-04-25T12:47:47Z) - Spatio-temporal MLP-graph network for 3D human pose estimation [8.267311047244881]
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation.
We introduce a new weighted Jacobi feature rule obtained through graph filtering with implicit propagation fairing.
We also employ adjacency modulation with the aim of learning meaningful correlations beyond those defined between body joints.
arXiv Detail & Related papers (2023-08-29T14:00:55Z) - Intensity Profile Projection: A Framework for Continuous-Time
Representation Learning for Dynamic Networks [50.2033914945157]
We present a representation learning framework, Intensity Profile Projection, for continuous-time dynamic network data.
The framework consists of three stages, including estimating pairwise intensity functions and learning a projection that minimises a notion of intensity reconstruction error.
Moreover, we develop estimation theory providing tight control on the error of any estimated trajectory, indicating that the representations could even be used in quite noise-sensitive follow-on analyses.
arXiv Detail & Related papers (2023-06-09T15:38:25Z) - Averaging Spatio-temporal Signals using Optimal Transport and Soft
Alignments [110.79706180350507]
We show that our proposed loss can be used to define spatio-temporal barycenters as Fréchet means.
Experiments on handwritten letters and brain imaging data confirm our theoretical findings.
arXiv Detail & Related papers (2022-03-11T09:46:22Z) - Towards Robust and Adaptive Motion Forecasting: A Causal Representation
Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z) - Consistency Guided Scene Flow Estimation [159.24395181068218]
CGSF is a self-supervised framework for the joint reconstruction of 3D scene structure and motion from stereo video.
We show that the proposed model can reliably predict disparity and scene flow in challenging imagery.
It achieves better generalization than the state-of-the-art, and adapts quickly and robustly to unseen domains.
arXiv Detail & Related papers (2020-06-19T17:28:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.