Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions
- URL: http://arxiv.org/abs/2412.16698v1
- Date: Sat, 21 Dec 2024 16:54:28 GMT
- Title: Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions
- Authors: Tongfei Bian, Yiming Ma, Mathieu Chollet, Victor Sanchez, Tanaya Guha,
- Abstract summary: SocialEgoNet is a graph-based framework that exploits task dependencies through a hierarchical learning approach.
SocialEgoNet uses body skeletons (keypoints from face, hands and body) extracted from only 1 second of video input for high inference speed.
For evaluation, we augment an existing egocentric human-agent interaction with new class labels and bounding box annotations.
- Score: 25.464036307823974
- License:
- Abstract: For efficient human-agent interaction, an agent should proactively recognize their target user and prepare for upcoming interactions. We formulate this challenging problem as the novel task of jointly forecasting a person's intent to interact with the agent, their attitude towards the agent and the action they will perform, from the agent's (egocentric) perspective. So we propose \emph{SocialEgoNet} - a graph-based spatiotemporal framework that exploits task dependencies through a hierarchical multitask learning approach. SocialEgoNet uses whole-body skeletons (keypoints from face, hands and body) extracted from only 1 second of video input for high inference speed. For evaluation, we augment an existing egocentric human-agent interaction dataset with new class labels and bounding box annotations. Extensive experiments on this augmented dataset, named JPL-Social, demonstrate \emph{real-time} inference and superior performance (average accuracy across all tasks: 83.15\%) of our model outperforming several competitive baselines. The additional annotations and code will be available upon acceptance.
Related papers
- Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input [54.81155589931697]
We propose a new task, Collaborative Instance Navigation (CoIN), with dynamic agent-human interaction during navigation.
To address CoIN, we propose a novel method, Agent-user Interaction with UncerTainty Awareness (AIUTA)
AIUTA achieves competitive performance in instance navigation against state-of-the-art methods, demonstrating great flexibility in handling user inputs.
arXiv Detail & Related papers (2024-12-02T08:16:38Z) - PMM-Net: Single-stage Multi-agent Trajectory Prediction with Patching-based Embedding and Explicit Modal Modulation [6.793915571620126]
In this letter, we aim to explore a distinct formulation for multi-agent trajectory prediction framework.
We propose a patching-based temporal feature extraction module and a graph-based social feature extraction module.
We present a novel method based on explicit modality modulation to integrate temporal and social features, thereby constructing an efficient single-stage inference pipeline.
arXiv Detail & Related papers (2024-10-25T13:16:27Z) - Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation [14.734158936250918]
Short-Term object-interaction Anticipation is fundamental for wearable assistants or human robot interaction to understand user goals.
We improve the performance of STA predictions with two contributions.
First, we propose STAformer, a novel attention-based architecture integrating frame guided temporal pooling, dual image-video attention, and multiscale feature fusion.
Second, we predict interaction hotspots from the observation of hands and object trajectories, increasing confidence in STA predictions localized around the hotspot.
arXiv Detail & Related papers (2024-06-03T10:57:18Z) - Best Practices for 2-Body Pose Forecasting [58.661899246497896]
We review the progress in human pose forecasting and provide an in-depth assessment of the single-person practices that perform best.
Other single-person practices do not transfer to 2-body, so the proposed best ones do not include hierarchical body modeling or attention-based interaction encoding.
Our proposed 2-body pose forecasting best practices yield a performance improvement of 21.9% over the state-of-the-art on the most recent ExPI dataset.
arXiv Detail & Related papers (2023-04-12T10:46:23Z) - ConsNet: Learning Consistency Graph for Zero-Shot Human-Object
Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of human, action, object> in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z) - A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z) - Mining Implicit Entity Preference from User-Item Interaction Data for
Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover implicit entity preference of users, we design an elaborate collaborative learning algorithms based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.