Joint Engagement Classification using Video Augmentation Techniques for
Multi-person Human-robot Interaction
- URL: http://arxiv.org/abs/2212.14128v1
- Date: Wed, 28 Dec 2022 23:52:55 GMT
- Title: Joint Engagement Classification using Video Augmentation Techniques for
Multi-person Human-robot Interaction
- Authors: Yubin Kim, Huili Chen, Sharifa Alghowinem, Cynthia Breazeal, and Hae
Won Park
- Abstract summary: We present a novel framework for identifying a parent-child dyad's joint engagement.
Using a dataset of parent-child dyads reading storybooks together with a social robot at home, we first train RGB frame- and skeleton-based joint engagement recognition models.
Second, we demonstrate experimental results on the use of trained models in the robot-parent-child interaction context.
- Score: 22.73774398716566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Affect understanding capability is essential for social robots to
autonomously interact with a group of users in an intuitive and reciprocal way.
However, the challenge of multi-person affect understanding comes from not only
the accurate perception of each user's affective state (e.g., engagement) but
also the recognition of the affect interplay between the members (e.g., joint
engagement), which manifests as complex but subtle nonverbal exchanges between
them. Here we present a novel hybrid framework for identifying a parent-child
dyad's joint engagement by combining a deep learning framework with various
video augmentation techniques. Using a dataset of parent-child dyads reading
storybooks together with a social robot at home, we first train RGB frame- and
skeleton-based joint engagement recognition models on datasets augmented with
four video augmentation techniques (General Aug, DeepFake, CutOut, and Mixed)
to improve joint engagement classification performance. Second, we demonstrate
experimental results on the use of trained models in the robot-parent-child
interaction context. Third, we introduce a behavior-based metric for evaluating
the learned representation of the models to investigate the model
interpretability when recognizing joint engagement. This work serves as the
first step toward fully unlocking the potential of end-to-end video
understanding models pre-trained on large public datasets and combined with
data augmentation and visualization techniques for affect recognition in
multi-person human-robot interaction in the wild.
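To make the augmentation step concrete, below is a minimal Python sketch of CutOut-style occlusion applied to a video clip before it reaches an RGB frame-based engagement classifier. This is not the authors' implementation: the (T, H, W, C) clip layout, patch size, and per-frame masking policy are illustrative assumptions, and the other augmentation conditions named in the abstract (General Aug, DeepFake, Mixed) would plug into the same point of the pipeline.

import numpy as np

def cutout_video(clip, patch=40, rng=None):
    """Zero out one randomly placed square patch in every frame of a clip.

    clip: uint8 array shaped (T, H, W, C); a copy with the same shape is returned.
    """
    rng = rng or np.random.default_rng()
    out = clip.copy()
    num_frames, height, width, _ = out.shape
    for t in range(num_frames):
        # Pick an independent occlusion location per frame (illustrative policy).
        y = int(rng.integers(0, max(1, height - patch)))
        x = int(rng.integers(0, max(1, width - patch)))
        out[t, y:y + patch, x:x + patch, :] = 0  # blank the region so the model must rely on other cues
    return out

# Usage: augment a dummy 16-frame RGB clip standing in for a real parent-child
# interaction clip, then feed the result to the frame-based classifier.
clip = np.random.randint(0, 256, size=(16, 224, 224, 3), dtype=np.uint8)
augmented = cutout_video(clip, patch=56)
print(augmented.shape)  # (16, 224, 224, 3)

Occluding frame regions is the usual motivation for CutOut-style augmentation: it discourages the classifier from over-relying on any single visual cue.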
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
Experimental results show that MPI improves over the previous state of the art by 10% to 64% on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Interpretable Data Fusion for Distributed Learning: A Representative Approach via Gradient Matching [19.193379036629167]
This paper introduces a representative-based approach for distributed learning that transforms multiple raw data points into a virtual representation.
It achieves this by condensing extensive datasets into digestible formats, thus fostering intuitive human-machine interactions.
arXiv Detail & Related papers (2024-05-06T18:21:41Z)
- Learning Mutual Excitation for Hand-to-Hand and Human-to-Human
Interaction Recognition [22.538114033191313]
We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution layers.
Me-GC learns mutual information in each layer and each stage of graph convolution operations.
Our proposed me-GC outperforms state-of-the-art GCN-based and Transformer-based methods.
arXiv Detail & Related papers (2024-02-04T10:00:00Z)
- Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Human-to-Human Interaction Detection [3.00604614803979]
We introduce a new task named human-to-human interaction detection (HID).
HID aims to detect subjects, recognize person-wise actions, and group people according to their interactive relations within a single model.
First, based on the popular AVA dataset created for action detection, we establish a new HID benchmark, termed AVA-Interaction (AVA-I).
arXiv Detail & Related papers (2023-07-02T03:24:58Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and
Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance compared with state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.