Learning End-to-End Action Interaction by Paired-Embedding Data
Augmentation
- URL: http://arxiv.org/abs/2007.08071v1
- Date: Thu, 16 Jul 2020 01:54:16 GMT
- Title: Learning End-to-End Action Interaction by Paired-Embedding Data
Augmentation
- Authors: Ziyang Song, Zejian Yuan, Chong Zhang, Wanchao Chi, Yonggen Ling and
Shenghao Zhang
- Abstract summary: A new Interactive Action Translation (IAT) task aims to learn end-to-end action interaction from unlabeled interactive pairs.
We propose a Paired-Embedding (PE) method for effective and reliable data augmentation.
Experimental results on two datasets demonstrate the effectiveness and broad application prospects of our method.
- Score: 10.857323240766428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recognition-based action interaction, robots' responses to human actions
are often pre-designed according to recognized categories and thus stiff. In
this paper, we specify a new Interactive Action Translation (IAT) task which
aims to learn end-to-end action interaction from unlabeled interactive pairs,
removing explicit action recognition. To enable learning on small-scale data,
we propose a Paired-Embedding (PE) method for effective and reliable data
augmentation. Specifically, our method first utilizes paired relationships to
cluster individual actions in an embedding space. Then two actions originally
paired can be replaced with other actions in their respective neighborhood,
assembling into new pairs. An Act2Act network based on conditional GAN follows
to learn from augmented data. In addition, IAT-test and IAT-train scores are
specifically proposed for evaluating methods on our task. Experimental results
on two datasets demonstrate the effectiveness of our method and its broad
application prospects.
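To make the Paired-Embedding idea concrete, below is a minimal sketch of the neighborhood-swap augmentation the abstract describes. It assumes an embedding function is already available; in the paper the embedding space is learned from the paired relationships themselves, so `embed`, `k`, and all other names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pe_augment(pairs, embed, k=3, rng=None):
    """Sketch of Paired-Embedding (PE) data augmentation.

    pairs: list of (human_action, robot_action) arrays (unlabeled pairs)
    embed: assumed mapping from an action to a fixed-length vector;
           the paper learns this embedding using paired relationships
    k:     neighborhood size used when swapping actions
    """
    rng = rng or np.random.default_rng(0)
    humans, robots = zip(*pairs)
    h_emb = np.stack([embed(a) for a in humans])
    r_emb = np.stack([embed(a) for a in robots])

    def knn(emb):
        # pairwise Euclidean distances; k nearest other actions per row
        d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return np.argsort(d, axis=1)[:, :k]

    h_nn, r_nn = knn(h_emb), knn(r_emb)

    augmented = list(pairs)
    for i in range(len(pairs)):
        # replace each action in an original pair with a neighbor from
        # its own embedding neighborhood, assembling a new interactive pair
        j = int(rng.choice(h_nn[i]))
        m = int(rng.choice(r_nn[i]))
        augmented.append((humans[j], robots[m]))
    return augmented
```

The augmented pairs then feed the Act2Act network. Since the paper only states that Act2Act is based on a conditional GAN, the sketch below shows the generic conditioning pattern on flat feature vectors; the dimensions and layer choices are placeholders, and the real network operates on action sequences.

```python
import torch
import torch.nn as nn

ACT_DIM, NOISE_DIM, HID = 128, 32, 256  # hypothetical sizes

class Generator(nn.Module):
    """Maps (human action, noise) to a candidate robot response."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACT_DIM + NOISE_DIM, HID), nn.ReLU(),
            nn.Linear(HID, ACT_DIM))

    def forward(self, human, z):
        # condition the generated response on the observed human action
        return self.net(torch.cat([human, z], dim=-1))

class Discriminator(nn.Module):
    """Scores whether (human, robot) forms a plausible interactive pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * ACT_DIM, HID), nn.ReLU(),
            nn.Linear(HID, 1))

    def forward(self, human, robot):
        return self.net(torch.cat([human, robot], dim=-1))
```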
Related papers
- PEAR: Phrase-Based Hand-Object Interaction Anticipation [20.53329698350243]
First-person hand-object interaction anticipation aims to predict the interaction process based on current scenes and prompts.
Existing research typically anticipates only interaction intention while neglecting manipulation.
We propose a novel model, PEAR, which jointly anticipates interaction intention and manipulation.
arXiv Detail & Related papers (2024-07-31T10:28:49Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose MPI, a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
Experimental results demonstrate that MPI achieves remarkable improvements of 10% to 64% over the previous state of the art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks [4.971065912401385]
We propose Dual-VCLIP, a unified approach for zero-shot multi-label action recognition.
Dual-VCLIP enhances VCLIP, a zero-shot action recognition method, with the DualCoOp method for multi-label image classification.
We validate our method on the Charades dataset, which includes a majority of object-based actions.
arXiv Detail & Related papers (2024-05-14T15:28:48Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Interactive segmentation in aerial images: a new benchmark and an open access web-based tool [2.729446374377189]
In recent years, interactive semantic segmentation methods in computer vision have made human-in-the-loop segmentation highly effective.
This study aims to bridge the gap between interactive segmentation and remote sensing analysis by conducting a benchmark study of various interactive segmentation models.
arXiv Detail & Related papers (2023-08-25T04:49:49Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance compared with state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class.
Although progress has been made on coarse-grained actions, existing few-shot recognition methods encounter two issues when handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z)
- Transferable Interactiveness Knowledge for Human-Object Interaction Detection [46.89715038756862]
We explore interactiveness knowledge which indicates whether a human and an object interact with each other or not.
We found that interactiveness knowledge can be learned across HOI datasets and bridge the gap between diverse HOI category settings.
Our core idea is to exploit an interactiveness network to learn the general interactiveness knowledge from multiple HOI datasets.
arXiv Detail & Related papers (2021-01-25T18:21:07Z)
- DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification [77.59549450705384]
In dialog systems, dialog act recognition and sentiment classification are two correlative tasks.
Most existing systems either treat them as separate tasks or simply model the two tasks jointly.
We propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks.
arXiv Detail & Related papers (2020-08-16T14:13:32Z)
- Asynchronous Interaction Aggregation for Action Detection [43.34864954534389]
We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.
There are two key designs in it: one is the Interaction Aggregation structure (IA) adopting a uniform paradigm to model and integrate multiple types of interaction; the other is the Asynchronous Memory Update algorithm (AMU) that enables us to achieve better performance.
arXiv Detail & Related papers (2020-04-16T07:03:20Z)