Action Class Relation Detection and Classification Across Multiple Video
Datasets
- URL: http://arxiv.org/abs/2308.07558v1
- Date: Tue, 15 Aug 2023 03:56:46 GMT
- Title: Action Class Relation Detection and Classification Across Multiple Video
Datasets
- Authors: Yuya Yoshikawa, Yutaro Shigeto, Masashi Shimbo, Akikazu Takeuchi
- Abstract summary: We consider two new machine learning tasks: action class relation detection and classification.
We propose a unified model to predict relations between action classes, using language and visual information associated with classes.
- Experimental results show that (i) recent pre-trained neural network models for texts and videos contribute to high predictive performance, (ii) relation prediction based on action label texts is more accurate than that based on videos, and (iii) a blending approach can further improve the predictive performance in some cases.
- Score: 1.15520000056402
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Meta Video Dataset (MetaVD) provides annotated relations between action
classes in major datasets for human action recognition in videos. Although
these annotated relations enable dataset augmentation, the augmentation is
applicable only to the datasets covered by MetaVD. For an external dataset to
enjoy the same benefit, the relations between its action classes and those in
MetaVD need to be determined.
To address this issue, we consider two new machine learning tasks: action class
relation detection and classification. We propose a unified model to predict
relations between action classes, using language and visual information
associated with classes. Experimental results show that (i) recent pre-trained
neural network models for texts and videos contribute to high predictive
performance, (ii) relation prediction based on action label texts is more
accurate than that based on videos, and (iii) a blending approach that combines
predictions by both modalities can further improve the predictive performance
in some cases.
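As a rough illustration of the blending idea described above (a minimal late-fusion sketch, not the authors' implementation), predictions from a text-based and a video-based relation classifier can be combined by a weighted average of their class probabilities. The weight `alpha`, the helper function, and the three relation labels below are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical relation labels between two action classes (illustrative only;
# not necessarily the relation types annotated in MetaVD).
RELATION_CLASSES = ["equal", "is-a", "unrelated"]

def blend_predictions(p_text: np.ndarray, p_video: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """Late fusion: weighted average of per-class probabilities from two modalities."""
    assert p_text.shape == p_video.shape
    return alpha * p_text + (1.0 - alpha) * p_video

# Example: the text-based model is confident in "is-a", the video-based model less so.
p_text = np.array([0.10, 0.80, 0.10])
p_video = np.array([0.30, 0.40, 0.30])
blended = blend_predictions(p_text, p_video)
print(RELATION_CLASSES[int(np.argmax(blended))])  # prints "is-a"
```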
Related papers
- Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT-3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z) - Multiple Relations Classification using Imbalanced Predictions
Adaptation [0.0]
The relation classification task assigns the proper semantic relation to a pair of subject and object entities.
Current relation classification models employ additional procedures to identify multiple relations in a single sentence.
We propose a multiple relations classification model that tackles these issues through a customized output architecture and by exploiting additional input features.
arXiv Detail & Related papers (2023-09-24T18:36:22Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models relational-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z) - Tensor Composition Net for Visual Relationship Prediction [115.14829858763399]
We present a novel Tensor Composition Net (TCN) to predict visual relationships in images.
The key idea of our TCN is to exploit the low rank property of the visual relationship tensor.
We show our TCN's image-level visual relationship prediction provides a simple and efficient mechanism for relation-based image retrieval.
arXiv Detail & Related papers (2020-12-10T06:27:20Z) - Learning Relation Prototype from Unlabeled Texts for Long-tail Relation
Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z) - Learning End-to-End Action Interaction by Paired-Embedding Data
Augmentation [10.857323240766428]
A new Interactive Action Translation (IAT) task aims to learn end-to-end action interaction from unlabeled interactive pairs.
We propose a Paired-Embedding (PE) method for effective and reliable data augmentation.
Experimental results on two datasets demonstrate the effectiveness and broad applicability of our method.
arXiv Detail & Related papers (2020-07-16T01:54:16Z) - Actor-Context-Actor Relation Network for Spatio-Temporal Action
Localization [47.61419011906561]
ACAR-Net builds upon a novel High-order Relation Reasoning Operator to enable indirect reasoning for spatio-temporal action localization.
Our method ranks first in the AVA-Kinetics action localization task of ActivityNet Challenge 2020.
arXiv Detail & Related papers (2020-06-14T18:51:49Z)