Learning Group Activity Features Through Person Attribute Prediction
- URL: http://arxiv.org/abs/2403.02753v2
- Date: Mon, 11 Mar 2024 05:15:48 GMT
- Title: Learning Group Activity Features Through Person Attribute Prediction
- Authors: Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita
- Abstract summary: Group Activity Feature (GAF) learning is proposed.
By training the whole network end-to-end, the GAF is learned through predicting the attributes of people in a group.
- Score: 13.964739198311001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes Group Activity Feature (GAF) learning in which features
of multi-person activity are learned as a compact latent vector. Unlike prior
work in which the manual annotation of group activities is required for
supervised learning, our method learns the GAF through person attribute
prediction without group activity annotations. By learning the whole network in
an end-to-end manner so that the GAF is required for predicting the person
attributes of people in a group, the GAF is trained as the features of
multi-person activity. As a person attribute, we propose to use a person's
action class and appearance features because the former is easy to annotate due
to its simplicity, and the latter requires no manual annotation. In addition,
we introduce a location-guided attribute prediction to disentangle the complex
GAF for extracting the features of each target person properly. Various
experimental results validate that our method outperforms SOTA methods
quantitatively and qualitatively on two public datasets. Visualization of our
GAF also demonstrates that our method learns the GAF representing fine-grained
group activity classes. Code: https://github.com/chihina/GAFL-CVPR2024.
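The end-to-end scheme above, where a compact GAF must suffice for location-guided prediction of each person's attributes, can be sketched as follows. This is a minimal NumPy illustration with illustrative dimensions and randomly initialized weights standing in for trained parameters; it is not the authors' implementation, and all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper).
FEAT_DIM, GAF_DIM, N_ACTIONS, APP_DIM = 64, 16, 8, 32

# Randomly initialized weights stand in for end-to-end trained parameters.
W_enc = rng.standard_normal((FEAT_DIM, GAF_DIM)) * 0.1
W_act = rng.standard_normal((GAF_DIM + 2, N_ACTIONS)) * 0.1
W_app = rng.standard_normal((GAF_DIM + 2, APP_DIM)) * 0.1

def forward(person_feats, locations):
    """person_feats: (N, FEAT_DIM); locations: (N, 2) normalized (x, y)."""
    # Pool all persons into one compact GAF latent vector.
    gaf = np.maximum(person_feats @ W_enc, 0).mean(axis=0)        # (GAF_DIM,)
    # Location-guided attribute prediction: condition the shared GAF on each
    # person's position so per-person features can be disentangled from it.
    query = np.concatenate(
        [np.tile(gaf, (len(locations), 1)), locations], axis=1)   # (N, GAF_DIM + 2)
    return gaf, query @ W_act, query @ W_app                      # action logits, appearance

feats = rng.standard_normal((5, FEAT_DIM))  # 5 people in the scene
locs = rng.random((5, 2))
gaf, action_logits, appearance = forward(feats, locs)
print(gaf.shape, action_logits.shape, appearance.shape)  # → (16,) (5, 8) (5, 32)
```

Because the only path from person features to per-person attribute predictions runs through the pooled `gaf` vector, training the attribute losses end-to-end forces that vector to encode the multi-person activity.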
Related papers
- Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward.
End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features.
This paper describes ALGAE, a method that iteratively uses language models to identify human-meaningful features.
arXiv Detail & Related papers (2024-09-12T16:51:58Z)
- AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition [51.24321348668037]
Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes.
Previous methods rely on manually annotated detection boxes in training and inference, hindering further practical deployment.
We propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes.
arXiv Detail & Related papers (2024-05-04T01:53:22Z)
- Towards More Practical Group Activity Detection: A New Benchmark and Model [61.39427407758131]
Group activity detection (GAD) is the task of simultaneously identifying the members of each group and classifying each group's activity in a video.
We present a new dataset, dubbed Café, which offers more practical scenarios and metrics.
We also propose a new GAD model that deals with an unknown number of groups and latent group members efficiently and effectively.
arXiv Detail & Related papers (2023-12-05T16:48:17Z)
- Attribute-Aware Representation Rectification for Generalized Zero-Shot Learning [19.65026043141699]
Generalized Zero-shot Learning (GZSL) has yielded remarkable performance by designing a series of unbiased visual-semantics mappings.
We propose a simple yet effective Attribute-Aware Representation Rectification framework for GZSL, dubbed $\mathbf{(AR)^2}$.
arXiv Detail & Related papers (2023-11-23T11:30:32Z)
- Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS).
Existing methods suffer from a granularity inconsistency regarding the usage of group tokens.
We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
- DECOMPL: Decompositional Learning with Attention Pooling for Group Activity Recognition from a Single Volleyball Image [3.6144103736375857]
Group Activity Recognition (GAR) aims to detect the activity performed by multiple actors in a scene.
We propose a novel GAR technique for volleyball videos, DECOMPL, which consists of two complementary branches.
The visual branch selectively extracts features using attention pooling.
The coordinate branch models the current configuration of the actors, extracting spatial information from their bounding-box coordinates.
arXiv Detail & Related papers (2023-03-11T16:30:51Z)
- Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction [121.97742787439546]
How to accurately predict the properties of molecules is an essential problem in AI-driven drug discovery.
To reduce annotation cost, deep active learning methods are developed to select only the most representative and informative data for annotation.
We propose a Task-type-generic active learning framework (termed Tyger) that is able to handle different types of learning tasks in a unified manner.
arXiv Detail & Related papers (2022-05-23T12:56:12Z)
- Pose is all you need: The pose only group activity recognition system (POGARS) [7.876115370275732]
We introduce a novel deep-learning-based group activity recognition approach called the Pose Only Group Activity Recognition System (POGARS).
POGARS uses 1D CNNs to learn the dynamics of individuals involved in a group activity, forgoing learning from pixel data.
Experimental results confirm that POGARS achieves highly competitive results compared to state-of-the-art methods on a widely used public volleyball dataset.
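The idea of modeling individual dynamics with 1D convolutions over pose sequences (rather than pixels) can be sketched as follows; the keypoint count and the fixed averaging kernel are illustrative stand-ins for POGARS's learned, stacked 1D CNN layers:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy pose sequence: T frames, K keypoints with (x, y) coords flattened per frame.
T, K = 32, 17                       # 17 COCO-style keypoints (illustrative)
pose = rng.standard_normal((T, K * 2))

def conv1d_time(x, kernel):
    """Valid 1D convolution along the time axis, applied per feature channel."""
    out = np.stack(
        [np.convolve(x[:, c], kernel, mode="valid") for c in range(x.shape[1])],
        axis=1,
    )
    return out                      # (T - len(kernel) + 1, channels)

# A 5-frame moving average stands in for one learned temporal filter.
smoothed = conv1d_time(pose, np.ones(5) / 5)
print(smoothed.shape)               # → (28, 34)
```

Each learned kernel in such a stack responds to a short temporal pattern in the keypoint trajectories, which is how pose-only dynamics can be captured without any pixel input.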
arXiv Detail & Related papers (2021-08-09T17:16:04Z)
- GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition [33.23662792742078]
We propose a two-stage deep neural network for zero-shot action recognition.
In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes.
In the classification stage, we construct a knowledge graph based on the relationship between word vectors of action classes and related objects.
arXiv Detail & Related papers (2021-05-25T09:34:42Z)
- Adaptive Prototypical Networks with Label Words and Joint Representation Learning for Few-Shot Relation Classification [17.237331828747006]
This work focuses on few-shot relation classification (FSRC).
We propose an adaptive mixture mechanism to add label words to the representation of the class prototype.
Experiments have been conducted on FewRel under different few-shot (FS) settings.
arXiv Detail & Related papers (2021-01-10T11:25:42Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
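SCAN's decoupling of feature learning from clustering can be sketched as below. Here fabricated, well-separated toy features stand in for self-supervised representations, and plain k-means stands in for SCAN's learnable clustering head; both substitutions are simplifications for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1 (stand-in): "self-supervised" features. Two well-separated toy
# clusters replace features from a pretext task on unlabeled images.
feats = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 16)),
    rng.normal(3.0, 0.3, size=(50, 16)),
])

# Step 2: cluster the learned features without any labels.
def kmeans(x, k, iters=20):
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest center.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Move each center to the mean of its assigned samples.
        centers = np.stack([x[labels == j].mean(0) for j in range(k)])
    return labels

labels = kmeans(feats, k=2)
# Each toy cluster should map to a single, distinct cluster label.
print(len(set(labels[:50])), len(set(labels[50:])))  # → 1 1
```

The point of the decoupling is that step 2 operates on semantically meaningful features, so even a simple clustering procedure can recover class structure that clustering raw pixels would miss.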
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.