Learning Group Activities from Skeletons without Individual Action Labels
- URL: http://arxiv.org/abs/2105.06754v1
- Date: Fri, 14 May 2021 10:31:32 GMT
- Title: Learning Group Activities from Skeletons without Individual Action Labels
- Authors: Fabio Zappardino and Tiberio Uricchio and Lorenzo Seidenari and Alberto Del Bimbo
- Abstract summary: We show that, using only skeletal data, we can train a state-of-the-art end-to-end system supervised solely by group activity labels at the sequence level.
Our experiments show that models trained without individual action supervision perform poorly, whereas pseudo-labels computed from a pre-trained feature extractor yield comparable final performance.
Our carefully designed lean, pose-only architecture shows highly competitive results versus more complex multimodal approaches, even in the self-supervised variant.
- Score: 32.60526967706986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To understand human behavior we must not just recognize individual actions
but model possibly complex group activity and interactions. Hierarchical models
obtain the best results in group activity recognition but require fine-grained
individual action annotations at the actor level. In this paper we show that
using only skeletal data we can train a state-of-the-art end-to-end system
using only group activity labels at the sequence level. Our experiments show
that models trained without individual action supervision perform poorly. On
the other hand, we show that pseudo-labels can be computed from any pre-trained
feature extractor with comparable final performance. Finally, our carefully
designed lean, pose-only architecture shows highly competitive results versus
more complex multimodal approaches, even in the self-supervised variant.
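The pseudo-labelling idea in the abstract (deriving per-actor labels by clustering features from any pre-trained extractor) can be sketched as follows. This is a minimal illustration with our own function names and a plain k-means routine, not the paper's implementation:

```python
import numpy as np

def kmeans_pseudo_labels(features, k, iters=50, seed=0):
    """Cluster per-actor features and return cluster ids as pseudo action labels.

    features: (n, d) array of descriptors from any pre-trained extractor.
    Returns an (n,) array of integer pseudo-labels in [0, k).
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from k random data points.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest centroid (Euclidean).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

# Toy usage: two well-separated blobs standing in for "actor features".
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(-5, 1, (20, 8)), rng.normal(5, 1, (20, 8))])
pseudo = kmeans_pseudo_labels(feats, k=2)
```

The cluster indices then stand in for the missing individual action labels during training, replacing actor-level annotation.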
Related papers
- SOHES: Self-supervised Open-world Hierarchical Entity Segmentation [82.45303116125021]
This work presents Self-supervised Open-world Hierarchical Entities (SOHES), a novel approach that eliminates the need for human annotations.
We produce abundant high-quality pseudo-labels through visual feature clustering, and rectify the noise in pseudo-labels via a teacher-student mutual-learning procedure.
Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation.
arXiv Detail & Related papers (2024-04-18T17:59:46Z) - Group Activity Recognition using Unreliable Tracked Pose [8.592249538742527]
Group activity recognition in video is a complex task due to the need for a model to recognise the actions of all individuals in the video.
We introduce an innovative deep learning-based group activity recognition approach called Rendered Pose based Group Activity Recognition System (RePGARS).
arXiv Detail & Related papers (2024-01-06T17:36:13Z) - Fast and Expressive Gesture Recognition using a Combination-Homomorphic Electromyogram Encoder [21.25126610043744]
We study the task of gesture recognition from electromyography (EMG).
We define combination gestures consisting of a direction component and a modifier component.
New subjects only demonstrate the single component gestures.
We extrapolate to unseen combination gestures by combining the feature vectors of real single gestures to produce synthetic training data.
arXiv Detail & Related papers (2023-10-30T20:03:34Z) - Semi-supervised learning made simple with self-supervised clustering [65.98152950607707]
Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations.
We propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods into semi-supervised learners.
arXiv Detail & Related papers (2023-06-13T01:09:18Z) - Detector-Free Weakly Supervised Group Activity Recognition [41.344689949264335]
Group activity recognition is the task of understanding the activity conducted by a group of people as a whole in a video.
We propose a novel model for group activity recognition that depends neither on bounding box labels nor on object detectors.
Our Transformer-based model localizes and encodes partial contexts of a group activity by leveraging the attention mechanism.
arXiv Detail & Related papers (2022-04-05T12:05:04Z) - Unsupervised Action Segmentation with Self-supervised Feature Learning and Co-occurrence Parsing [32.66011849112014]
Temporal action segmentation is the task of classifying each frame of a video with an action label.
In this work we explore a self-supervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos.
We develop CAP, a novel co-occurrence action parsing algorithm that can not only capture the correlation among sub-actions underlying the structure of activities, but also estimate the temporal trajectory of the sub-actions in an accurate and general way.
arXiv Detail & Related papers (2021-05-29T00:29:40Z) - A Closer Look at Self-training for Zero-Label Semantic Segmentation [53.4488444382874]
Being able to segment unseen classes not observed during training is an important technical challenge in deep learning.
Prior zero-label semantic segmentation works approach this task by learning visual-semantic embeddings or generative models.
We propose a consistency regularizer to filter out noisy pseudo-labels by taking the intersections of the pseudo-labels generated from different augmentations of the same image.
arXiv Detail & Related papers (2021-04-21T14:34:33Z) - CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition [52.66360172784038]
We propose a clustering-based model, which considers all training samples at once, instead of optimizing for each instance individually.
We call the proposed method CLASTER and observe that it consistently improves over the state-of-the-art in all standard datasets.
arXiv Detail & Related papers (2021-01-18T12:46:24Z) - Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation [57.68890534164427]
In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data.
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
arXiv Detail & Related papers (2020-05-20T18:00:05Z) - Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
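The last two entries (Naive-Student, and Improving Semantic Segmentation via Self-Training) share the same self-training recipe: a teacher trained on labeled data pseudo-labels an unlabeled pool, and a student then trains on the union. A generic sketch of that loop, using a deliberately trivial nearest-centroid classifier of our own rather than any of the cited architectures:

```python
import numpy as np

class CentroidClassifier:
    """Trivial stand-in model: predicts the class of the nearest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def self_train(X_lab, y_lab, X_unlab):
    # 1. Train a teacher on the human-annotated data.
    teacher = CentroidClassifier().fit(X_lab, y_lab)
    # 2. Generate pseudo-labels for the unlabeled pool.
    y_pseudo = teacher.predict(X_unlab)
    # 3. Train the student on annotated and pseudo-labeled data jointly.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return CentroidClassifier().fit(X_all, y_all)

# Toy usage: a small labeled set and a larger unlabeled pool from two blobs.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-3, 1, (5, 4)), rng.normal(3, 1, (5, 4))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unlab = np.vstack([rng.normal(-3, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
student = self_train(X_lab, y_lab, X_unlab)
```

The cited works iterate this loop and use stronger training procedures, but the data flow (teacher, pseudo-labels, joint student training) is the same.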
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.