MPT-PAR:Mix-Parameters Transformer for Panoramic Activity Recognition
- URL: http://arxiv.org/abs/2408.00420v1
- Date: Thu, 1 Aug 2024 09:42:44 GMT
- Title: MPT-PAR:Mix-Parameters Transformer for Panoramic Activity Recognition
- Authors: Wenqing Gan, Yan Sun, Feiran Liu, Xiangfeng Luo,
- Abstract summary: We propose a model called MPT-PAR that considers both the unique characteristics of each task and the synergies between different tasks simultaneously.
Our method achieved granularity and an overall F1 score of 47.5% on the JRDB-PAR dataset.
- Score: 2.1794550051087995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of the panoramic activity recognition task is to identify behaviors at various granularities within crowded and complex environments, encompassing individual actions, social group activities, and global activities. Existing methods generally use either parameter-independent modules to capture task-specific features or parameter-sharing modules to obtain common features across all tasks. However, there is often a strong interrelatedness and complementary effect between tasks of different granularities that previous methods have yet to notice. In this paper, we propose a model called MPT-PAR that considers both the unique characteristics of each task and the synergies between different tasks simultaneously, thereby maximizing the utilization of features across multi-granularity activity recognition. Furthermore, we emphasize the significance of temporal and spatial information by introducing a spatio-temporal relation-enhanced module and a scene representation learning module, which integrate the the spatio-temporal context of action and global scene into the feature map of each granularity. Our method achieved an overall F1 score of 47.5\% on the JRDB-PAR dataset, significantly outperforming all the state-of-the-art methods.
Related papers
- Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z) - AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition [51.24321348668037]
Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes.
Previous methods rely on manually annotated detection boxes in training and inference, hindering further practical deployment.
We propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes.
arXiv Detail & Related papers (2024-05-04T01:53:22Z) - Distribution Matching for Multi-Task Learning of Classification Tasks: a
Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks with little, or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z) - DynaShare: Task and Instance Conditioned Parameter Sharing for
Multi-Task Learning [11.955637263520492]
We present a novel parameter sharing method for multi-task learning.
We propose to dynamically decide which parts of the network to activate based on both the task and the input instance.
Our approach learns a hierarchical gating policy consisting of a task-specific policy for coarse layer selection and gating units for individual input instances.
arXiv Detail & Related papers (2023-05-26T23:43:21Z) - Semi-supervised Multi-task Learning for Semantics and Depth [88.77716991603252]
Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.
We propose the Semi-supervised Multi-Task Learning (MTL) method to leverage the available supervisory signals from different datasets.
We present a domain-aware discriminator structure with various alignment formulations to mitigate the domain discrepancy issue among datasets.
arXiv Detail & Related papers (2021-10-14T07:43:39Z) - Interactive Fusion of Multi-level Features for Compositional Activity
Recognition [100.75045558068874]
We present a novel framework that accomplishes this goal by interactive fusion.
We implement the framework in three steps, namely, positional-to-appearance feature extraction, semantic feature interaction, and semantic-to-positional prediction.
We evaluate our approach on two action recognition datasets, Something-Something and Charades.
arXiv Detail & Related papers (2020-12-10T14:17:18Z) - MOPT: Multi-Object Panoptic Tracking [33.77171216778909]
We introduce a novel perception task denoted as multi-object panoptic tracking (MOPT)
MOPT allows for exploiting pixel-level semantic information of 'thing' and'stuff' classes, temporal coherence, and pixel-level associations over time.
We present extensive quantitative and qualitative evaluations of both vision-based and LiDAR-based MOPT that demonstrate encouraging results.
arXiv Detail & Related papers (2020-04-17T11:45:28Z) - Combined Model for Partially-Observable and Non-Observable Task
Switching: Solving Hierarchical Reinforcement Learning Problems Statically
and Dynamically with Transfer Learning [2.741266294612776]
Humans and animals rely on the interactions between the Pre-Frontal Cortex (PFC) and the Basal Ganglia (BG) to achieve this focus called Working Memory (WM)
Recent adaptations of the toolkit either utilize Abstract Task Representations (ATRs) to solve Non-Observable (NO) tasks or storage of past input features to solve Partially-Observable (PO) tasks, but not both.
We propose a new model, PONOWMtk, which combines both approaches, ATRs and input storage, with a static or dynamic number of ATRs.
arXiv Detail & Related papers (2020-04-13T21:44:54Z) - Multi-Granularity Reference-Aided Attentive Feature Aggregation for
Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-Attentive Feature aggregation module MG-RAFA.
Our framework achieves the state-of-the-art ablation performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.