MLP-AIR: An Efficient MLP-Based Method for Actor Interaction Relation
Learning in Group Activity Recognition
- URL: http://arxiv.org/abs/2304.08803v1
- Date: Tue, 18 Apr 2023 08:07:23 GMT
- Title: MLP-AIR: An Efficient MLP-Based Method for Actor Interaction Relation
Learning in Group Activity Recognition
- Authors: Guoliang Xu, Jianqin Yin
- Abstract summary: Group Activity Recognition (GAR) aims to predict the activity category of the group by learning the actor spatial-temporal interaction relation in the group.
Previous works mainly learn the interaction relation with well-designed GCNs or Transformers.
In this paper, we design a novel MLP-based method for Actor Interaction Relation learning (MLP-AIR) in GAR.
- Score: 4.24515544235173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of Group Activity Recognition (GAR) aims to predict the activity
category of the group by learning the actor spatial-temporal interaction
relation in the group. Therefore, an effective actor relation learning method
is crucial for the GAR task. Previous works mainly learn the interaction
relation with well-designed GCNs or Transformers. For example, to infer the
actor interaction relation, GCNs need a learnable adjacency matrix, and
Transformers need to compute self-attention. Although these methods can model the
interaction relation effectively, they also increase the complexity of the
model (the number of parameters and computations). In this paper, we design a
novel MLP-based method for Actor Interaction Relation learning (MLP-AIR) in
GAR. Compared with GCNs and Transformers, our method is a competitive yet
conceptually and technically simpler alternative that significantly reduces the
complexity. Specifically, MLP-AIR includes three sub-modules: MLP-based Spatial
relation modeling module (MLP-S), MLP-based Temporal relation modeling module
(MLP-T), and MLP-based Relation refining module (MLP-R). MLP-S is used to model
the spatial relation between different actors in each frame. MLP-T is used to
model the temporal relation between different frames for each actor. MLP-R is
further used to refine the relation between different dimensions of the relation
features to improve their expressiveness. To evaluate MLP-AIR,
we conduct extensive experiments on two widely used benchmarks, including the
Volleyball and Collective Activity datasets. Experimental results demonstrate
that MLP-AIR achieves competitive results with low complexity.
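To make the three sub-modules concrete, below is a minimal PyTorch sketch of the idea: an actor feature tensor of shape [batch, frames, actors, channels] is mixed along the actor axis (spatial), the frame axis (temporal), and the channel axis (refinement), each with a plain two-layer MLP and a residual connection. The class names, layer widths, normalization, and residual wiring here are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of MLP-based spatial/temporal/channel relation mixing.
# Shapes, widths, and wiring are assumptions for illustration only.
import torch
import torch.nn as nn


class MLPBlock(nn.Module):
    """Two-layer MLP applied along the last dimension of its input."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return self.net(x)


class MLPAIRSketch(nn.Module):
    """Mixes actor features along the actor axis (spatial), frame axis (temporal),
    and channel axis (refinement), each with a plain MLP and a residual connection."""

    def __init__(self, num_actors: int, num_frames: int, channels: int):
        super().__init__()
        self.norm_s = nn.LayerNorm(channels)
        self.norm_t = nn.LayerNorm(channels)
        self.norm_r = nn.LayerNorm(channels)
        self.mlp_s = MLPBlock(num_actors, 2 * num_actors)  # spatial: across actors per frame
        self.mlp_t = MLPBlock(num_frames, 2 * num_frames)  # temporal: across frames per actor
        self.mlp_r = MLPBlock(channels, 2 * channels)       # refine: across feature channels

    def forward(self, x):                                   # x: [B, T, N, C]
        # Spatial mixing: MLP acts over the actor dimension N.
        y = self.norm_s(x).permute(0, 1, 3, 2)              # [B, T, C, N]
        x = x + self.mlp_s(y).permute(0, 1, 3, 2)
        # Temporal mixing: MLP acts over the frame dimension T.
        y = self.norm_t(x).permute(0, 2, 3, 1)              # [B, N, C, T]
        x = x + self.mlp_t(y).permute(0, 3, 1, 2)
        # Relation refining: MLP acts over the channel dimension C.
        x = x + self.mlp_r(self.norm_r(x))
        return x


# Example with illustrative sizes: 10 frames, 12 actors, 256-d features.
feats = torch.randn(2, 10, 12, 256)
out = MLPAIRSketch(num_actors=12, num_frames=10, channels=256)(feats)
print(out.shape)  # torch.Size([2, 10, 12, 256])
```

Because each mixing step is a fixed-size linear map over one axis, no adjacency matrix or pairwise attention is computed, which is the source of the complexity reduction the abstract claims.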
Related papers
- KAN or MLP: A Fairer Comparison [63.794304207664176]
This paper offers a fairer and more comprehensive comparison of KAN and MLP models across various tasks.
We control the number of parameters and FLOPs to compare the performance of KAN and MLP.
We find that KAN's forgetting issue is more severe than that of MLP in a standard class-incremental continual learning setting.
arXiv Detail & Related papers (2024-07-23T17:43:35Z) - MLPs Learn In-Context on Regression and Classification Tasks [28.13046236900491]
In-context learning (ICL) is often assumed to be a unique hallmark of Transformer models.
We demonstrate that multi-layer perceptrons (MLPs) can also learn in-context.
arXiv Detail & Related papers (2024-05-24T15:04:36Z) - R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition [33.53114929452528]
Vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community.
We present an MLP-based architecture, R$^2$-MLP, which achieves view-based 3D object recognition by considering the communications between patches from different views.
With a conceptually simple structure, our R$^2$-MLP achieves competitive performance compared with existing methods.
arXiv Detail & Related papers (2022-11-20T21:13:02Z) - Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
arXiv Detail & Related papers (2022-03-14T04:32:19Z) - Using Fitness Dependent Optimizer for Training Multi-layer Perceptron [13.280383503879158]
This study presents a novel training algorithm based on the recently proposed Fitness Dependent Optimizer (FDO).
The stability of this algorithm has been verified, and its performance has been demonstrated in both the exploration and exploitation stages.
The proposed approach using FDO as a trainer can outperform the other approaches using different trainers on the dataset.
arXiv Detail & Related papers (2022-01-03T10:23:17Z) - MLP Architectures for Vision-and-Language Modeling: An Empirical Study [91.6393550858739]
We initiate the first empirical study on the use of MLP architectures for vision-and-language (VL) fusion.
We find that without pre-training, using MLPs for multimodal fusion has a noticeable performance gap compared to transformers.
Instead of heavy multi-head attention, adding tiny one-head attention to MLP encoders is sufficient to achieve comparable performance to transformers.
arXiv Detail & Related papers (2021-12-08T18:26:19Z) - Hire-MLP: Vision MLP via Hierarchical Rearrangement [58.33383667626998]
Hire-MLP is a simple yet competitive vision MLP architecture built via hierarchical rearrangement.
The proposed Hire-MLP architecture is built with simple channel-mixing operations, thus enjoys high flexibility and inference speed.
Experiments show that our Hire-MLP achieves state-of-the-art performance on the ImageNet-1K benchmark.
arXiv Detail & Related papers (2021-08-30T16:11:04Z) - MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in
Sequential Recommendation [40.20599070308035]
Transformer-based models require memory and time complexity quadratic in the sequence length, making it difficult to extract the long-term interest of users.
MLP-based models, renowned for their linear memory and time complexity, have recently shown competitive results compared to Transformer in various tasks.
We propose the Multi-Order Interaction layer, which is capable of expressing an arbitrary order of interactions while maintaining the memory and time complexity of the MLP layer.
arXiv Detail & Related papers (2021-08-17T08:38:49Z) - CycleMLP: A MLP-like Architecture for Dense Prediction [26.74203747156439]
CycleMLP is a versatile backbone for visual recognition and dense predictions.
It can cope with various image sizes and achieves linear computational complexity with respect to image size by using local windows.
CycleMLP aims to provide a competitive baseline on object detection, instance segmentation, and semantic segmentation for MLP-like models.
arXiv Detail & Related papers (2021-07-21T17:23:06Z) - AS-MLP: An Axial Shifted MLP Architecture for Vision [50.11765148947432]
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper.
By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different directions.
With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset.
arXiv Detail & Related papers (2021-07-18T08:56:34Z) - MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models (see the brief sketch below).
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
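For reference, a minimal sketch of the mixer-style block that this line of MLP work builds on: a token-mixing MLP applied across patches followed by a channel-mixing MLP applied across features. Shapes, hidden sizes, and the class name are illustrative assumptions, not MLP-Mixer's exact configuration.

```python
# Minimal mixer-style block: token mixing across patches, then channel mixing
# across features. Sizes are illustrative assumptions only.
import torch
import torch.nn as nn


class MixerBlockSketch(nn.Module):
    """Token-mixing MLP across patches followed by a channel-mixing MLP across features."""

    def __init__(self, num_patches: int, channels: int,
                 token_hidden: int = 256, channel_hidden: int = 1024):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(), nn.Linear(token_hidden, num_patches)
        )
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channel_hidden), nn.GELU(), nn.Linear(channel_hidden, channels)
        )

    def forward(self, x):                                   # x: [B, P, C]
        # Token mixing: transpose so the MLP acts across the patch dimension P.
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # Channel mixing: MLP acts across the feature dimension C.
        x = x + self.channel_mlp(self.norm2(x))
        return x


x = torch.randn(2, 196, 512)                                # e.g. 14x14 patches, 512-d embeddings
print(MixerBlockSketch(num_patches=196, channels=512)(x).shape)  # torch.Size([2, 196, 512])
```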