TadML: A fast temporal action detection with Mechanics-MLP
- URL: http://arxiv.org/abs/2206.02997v2
- Date: Fri, 2 Feb 2024 17:11:10 GMT
- Title: TadML: A fast temporal action detection with Mechanics-MLP
- Authors: Bowen Deng and Dongchang Liu
- Abstract summary: Temporal Action Detection (TAD) is a crucial but challenging task in video understanding.
Most current models adopt both RGB and Optical-Flow streams for the TAD task.
We propose a one-stage anchor-free temporal localization method with an RGB stream only, in which a novel Newtonian Mechanics-MLP architecture is established.
- Score: 0.5148939336441986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal Action Detection (TAD) is a crucial but challenging task in video
understanding. It aims to detect both the type and the start and end frames of each
action instance in a long, untrimmed video. Most current models adopt both RGB and
Optical-Flow streams for the TAD task, so the original RGB frames must first be
converted into Optical-Flow frames, which adds computation and time cost and is an
obstacle to real-time processing. In addition, many models adopt two-stage strategies,
which slow down inference and require complicated tuning of proposal generation. By
comparison, we propose a one-stage anchor-free temporal localization method with an
RGB stream only, in which a novel Newtonian Mechanics-MLP architecture is established.
It has accuracy comparable to all existing state-of-the-art models while surpassing
their inference speed by a large margin. The typical inference speed in this paper is
an astounding 4.44 videos per second on THUMOS14. In applications, the inference speed
will be even faster because there is no need to compute optical flow. This also shows
that MLPs have great potential in downstream tasks such as TAD. The source code is
available at
https://github.com/BonedDeng/TadML
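For readers unfamiliar with the one-stage anchor-free formulation mentioned above, here is a minimal PyTorch sketch of what such a head could look like on RGB-only clip features: MLP-Mixer-style token mixing across time, followed by per-timestep class scores and start/end offset regression. All names (TokenMixingMLP, AnchorFreeTADHead) and hyperparameters are illustrative assumptions, not the paper's actual Newtonian Mechanics-MLP design; refer to the repository above for the real implementation.

```python
# Illustrative sketch only (not the authors' implementation): a one-stage,
# anchor-free temporal localization head on RGB clip features. Every temporal
# position predicts class scores plus distances to the action's start and end,
# so no anchors and no proposal-generation stage are needed.
import torch
import torch.nn as nn


class TokenMixingMLP(nn.Module):
    """MLP-Mixer style block: mixes information across time, then across channels."""

    def __init__(self, num_tokens: int, dim: int, hidden: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.time_mlp = nn.Sequential(
            nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens)
        )
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):  # x: (batch, time, dim)
        x = x + self.time_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x


class AnchorFreeTADHead(nn.Module):
    """Per-timestep classification and start/end offset regression."""

    def __init__(self, num_tokens: int, dim: int, num_classes: int, depth: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(
            *[TokenMixingMLP(num_tokens, dim) for _ in range(depth)]
        )
        self.cls_head = nn.Linear(dim, num_classes)  # action class per timestep
        self.reg_head = nn.Linear(dim, 2)            # distances to start and end

    def forward(self, feats):  # feats: (batch, time, dim) RGB clip features
        h = self.backbone(feats)
        cls_logits = self.cls_head(h)           # (B, T, num_classes)
        offsets = torch.relu(self.reg_head(h))  # (B, T, 2), non-negative offsets
        return cls_logits, offsets


if __name__ == "__main__":
    B, T, D, C = 2, 128, 512, 20  # THUMOS14 has 20 action classes
    head = AnchorFreeTADHead(num_tokens=T, dim=D, num_classes=C)
    cls_logits, offsets = head(torch.randn(B, T, D))
    # Typical anchor-free decoding: timestep t spans [t - offsets[..., 0], t + offsets[..., 1]]
    print(cls_logits.shape, offsets.shape)  # (2, 128, 20) and (2, 128, 2)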
Related papers
- MemFlow: Optical Flow Estimation and Prediction with Memory [54.22820729477756]
We present MemFlow, a real-time method for optical flow estimation and prediction with memory.
Our method enables memory read-out and update modules for aggregating historical motion information in real-time.
Our approach seamlessly extends to the future prediction of optical flow based on past observations.
arXiv Detail & Related papers (2024-04-07T04:56:58Z)
- ATCA: an Arc Trajectory Based Model with Curvature Attention for Video Frame Interpolation [10.369068266836154]
We propose an arc trajectory based model (ATCA) which learns motion prior to only two consecutive frames and also is lightweight.
Experiments show that our approach performs better than many SOTA methods with fewer parameters and faster inference speed.
arXiv Detail & Related papers (2022-08-01T13:42:08Z)
- StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose velocity-aware streaming AP (VsAP) to jointly evaluate accuracy.
Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset and improves sAP and VsAP by 4.7% and 8.2%, respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
- RGB Stream Is Enough for Temporal Action Detection [3.2689702143620147]
State-of-the-art temporal action detectors to date are based on two-stream input including RGB frames and optical flow.
Optical flow is a hand-designed representation that not only requires heavy computation but is also methodologically unsatisfactory, because two-stream methods are often not learned end-to-end jointly with the flow.
We argue that optical flow is dispensable in high-accuracy temporal action detection and image level data augmentation is the key solution to avoid performance degradation when optical flow is removed.
arXiv Detail & Related papers (2021-07-09T11:10:11Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- PAN: Towards Fast Action Recognition via Learning Persistence of Appearance [60.75488333935592]
Most state-of-the-art methods heavily rely on dense optical flow as motion representation.
In this paper, we shed light on fast action recognition by lifting the reliance on optical flow.
We design a novel motion cue called Persistence of Appearance (PA).
In contrast to optical flow, our PA focuses more on distilling the motion information at boundaries.
arXiv Detail & Related papers (2020-08-08T07:09:54Z)
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two layers in CNNs can be converted to temporal bilinear modules by adding an auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on the Something-Something V1 and V2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method achieves clear improvements over state-of-the-art real-time methods on the UCF101 action recognition benchmark: 5.4% higher accuracy and 2x faster inference, with a model that requires less than 5 MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)