RGB Stream Is Enough for Temporal Action Detection
- URL: http://arxiv.org/abs/2107.04362v1
- Date: Fri, 9 Jul 2021 11:10:11 GMT
- Title: RGB Stream Is Enough for Temporal Action Detection
- Authors: Chenhao Wang, Hongxiang Cai, Yuxin Zou, Yichao Xiong
- Abstract summary: State-of-the-art temporal action detectors to date are based on two-stream input consisting of RGB frames and optical flow.
Optical flow is a hand-designed representation that requires heavy computation and is methodologically unsatisfactory because two-stream methods are often not learned end-to-end jointly with the flow.
We argue that optical flow is dispensable in high-accuracy temporal action detection and that image-level data augmentation is the key to avoiding performance degradation when optical flow is removed.
- Score: 3.2689702143620147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art temporal action detectors to date are based on two-stream
input including RGB frames and optical flow. Although combining RGB frames and
optical flow boosts performance significantly, optical flow is a hand-designed
representation which not only requires heavy computation, but also makes it
methodologically unsatisfactory that two-stream methods are often not learned
end-to-end jointly with the flow. In this paper, we argue that optical flow is
dispensable in high-accuracy temporal action detection and that image-level data
augmentation (ILDA) is the key to avoiding performance degradation when
optical flow is removed. To evaluate the effectiveness of ILDA, we design a
simple yet efficient one-stage temporal action detector based on a single RGB
stream, named DaoTAD. Our results show that, when trained with ILDA, DaoTAD achieves
accuracy comparable to all existing state-of-the-art two-stream detectors
while surpassing the inference speed of previous methods by a large margin,
reaching an astounding 6668 fps on a GeForce GTX 1080 Ti. Code is
available at \url{https://github.com/Media-Smart/vedatad}.
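The abstract credits image-level data augmentation (ILDA) applied to the RGB clip for closing the gap left by removing optical flow. The paper's exact transform set is not listed here, so the snippet below is only a minimal PyTorch sketch of the general idea: sample one set of image-level augmentation parameters (crop, flip, brightness) per clip and apply it identically to every frame so temporal consistency is preserved. Function names and parameter values are illustrative assumptions, not the released vedatad configuration.

```python
# Minimal sketch of clip-consistent image-level augmentation (ILDA-style).
# Names and values are illustrative; see https://github.com/Media-Smart/vedatad
# for the authors' actual training pipeline.
import torch
import torchvision.transforms.functional as TF


def augment_clip(clip: torch.Tensor) -> torch.Tensor:
    """Apply one randomly sampled image-level augmentation to every frame.

    clip: float tensor of shape (T, C, H, W) with values in [0, 1].
    """
    t, c, h, w = clip.shape

    # Sample augmentation parameters once per clip, not per frame,
    # so motion cues between frames are not destroyed.
    do_flip = torch.rand(1).item() < 0.5
    top = torch.randint(0, h // 8 + 1, (1,)).item()
    left = torch.randint(0, w // 8 + 1, (1,)).item()
    crop_h, crop_w = h - h // 8, w - w // 8
    brightness = 0.8 + 0.4 * torch.rand(1).item()

    frames = []
    for frame in clip:
        frame = TF.crop(frame, top, left, crop_h, crop_w)
        frame = TF.resize(frame, [h, w])
        if do_flip:
            frame = TF.hflip(frame)
        frame = TF.adjust_brightness(frame, brightness)
        frames.append(frame)
    return torch.stack(frames)


if __name__ == "__main__":
    dummy_clip = torch.rand(16, 3, 112, 112)   # 16-frame RGB clip
    print(augment_clip(dummy_clip).shape)      # torch.Size([16, 3, 112, 112])
```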
Related papers
- BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation [9.216990457540941]
Event cameras deliver visual information characterized by a high dynamic range and high temporal resolution.
Current advanced optical flow methods for event cameras largely adopt established image-based frameworks.
We present BAT, an innovative framework that estimates event-based optical flow using bidirectional adaptive temporal correlation.
arXiv Detail & Related papers (2025-03-05T08:20:16Z)
- StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences [31.210626775505407]
Occlusions between consecutive frames have long posed a significant challenge in optical flow estimation.
We present a Streamlined In-batch Multi-frame (SIM) pipeline tailored to video input, attaining a similar level of time efficiency to two-frame networks.
StreamFlow not only excels on the challenging KITTI and Sintel datasets but also shows particular improvement in occluded areas.
arXiv Detail & Related papers (2023-11-28T07:53:51Z)
- Towards Anytime Optical Flow Estimation with Event Cameras [35.685866753715416]
Event cameras are capable of responding to log-brightness changes in microseconds.
Existing datasets collected via event cameras provide limited frame rate optical flow ground truth.
We propose EVA-Flow, an EVent-based Anytime Flow estimation network to produce high-frame-rate event optical flow.
arXiv Detail & Related papers (2023-07-11T06:15:12Z)
- Motion-inductive Self-supervised Object Discovery in Videos [99.35664705038728]
We propose a model for processing consecutive RGB frames, and infer the optical flow between any pair of frames using a layered representation.
We demonstrate superior performance over previous state-of-the-art methods on three public video segmentation datasets.
arXiv Detail & Related papers (2022-10-01T08:38:28Z)
- StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose Velocity-awared streaming AP (VsAP) to jointly evaluate accuracy.
Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset and improves sAP and VsAP by 4.7% and 8.2%, respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
- TadML: A fast temporal action detection with Mechanics-MLP [0.5148939336441986]
Temporal Action Detection (TAD) is a crucial but challenging task in video understanding.
Most current models adopt both RGB and Optical-Flow streams for the TAD task.
We propose a one-stage anchor-free temporal localization method with RGB stream only, in which a novel Newtonian Mechanics-MLP architecture is established.
arXiv Detail & Related papers (2022-06-07T04:07:48Z)
- hARMS: A Hardware Acceleration Architecture for Real-Time Event-Based Optical Flow [0.0]
Event-based vision sensors produce asynchronous event streams with high temporal resolution based on changes in the visual scene.
Existing solutions for calculating optical flow from event data fail to capture the true direction of motion due to the aperture problem.
We present a hardware realization of the fARMS algorithm allowing for real-time computation of true flow on low-power, embedded platforms.
arXiv Detail & Related papers (2021-12-13T16:27:17Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often contain a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Unsupervised Motion Representation Enhanced Network for Action Recognition [4.42249337449125]
Motion representation between consecutive frames has proven to greatly benefit video understanding.
The TV-L1 method, an effective optical flow solver, is time-consuming, and caching the extracted optical flow is expensive in storage (see the cost sketch after this list).
We propose UF-TSN, a novel end-to-end action recognition approach enhanced with an embedded lightweight unsupervised optical flow estimator.
arXiv Detail & Related papers (2021-03-05T04:14:32Z)
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z)
- PAN: Towards Fast Action Recognition via Learning Persistence of Appearance [60.75488333935592]
Most state-of-the-art methods heavily rely on dense optical flow as motion representation.
In this paper, we shed light on fast action recognition by lifting the reliance on optical flow.
We design a novel motion cue called Persistence of Appearance (PA).
In contrast to optical flow, our PA focuses more on distilling the motion information at boundaries.
arXiv Detail & Related papers (2020-08-08T07:09:54Z)
- A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 \times 384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)
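Several of the entries above (the UF-TSN and PAN summaries, as well as the DaoTAD abstract itself) turn on the cost of dense optical flow versus cheap RGB-only motion cues. The sketch below, which assumes opencv-contrib-python and NumPy are installed, simply times OpenCV's TV-L1 solver against a plain frame-difference cue on random frames; it illustrates the cost gap being discussed and is not a reproduction of any paper's measurement.

```python
# Rough timing comparison: dense TV-L1 optical flow vs. a naive frame-difference
# motion cue. Requires opencv-contrib-python; numbers are illustrative only and
# depend heavily on hardware and frame size.
import time

import cv2
import numpy as np

prev_gray = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
next_gray = np.random.randint(0, 256, (256, 256), dtype=np.uint8)

# Hand-designed representation: TV-L1 dense optical flow (CPU).
tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()
start = time.perf_counter()
flow = tvl1.calc(prev_gray, next_gray, None)
print(f"TV-L1 flow {flow.shape}: {time.perf_counter() - start:.3f} s")

# RGB-only motion cue: absolute frame difference, the kind of cheap signal
# flow-free detectors can rely on instead of optical flow.
start = time.perf_counter()
diff = cv2.absdiff(prev_gray, next_gray)
print(f"Frame diff {diff.shape}: {time.perf_counter() - start:.6f} s")
```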
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.