Towards Frame Rate Agnostic Multi-Object Tracking
- URL: http://arxiv.org/abs/2209.11404v3
- Date: Tue, 18 Apr 2023 02:15:17 GMT
- Title: Towards Frame Rate Agnostic Multi-Object Tracking
- Authors: Weitao Feng and Lei Bai and Yongqiang Yao and Fengwei Yu and Wanli
Ouyang
- Abstract summary: We propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time.
Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes the frame rate information.
The Periodic Training Scheme (PTS) reflects all post-processing steps in training via tracking pattern matching and fusion.
- Score: 76.82407173177138
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-Object Tracking (MOT) is one of the most fundamental computer vision
tasks that contributes to various video analysis applications. Despite the
recent promising progress, current MOT research is still limited to a fixed
sampling frame rate of the input stream. In fact, we empirically found that the
accuracy of all recent state-of-the-art trackers drops dramatically when the
input frame rate changes. For a more intelligent tracking solution, we shift
the attention of our research work to the problem of Frame Rate Agnostic MOT
(FraMOT), which takes frame rate insensitivity into consideration. In this
paper, we propose a Frame Rate Agnostic MOT framework with a Periodic training
Scheme (FAPS) to tackle the FraMOT problem for the first time. Specifically, we
propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes
the frame rate information to aid identity matching across multi-frame-rate
inputs, improving the capability of the learned model in handling complex
motion-appearance relations in FraMOT. Moreover, the association gap between
training and inference is enlarged in FraMOT because those post-processing
steps not included in training make a larger difference in lower frame rate
scenarios. To address this, we propose a Periodic Training Scheme (PTS) to reflect
all post-processing steps in training via tracking pattern matching and fusion.
Along with the proposed approaches, we make the first attempt to establish an
evaluation method for this new task of FraMOT in two different modes, i.e.,
known frame rate and unknown frame rate, aiming to handle a more complex
situation. The quantitative experiments on the challenging MOT17/20 datasets
(FraMOT version) have clearly demonstrated that the proposed approaches can
handle different frame rates better and thus improve the robustness against
complicated scenarios.
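The abstract describes FAAM only at a high level. As a rough illustration of the idea it names (conditioning track-to-detection affinity on an encoded frame rate so one model can serve multi-frame-rate inputs), consider the minimal sketch below. All names and dimensions here (FrameRateAwareAssociation, fps_encoder, feat_dim) are hypothetical assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch in the spirit of FAAM: score track/detection pairs
# conditioned on a frame-rate code. Every design choice below is an
# assumption; the paper's real architecture is not given in the abstract.
import torch
import torch.nn as nn

class FrameRateAwareAssociation(nn.Module):
    """Scores track-detection pairs conditioned on a frame-rate embedding."""

    def __init__(self, feat_dim: int = 128, fps_dim: int = 16):
        super().__init__()
        # Encode the (given or inferred) frame rate as a small embedding.
        self.fps_encoder = nn.Sequential(
            nn.Linear(1, fps_dim), nn.ReLU(), nn.Linear(fps_dim, fps_dim)
        )
        # Affinity head over [track feature, detection feature, fps code].
        self.affinity = nn.Sequential(
            nn.Linear(2 * feat_dim + fps_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, track_feats, det_feats, fps):
        # track_feats: (T, feat_dim); det_feats: (D, feat_dim); fps: scalar tensor
        T, D = track_feats.size(0), det_feats.size(0)
        fps_code = self.fps_encoder(fps.view(1, 1)).expand(T, D, -1)
        pairs = torch.cat(
            [
                track_feats.unsqueeze(1).expand(T, D, -1),
                det_feats.unsqueeze(0).expand(T, D, -1),
                fps_code,
            ],
            dim=-1,
        )
        return self.affinity(pairs).squeeze(-1)  # (T, D) affinity matrix

# Toy usage: 5 tracks, 7 detections, an input stream sampled at 6 fps.
# model = FrameRateAwareAssociation()
# scores = model(torch.randn(5, 128), torch.randn(7, 128), torch.tensor(6.0))
```

In the unknown-frame-rate mode described in the abstract, the frame rate fed to such a module would itself be inferred from the input rather than given.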
Related papers
- Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames [6.62974666987451]
This paper makes the first attempt to tackle the challenging task of recovering arbitrary frame rate latent global shutter (GS) frames from two consecutive rolling shutter (RS) frames.
We propose a novel self-supervised framework that leverages events to guide RS frame correction and VFI in a unified framework.
arXiv Detail & Related papers (2023-06-27T14:30:25Z)
- Frame-Event Alignment and Fusion Network for High Frame Rate Tracking [37.35823883499189]
Most existing RGB-based trackers target low frame rate benchmarks of around 30 frames per second.
We propose an end-to-end network consisting of multi-modality alignment and fusion modules.
With the FE240hz dataset, our approach achieves high frame rate tracking up to 240Hz.
arXiv Detail & Related papers (2023-05-25T03:34:24Z)
- Video Frame Interpolation with Densely Queried Bilateral Correlation [52.823751291070906]
Video Frame Interpolation (VFI) aims to synthesize intermediate frames between existing frames.
Flow-based VFI algorithms estimate intermediate motion fields to warp the existing frames.
We propose Densely Queried Bilateral Correlation (DQBC) that gets rid of the receptive field dependency problem.
arXiv Detail & Related papers (2023-04-26T14:45:09Z)
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation or propagation, as the video chunking allows bypassing interrupted frames.
Second, multi-frame information is aggregated during the clip-wise matching, resulting in more accurate long-range track association than the current frame-wise matching; a minimal sketch of this clip-wise matching idea appears after the list below.
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
- E-VFIA: Event-Based Video Frame Interpolation with Attention [8.93294761619288]
We propose event-based video frame interpolation with attention (E-VFIA), a lightweight kernel-based method.
E-VFIA fuses event information with standard video frames by deformable convolutions to generate high quality interpolated frames.
The proposed method represents events with high temporal resolution and uses a multi-head self-attention mechanism to better encode event-based information.
arXiv Detail & Related papers (2022-09-19T21:40:32Z)
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI).
Our method outperforms other state-of-the-art methods on four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
- All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions that interpolate one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramid-style network in the temporal domain to complete the multi-frame task in one shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)
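As referenced in the Tracking by Associating Clips entry above, here is a minimal sketch of clip-wise tracklet association under simple assumptions: tracklets carry per-frame appearance features, aggregation is a plain mean, and matching is Hungarian assignment on cosine distance. The paper's actual aggregation and matching scheme is not specified in the summary, so treat this only as an illustration of the general idea.

```python
# Illustrative sketch of clip-wise tracklet association (not the paper's
# actual algorithm): track within short clips, then link tracklets across
# adjacent clips by mean-appearance similarity via Hungarian matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def tracklet_embedding(tracklet_feats: np.ndarray) -> np.ndarray:
    """Aggregate per-frame features (F, D) of one tracklet into one vector (D,)."""
    return tracklet_feats.mean(axis=0)

def associate_clips(prev_tracklets, next_tracklets, max_cost=0.6):
    """Match tracklets across two adjacent clips; returns (prev_idx, next_idx) pairs."""
    prev_emb = np.stack([tracklet_embedding(t) for t in prev_tracklets])
    next_emb = np.stack([tracklet_embedding(t) for t in next_tracklets])
    # Cosine distance between clip-level embeddings as the association cost.
    prev_emb = prev_emb / np.linalg.norm(prev_emb, axis=1, keepdims=True)
    next_emb = next_emb / np.linalg.norm(next_emb, axis=1, keepdims=True)
    cost = 1.0 - prev_emb @ next_emb.T
    rows, cols = linear_sum_assignment(cost)
    # Reject overly dissimilar matches; unmatched tracklets start new IDs.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]

# Toy usage: two clips whose tracklets share identities (second clip is a
# noisy copy of the first), each tracklet holding 10 frames of 128-d features.
rng = np.random.default_rng(0)
clip_a = [rng.normal(size=(10, 128)) for _ in range(3)]
clip_b = [t + 0.1 * rng.normal(size=t.shape) for t in clip_a]
print(associate_clips(clip_a, clip_b))  # -> [(0, 0), (1, 1), (2, 2)]
```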