Related papers: MFT: Long-Term Tracking of Every Pixel

URL: http://arxiv.org/abs/2305.12998v2
Date: Fri, 10 Nov 2023 16:21:10 GMT
Title: MFT: Long-Term Tracking of Every Pixel
Authors: Michal Neoral, Jonáš Šerých, Jiří Matas
Abstract summary: MFT (Multi-Flow dense Tracker) is a novel method for dense, pixel-level, long-term tracking. The method exploits optical flows estimated between consecutive frames and between pairs of frames at logarithmically spaced intervals. It tracks densely orders of magnitude faster than state-of-the-art point-tracking methods.
Score: 0.36832029288386137
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose MFT -- Multi-Flow dense Tracker -- a novel method for dense, pixel-level, long-term tracking. The approach exploits optical flows estimated not only between consecutive frames, but also for pairs of frames at logarithmically spaced intervals. It selects the most reliable sequence of flows on the basis of estimates of its geometric accuracy and the probability of occlusion, both provided by a pre-trained CNN. We show that MFT achieves competitive performance on the TAP-Vid benchmark, outperforming baselines by a significant margin, and tracking densely orders of magnitude faster than the state-of-the-art point-tracking methods. The method is insensitive to medium-length occlusions and it is robustified by estimating flow with respect to the reference frame, which reduces drift.
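The selection step described in the abstract can be illustrated with a minimal sketch for a single query pixel. This is not the authors' implementation: the `Candidate` structure, the `max_delta` cap, the occlusion threshold of 0.5, and the function names are all hypothetical, and the uncertainty and occlusion values that the paper obtains from a pre-trained CNN are simply supplied as numbers here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """One chained track estimate for a single query pixel (hypothetical structure)."""
    delta: int                      # frame gap of the last flow in the chain
    position: Tuple[float, float]   # predicted (x, y) in the current frame
    uncertainty: float              # estimated geometric error (lower is better)
    occlusion_prob: float           # estimated probability that the point is occluded


def log_spaced_deltas(t: int, max_delta: int = 32) -> List[int]:
    """Return frame gaps 1, 2, 4, ... that exceed neither t nor max_delta (assumed cap)."""
    deltas = []
    d = 1
    while d <= min(t, max_delta):
        deltas.append(d)
        d *= 2
    return deltas


def select_candidate(cands: List[Candidate],
                     occl_thresh: float = 0.5) -> Optional[Candidate]:
    """Pick the visible candidate with the lowest uncertainty.

    Returns None when every candidate is judged occluded.
    """
    visible = [c for c in cands if c.occlusion_prob < occl_thresh]
    if not visible:
        return None
    return min(visible, key=lambda c: c.uncertainty)


# Toy usage with made-up numbers: the delta=4 chain has the lowest
# uncertainty but is judged occluded, so the delta=2 chain is chosen.
cands = [
    Candidate(1, (10.2, 5.1), 3.0, 0.1),
    Candidate(2, (10.0, 5.0), 1.5, 0.2),
    Candidate(4, (9.0, 4.0), 0.5, 0.9),
]
best = select_candidate(cands)
print(log_spaced_deltas(10), best.delta)
```

In a dense tracker the same selection would be applied to every pixel at once, typically as an array-wide minimum over the candidate axis. The per-point version is used here only to keep the logic easy to follow.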

Related papers

CoWTracker: Tracking by Warping instead of Correlation [53.834673070954494]
We propose a dense point tracker that eschews cost volumes in favor of warping. Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate. Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP.
arXiv Detail & Related papers (2026-02-04T18:58:59Z)
Online Dense Point Tracking with Streaming Memory [54.22820729477756]
Dense point tracking is a challenging task requiring the continuous tracking of every point in the initial frame throughout a substantial portion of a video. Recent point tracking algorithms usually depend on sliding windows for indirect information propagation from the first frame to the current one. We present a lightweight and fast model with Streaming memory for dense POint Tracking and online video processing.
arXiv Detail & Related papers (2025-03-09T06:16:49Z)
Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry [7.517597541959445]
We introduce Spatio-Temporal Visual Odometry (STVO), a novel deep network architecture to enhance the accuracy and consistency of multi-frame flow matching. Our STVO achieves state-of-the-art performance on the ETH3D benchmark and a 38.9% improvement on the KITTI Odometry benchmark over the previous best methods.
arXiv Detail & Related papers (2024-12-22T08:47:13Z)
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency [97.28511135503176]
We introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field. Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models.
arXiv Detail & Related papers (2024-07-02T16:15:37Z)
Dense Matchers for Dense Tracking [0.0]
This paper extends the concept of combining multiple optical flows over logarithmically spaced intervals as proposed by MFT. We demonstrate the compatibility of MFT with different optical flow networks, yielding results that surpass their individual performance. This approach proves to be competitive with more sophisticated, non-causal methods in terms of position prediction accuracy.
arXiv Detail & Related papers (2024-02-17T14:16:14Z)
Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames. It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
Dense Optical Tracking: Connecting the Dots [82.79642869586587]
DOT is a novel, simple and efficient method for solving the problem of point tracking in a video. We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal trackers" like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker.
arXiv Detail & Related papers (2023-12-01T18:59:59Z)
StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences [31.210626775505407]
Occlusions between consecutive frames have long posed a significant challenge in optical flow estimation. We present a Streamlined In-batch Multi-frame (SIM) pipeline tailored to video input, attaining a similar level of time efficiency to two-frame networks. StreamFlow excels in performance on the challenging KITTI and Sintel datasets, with particular improvement in occluded areas.
arXiv Detail & Related papers (2023-11-28T07:53:51Z)
Long-term Video Frame Interpolation via Feature Propagation [95.18170372022703]
Video frame interpolation (VFI) works generally predict intermediate frame(s) by first estimating the motion between inputs and then warping the inputs to the target time with the estimated motion. This approach is not optimal when the temporal distance between the input frames increases. We propose a propagation network (PNet) by extending the classic feature-level forecasting with a novel motion-to-feature approach.
arXiv Detail & Related papers (2022-03-29T10:47:06Z)
Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning. We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z)
Unsupervised Motion Representation Enhanced Network for Action Recognition [4.42249337449125]
Motion representation between consecutive frames has proven to greatly benefit video understanding. The TV-L1 method, an effective optical flow solver, is time-consuming and expensive in storage for caching the extracted optical flow. We propose UF-TSN, a novel end-to-end action recognition approach enhanced with an embedded lightweight unsupervised optical flow estimator.
arXiv Detail & Related papers (2021-03-05T04:14:32Z)
FlowMOT: 3D Multi-Object Tracking by Scene Flow Association [9.480272707157747]
We propose a LiDAR-based 3D MOT framework named FlowMOT, which integrates point-wise motion information with the traditional matching algorithm. Our approach outperforms recent end-to-end methods and achieves competitive performance with the state-of-the-art filter-based method.
arXiv Detail & Related papers (2020-12-14T14:03:48Z)
STaRFlow: A SpatioTemporal Recurrent Cell for Lightweight Multi-Frame Optical Flow Estimation [64.99259320624148]
We present a new lightweight CNN-based algorithm for multi-frame optical flow estimation. The resulting STaRFlow algorithm gives state-of-the-art performance on MPI Sintel and KITTI 2015.
arXiv Detail & Related papers (2020-07-10T17:01:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.