Dense Matchers for Dense Tracking
- URL: http://arxiv.org/abs/2402.11287v1
- Date: Sat, 17 Feb 2024 14:16:14 GMT
- Title: Dense Matchers for Dense Tracking
- Authors: Tomáš Jelínek, Jonáš Šerých, Jiří Matas
- Abstract summary: This paper extends the concept of combining multiple optical flows over logarithmically spaced intervals as proposed by MFT.
We demonstrate the compatibility of MFT with different optical flow networks, yielding results that surpass their individual performance.
This approach proves to be competitive with more sophisticated, non-causal methods in terms of position prediction accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optical flow is a useful input for various applications, including 3D
reconstruction, pose estimation, tracking, and structure-from-motion. Despite
its utility, the field of dense long-term tracking, especially over wide
baselines, has not been extensively explored. This paper extends the concept of
combining multiple optical flows over logarithmically spaced intervals as
proposed by MFT. We demonstrate the compatibility of MFT with different optical
flow networks, yielding results that surpass their individual performance.
Moreover, we present a simple yet effective combination of these networks
within the MFT framework. This approach proves to be competitive with more
sophisticated, non-causal methods in terms of position prediction accuracy,
highlighting the potential of MFT in enhancing long-term tracking applications.
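As a rough illustration of the chaining idea, the sketch below composes dense flows over logarithmically spaced intervals. The `flows_from` lookup, the interval set, and the nearest-neighbour resampling are illustrative simplifications under stated assumptions, not the paper's implementation.

```python
import numpy as np

def chain_flows(flow_a, flow_b):
    """Compose two dense flows: result(x) = flow_a(x) + flow_b(x + flow_a(x)).

    flow_a maps frame t0 -> tm, flow_b maps tm -> t1; both are (H, W, 2)
    arrays of (dx, dy) displacements. Nearest-neighbour lookup keeps the
    sketch short; a real implementation would interpolate bilinearly.
    """
    h, w, _ = flow_a.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xm = np.clip(np.rint(xs + flow_a[..., 0]).astype(int), 0, w - 1)
    ym = np.clip(np.rint(ys + flow_a[..., 1]).astype(int), 0, h - 1)
    return flow_a + flow_b[ym, xm]

def candidate_chains(flows_from, t, log_deltas=(1, 2, 4, 8, 16, 32)):
    """Build candidate flows from frame 0 to frame t by chaining a stored
    0 -> (t - d) result with a direct (t - d) -> t flow, one candidate per
    logarithmically spaced interval d. `flows_from[(i, j)]` is assumed to
    hold a precomputed dense flow from frame i to frame j."""
    out = []
    for d in log_deltas:
        if t - d < 0:
            continue
        out.append(chain_flows(flows_from[(0, t - d)], flows_from[(t - d, t)]))
    return out
```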
Related papers
- CoWTracker: Tracking by Warping instead of Correlation [53.834673070954494]
We propose a dense point tracker that eschews cost volumes in favor of warping.
Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate.
Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and RoboTAP.
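A minimal sketch of warping-based refinement, assuming a hypothetical `update_net` that predicts coordinate deltas from query and warped target features:

```python
import torch
import torch.nn.functional as F

def sample_features(feat, pts):
    """Bilinearly sample a (C, H, W) feature map at (N, 2) pixel (x, y) points."""
    _, h, w = feat.shape
    grid = pts.clone()
    grid[:, 0] = 2 * pts[:, 0] / (w - 1) - 1      # normalise x to [-1, 1]
    grid[:, 1] = 2 * pts[:, 1] / (h - 1) - 1      # normalise y to [-1, 1]
    out = F.grid_sample(feat.unsqueeze(0), grid.view(1, 1, -1, 2),
                        align_corners=True)        # (1, C, 1, N)
    return out.view(feat.shape[0], -1).t()         # (N, C)

def refine_tracks(query_vecs, target_feat, tracks, update_net, iters=4):
    """query_vecs: (N, C) features at the query points; tracks: (N, 2) current
    position estimates in the target frame. Each iteration warps target-frame
    features to the current estimates and lets `update_net` (a hypothetical
    small network returning (N, 2) deltas) correct the positions."""
    for _ in range(iters):
        warped = sample_features(target_feat, tracks)              # (N, C)
        tracks = tracks + update_net(torch.cat([query_vecs, warped], dim=1))
    return tracks
```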
arXiv Detail & Related papers (2026-02-04T18:58:59Z) - FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM [50.9765003472032]
FoundationSLAM is a learning-based monocular dense SLAM system for accurate and robust tracking and mapping.
Our core idea is to bridge flow estimation with reasoning by leveraging the guidance from foundation depth models.
arXiv Detail & Related papers (2025-12-31T17:57:45Z) - Longitudinal Flow Matching for Trajectory Modeling [7.063657100587108]
We propose Interpolative Multi-Marginal Flow Matching (IMMFM), a framework that learns continuous dynamics jointly consistent with multiple observed time points.
IMMFM captures intrinsic dynamics, handles irregular and sparse sampling, and yields subject-specific trajectories.
Experiments on synthetic benchmarks and real-world longitudinal datasets show that IMMFM outperforms existing methods in both forecasting accuracy and downstream tasks.
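For context, the generic multi-marginal conditional flow-matching objective between consecutive snapshots looks roughly as follows; `v_net` is a hypothetical velocity network, and this is the standard recipe rather than necessarily IMMFM's exact loss:

```python
import torch

def piecewise_flow_matching_loss(v_net, x_snapshots, t_snapshots):
    """Conditional flow-matching loss over a sequence of observed marginals.

    x_snapshots: list of (B, D) tensors at increasing scalar times
    t_snapshots. Between consecutive snapshots, sample a point on the
    straight interpolation and regress v_net(x, t) onto the constant
    target velocity (x1 - x0) / (t1 - t0)."""
    loss = 0.0
    pairs = zip(zip(x_snapshots, t_snapshots),
                zip(x_snapshots[1:], t_snapshots[1:]))
    for (x0, t0), (x1, t1) in pairs:
        s = torch.rand(x0.shape[0], 1)             # uniform in [0, 1]
        t = t0 + s * (t1 - t0)
        x_t = (1 - s) * x0 + s * x1                # straight-line path
        target_v = (x1 - x0) / (t1 - t0)
        loss = loss + ((v_net(x_t, t) - target_v) ** 2).mean()
    return loss / (len(x_snapshots) - 1)
```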
arXiv Detail & Related papers (2025-10-03T23:33:50Z) - GeoMM: On Geodesic Perspective for Multi-modal Learning [55.41612200877861]
This paper introduces geodesic distance as a novel distance metric in multi-modal learning for the first time.
Our approach incorporates a comprehensive series of strategies to adapt geodesic distance to current multi-modal learning.
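One plausible way to realise a geodesic metric over learned features is the classic Isomap construction, shortest paths on a k-nearest-neighbour graph; this is a sketch of that reading, not the paper's formulation:

```python
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph

def geodesic_distances(embeddings, k=10):
    """Approximate pairwise geodesic distances between samples by shortest
    paths on a k-NN graph of the embedding space, so distances follow the
    data manifold rather than straight lines through it."""
    graph = kneighbors_graph(embeddings, n_neighbors=k, mode="distance")
    return dijkstra(graph, directed=False)
```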
arXiv Detail & Related papers (2025-05-16T13:12:41Z) - ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration [40.16200204154956]
We propose ToF-Splatting, the first 3D Gaussian Splatting-based SLAM pipeline tailored to make effective use of very sparse ToF input data.
Our approach improves upon the state of the art by introducing a multi-frame integration module, which produces dense depth maps by merging cues from extremely sparse ToF depth, monocular color, and multi-view geometry.
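A common baseline for this kind of fusion aligns relative monocular depth to the metric sparse ToF samples with a global scale and shift; the paper's multi-frame integration module is more elaborate, but the sketch conveys the idea:

```python
import numpy as np

def align_mono_to_tof(mono_depth, tof_depth, tof_mask):
    """Fit a global scale and shift mapping relative monocular depth onto the
    metric sparse ToF measurements (least squares over valid pixels), then
    apply it everywhere to obtain a dense metric depth map."""
    m = mono_depth[tof_mask]                       # mono depth at ToF pixels
    t = tof_depth[tof_mask]                        # metric ToF measurements
    A = np.stack([m, np.ones_like(m)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, t, rcond=None)
    return scale * mono_depth + shift
```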
arXiv Detail & Related papers (2025-04-23T09:19:43Z) - Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.
Traditional encoder-decoder architectures may not adequately account for robustness against noise originating from defective modalities.
We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z) - MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation [22.245299107036836]
We present MFTIQ, a novel dense long-term tracking model that advances the Multi-Flow Tracker (MFT) framework.
MFTIQ builds upon the flow-chaining concepts of MFT, integrating an Independent Quality (IQ) module that separates correspondence quality estimation from optical flow computations.
Designed to be "plug-and-play", MFTIQ can be employed with any off-the-shelf optical flow method without the need for fine-tuning or architectural modifications.
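The decoupling can be pictured as a per-pixel argmax over candidate chains scored by the quality module; a sketch under the assumption that candidate flows and their scores are already computed:

```python
import numpy as np

def select_best_chain(candidates, qualities):
    """Per-pixel selection among chained-flow candidates.

    candidates: (K, H, W, 2) flows from the template to the current frame,
    qualities: (K, H, W) scores from an independent quality module (higher
    is better). Keeping the best-scoring candidate per pixel is what lets
    any off-the-shelf optical flow network be plugged in unchanged."""
    best = np.argmax(qualities, axis=0)            # (H, W) winning index
    h, w = best.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return candidates[best, ys, xs]                # (H, W, 2)
```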
arXiv Detail & Related papers (2024-11-14T16:06:10Z) - FlowIE: Efficient Image Enhancement via Rectified Flow [71.6345505427213]
FlowIE is a flow-based framework that estimates straight-line paths from an elementary distribution to high-quality images.
Our contributions are rigorously validated through comprehensive experiments on synthetic and real-world datasets.
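Because rectified flow encourages nearly straight trajectories, a handful of Euler steps already lands close to the endpoint; a minimal sampling sketch with a hypothetical velocity network `v_net`:

```python
import torch

@torch.no_grad()
def rectified_flow_sample(v_net, x0, steps=4):
    """Integrate dx/dt = v_net(x, t) from t=0 (the elementary distribution,
    e.g. noise or a degraded image) to t=1 with plain Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * v_net(x, t)
    return x
```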
arXiv Detail & Related papers (2024-06-01T17:29:29Z) - Interaction-Force Transport Gradient Flows [45.05400562268213]
This paper presents a new gradient flow dissipation geometry over non-negative and probability measures.
Using a precise connection between the Hellinger geometry and the maximum mean discrepancy (MMD), we propose the interaction-force transport (IFT) gradient flows.
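For reference, the squared maximum mean discrepancy between measures mu and nu under a kernel k, the functional whose gradient flow the IFT construction builds on, is:

```latex
\mathrm{MMD}^2(\mu,\nu)
  = \mathbb{E}_{x,x'\sim\mu}\,k(x,x')
  - 2\,\mathbb{E}_{x\sim\mu,\;y\sim\nu}\,k(x,y)
  + \mathbb{E}_{y,y'\sim\nu}\,k(y,y')
```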
arXiv Detail & Related papers (2024-05-27T11:46:14Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
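Directly estimated intermediate flows are typically used to backward-warp both inputs and blend them with an occlusion mask; a generic VFI sketch, not MA-VFI's exact architecture:

```python
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Backward-warp images (B, C, H, W) with a dense flow (B, 2, H, W)."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                            torch.arange(w, device=img.device), indexing="ij")
    gx = (xs + flow[:, 0]) * 2 / (w - 1) - 1       # sample coords in [-1, 1]
    gy = (ys + flow[:, 1]) * 2 / (h - 1) - 1
    return F.grid_sample(img, torch.stack([gx, gy], dim=-1), align_corners=True)

def interpolate_midframe(i0, i1, flow_t0, flow_t1, mask):
    """Blend the two warped inputs with a predicted occlusion mask in [0, 1].
    flow_t0 / flow_t1 point from the intermediate time toward each input --
    the quantities a network like MA-VFI is described as estimating."""
    return mask * backward_warp(i0, flow_t0) + (1 - mask) * backward_warp(i1, flow_t1)
```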
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - MFT: Long-Term Tracking of Every Pixel [0.36832029288386137]
MFT (Multi-Flow dense Tracker) is a novel method for dense, pixel-level, long-term tracking.
The method exploits optical flows estimated between consecutive frames and over logarithmically spaced intervals.
It tracks densely, orders of magnitude faster than state-of-the-art point-tracking methods.
arXiv Detail & Related papers (2023-05-22T13:02:46Z) - Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
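The streaming design can be sketched as a fixed-capacity feature memory that the current frame attends to; `TrackletMemory` here is illustrative, not the paper's module:

```python
from collections import deque
import torch

class TrackletMemory:
    """Ring buffer of per-frame features for streaming tracking: at each
    timestamp only the current frame's tokens are pushed, and the tracker
    cross-attends to the stored history instead of re-processing past
    frames."""

    def __init__(self, capacity=8):
        self.bank = deque(maxlen=capacity)         # oldest frame evicted first

    def update(self, feat):                        # feat: (N, C) current tokens
        self.bank.append(feat.detach())            # stored features are frozen

    def history(self):                             # (T*N, C) keys/values
        return torch.cat(list(self.bank), dim=0)
```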
arXiv Detail & Related papers (2023-03-14T02:58:27Z) - Spatio-Temporal Multi-Flow Network for Video Frame Interpolation [3.6053802212032995]
Video frame interpolation (VFI) is a very active research topic, with applications spanning computer vision, post-production and video encoding.
We present a novel deep learning based VFI method, ST-MFNet, based on a Spatio-Temporal Multi-Flow architecture.
arXiv Detail & Related papers (2021-11-30T15:18:46Z) - Sensor-Guided Optical Flow [53.295332513139925]
This paper proposes a framework to guide an optical flow network with external cues to achieve superior accuracy on known or unseen domains.
We show how these can be obtained by combining depth measurements from active sensors with geometry and hand-crafted optical flow algorithms.
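Under a static-scene assumption, depth plus relative camera motion already determines a dense "rigid flow" that can serve as such a cue; a sketch with hypothetical intrinsics and pose inputs:

```python
import numpy as np

def rigid_flow_from_depth(depth, K, R, t):
    """Turn a depth map (H, W) plus relative camera motion (R, t) into a
    dense flow hint: back-project each pixel with intrinsics K, transform
    into the second camera, re-project, and take the pixel displacement."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)   # 3D points, cam 1
    proj = K @ (R @ pts + t.reshape(3, 1))                # into camera 2
    uv = proj[:2] / proj[2:]                              # perspective divide
    flow = (uv - pix[:2]).reshape(2, h, w)
    return np.moveaxis(flow, 0, -1)                       # (H, W, 2)
```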
arXiv Detail & Related papers (2021-09-30T17:59:57Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z) - Unsupervised Motion Representation Enhanced Network for Action Recognition [4.42249337449125]
Motion representation between consecutive frames has been shown to greatly benefit video understanding.
The TV-L1 method, an effective optical flow solver, is time-consuming, and caching the extracted optical flow is expensive in storage.
We propose UF-TSN, a novel end-to-end action recognition approach enhanced with an embedded lightweight unsupervised optical flow estimator.
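Lightweight unsupervised flow estimators are typically trained with a photometric reconstruction loss plus a smoothness term; a generic sketch (same backward-warp recipe as in the VFI example above), not UF-TSN's exact objective:

```python
import torch
import torch.nn.functional as F

def photometric_flow_loss(img1, img2, flow, smooth_weight=0.1):
    """Warp frame 2 toward frame 1 with the predicted flow (B, 2, H, W) and
    penalise photometric error plus first-order flow smoothness."""
    _, _, h, w = img1.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=img1.device),
                            torch.arange(w, device=img1.device), indexing="ij")
    gx = (xs + flow[:, 0]) * 2 / (w - 1) - 1
    gy = (ys + flow[:, 1]) * 2 / (h - 1) - 1
    warped = F.grid_sample(img2, torch.stack([gx, gy], dim=-1), align_corners=True)
    photo = (img1 - warped).abs().mean()               # brightness constancy
    smooth = (flow[..., 1:, :] - flow[..., :-1, :]).abs().mean() + \
             (flow[..., :, 1:] - flow[..., :, :-1]).abs().mean()
    return photo + smooth_weight * smooth
```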
arXiv Detail & Related papers (2021-03-05T04:14:32Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)