MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation
- URL: http://arxiv.org/abs/2411.09551v1
- Date: Thu, 14 Nov 2024 16:06:10 GMT
- Title: MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation
- Authors: Jonas Serych, Michal Neoral, Jiri Matas
- Abstract summary: We present MFTIQ, a novel dense long-term tracking model that advances the Multi-Flow Tracker (MFT) framework.
MFTIQ builds upon the flow-chaining concepts of MFT, integrating an Independent Quality (IQ) module that separates correspondence quality estimation from optical flow computations.
Designed to be "plug-and-play", MFTIQ can be employed with any off-the-shelf optical flow method without the need for fine-tuning or architectural modifications.
- Score: 22.245299107036836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present MFTIQ, a novel dense long-term tracking model that advances the Multi-Flow Tracker (MFT) framework to address challenges in point-level visual tracking in video sequences. MFTIQ builds upon the flow-chaining concepts of MFT, integrating an Independent Quality (IQ) module that separates correspondence quality estimation from optical flow computations. This decoupling significantly enhances the accuracy and flexibility of the tracking process, allowing MFTIQ to maintain reliable trajectory predictions even in scenarios of prolonged occlusions and complex dynamics. Designed to be "plug-and-play", MFTIQ can be employed with any off-the-shelf optical flow method without the need for fine-tuning or architectural modifications. Experimental validations on the TAP-Vid Davis dataset show that MFTIQ with RoMa optical flow not only surpasses MFT but also performs comparably to state-of-the-art trackers while having substantially faster processing speed. Code and models available at https://github.com/serycjon/MFTIQ .
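To make the flow-chaining-plus-quality idea concrete, here is a minimal sketch in PyTorch. It illustrates the decoupling described in the abstract, not the authors' implementation: `flow_net` stands in for any off-the-shelf optical flow method, `iq_net` for the Independent Quality module, and the chaining and per-pixel selection logic are simplified assumptions.

```python
# Hedged sketch of flow chaining with a decoupled quality module.
# `flow_net(img_a, img_b) -> (1, 2, H, W)` and
# `iq_net(img_a, img_b, flow) -> (1, 1, H, W)` are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def chain(flow_0a, flow_ab):
    """Compose flow 0->a with flow a->b by sampling a->b at the 0->a endpoints."""
    _, _, h, w = flow_0a.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pts = torch.stack((xs, ys), dim=0).float() + flow_0a[0]   # endpoints in frame a
    gx = 2.0 * pts[0] / (w - 1) - 1.0                         # normalize for grid_sample
    gy = 2.0 * pts[1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1).unsqueeze(0)         # (1, h, w, 2)
    return flow_0a + F.grid_sample(flow_ab, grid, align_corners=True)

def track(frames, flow_net, iq_net, deltas=(1, 2, 4, 8)):
    """frames: list of image tensors with spatial size (H, W)."""
    h, w = frames[0].shape[-2:]
    chain_to = {0: torch.zeros(1, 2, h, w)}                   # flow from frame 0 to frame k
    for t in range(1, len(frames)):
        cands, scores = [], []
        for d in (d for d in deltas if t - d >= 0):
            step = flow_net(frames[t - d], frames[t])         # direct flow over gap d
            cands.append(chain(chain_to[t - d], step))        # candidate 0 -> t
            # quality is estimated independently of the flow computation
            scores.append(iq_net(frames[0], frames[t], cands[-1]))
        best = torch.cat(scores, dim=1).argmax(dim=1, keepdim=True)  # (1, 1, h, w)
        stacked = torch.stack(cands, dim=1)                          # (1, K, 2, h, w)
        idx = best.unsqueeze(2).expand(-1, -1, 2, -1, -1)
        chain_to[t] = stacked.gather(1, idx).squeeze(1)              # per-pixel winner
        yield t, chain_to[t]
```

The occlusion handling mentioned in the abstract is omitted here; this sketch only keeps the per-pixel best-scoring chain.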
Related papers
- CoWTracker: Tracking by Warping instead of Correlation [53.834673070954494]
We propose a dense point tracker that eschews cost volumes in favor of warping.
Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate.
Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP.
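As a rough illustration of refinement-by-warping (a sketch under assumptions, not the paper's architecture), one iteration might warp target-frame features to the query frame with the current flow estimate and predict a residual update; `update_net` and the feature shapes are hypothetical:

```python
# Hedged sketch of one warping-based refinement step (no cost volume).
import torch
import torch.nn.functional as F

def refine_once(query_feats, target_feats, flow, update_net):
    """Warp target features to the query frame with the current flow estimate,
    then predict a residual flow update from the (query, warped) pair."""
    b, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(flow)           # (2, h, w)
    pts = base.unsqueeze(0) + flow                                 # where each query pixel lands
    gx = 2.0 * pts[:, 0] / (w - 1) - 1.0
    gy = 2.0 * pts[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                           # (b, h, w, 2)
    warped = F.grid_sample(target_feats, grid, align_corners=True) # warping, not correlation
    delta = update_net(torch.cat((query_feats, warped), dim=1))    # residual flow
    return flow + delta
```

Running a few such iterations plays the role of the cost-volume lookup that correlation-based trackers use.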
arXiv Detail & Related papers (2026-02-04T18:58:59Z)
- Fast and Expressive Multi-Token Prediction with Probabilistic Circuits [29.853857313543468]
Multi-token prediction (MTP) is a prominent strategy to significantly speed up generation in large language models (LLMs).
We investigate the trade-off between expressiveness and latency in MTP within the framework of probabilistic circuits (PCs).
Our framework, named MTPC, allows one to explore different ways to encode the joint distributions over future tokens.
arXiv Detail & Related papers (2025-11-14T14:33:14Z)
- FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction [11.691960175716163]
This paper introduces FastMTP, a method that improves multi-step draft quality by aligning MTP training with its inference pattern.
Our approach fine-tunes a single MTP head with position-shared weights on self-distilled data, enabling it to capture dependencies among consecutive future tokens.
Experimental results across seven diverse benchmarks demonstrate that FastMTP achieves an average 2.03x speedup compared to standard next-token prediction.
arXiv Detail & Related papers (2025-09-16T07:36:26Z)
- SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality [52.948791050405525]
We propose SimMLM, a simple yet powerful framework for multimodal learning with missing modalities.
SimMLM consists of a generic Dynamic Mixture of Modality Experts (DMoME) architecture, featuring a dynamic, learnable gating mechanism.
The key innovation of SimMLM is the proposed More vs. Fewer (MoFe) ranking loss, which ensures that task accuracy improves or remains stable as more modalities are made available.
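A minimal sketch of what a "More vs. Fewer" ranking constraint could look like; the paper's exact loss may differ, and the margin and weighting below are assumptions:

```python
# Hedged sketch of a "More vs. Fewer" style ranking loss: penalize cases where
# using MORE modalities yields a HIGHER per-sample task loss than using FEWER.
import torch
import torch.nn.functional as F

def mofe_ranking_loss(loss_more, loss_fewer, margin=0.0):
    return F.relu(loss_more - loss_fewer + margin).mean()

# Usage: compute the task loss twice, once with the full modality set and once
# with a subset, then add the ranking term to the training objective.
loss_full = torch.tensor([0.4, 0.9, 0.2])    # per-sample loss, all modalities
loss_subset = torch.tensor([0.5, 0.7, 0.3])  # per-sample loss, modality subset
total = loss_full.mean() + 0.1 * mofe_ranking_loss(loss_full, loss_subset)
```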
arXiv Detail & Related papers (2025-07-25T13:39:34Z)
- Multi-Scale Finetuning for Encoder-based Time Series Foundation Models [67.95907033226585]
Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting.
While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities.
We propose multiscale finetuning (MSFT), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process.
arXiv Detail & Related papers (2025-06-17T01:06:01Z)
- FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years.
Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
arXiv Detail & Related papers (2025-06-13T07:57:38Z)
- Temporal Query Network for Efficient Multivariate Time Series Forecasting [3.0838061121585616]
We propose a novel technique called Temporal Query (TQ) to more effectively capture multivariate correlations.
Building upon the TQ technique, we develop a simple yet efficient model named Temporal Query Network (TQNet).
Experiments demonstrate that TQNet learns more robust multivariate correlations, achieving state-of-the-art forecasting accuracy across 12 challenging real-world datasets.
arXiv Detail & Related papers (2025-05-19T09:55:10Z)
- Boosting CLIP Adaptation for Image Quality Assessment via Meta-Prompt Learning and Gradient Regularization [55.09893295671917]
This paper introduces a novel Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA).
GRMP-IQA comprises two key modules: a Meta-Prompt Pre-training Module and Quality-Aware Gradient Regularization.
Experiments on five standard BIQA datasets demonstrate superior performance over state-of-the-art BIQA methods in the limited-data setting.
arXiv Detail & Related papers (2024-09-09T07:26:21Z)
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer [66.71930982549028]
Vision-Language Transformers (VLTs) have shown great success recently, but are accompanied by heavy computation costs.
We propose a novel framework named Multimodal Alignment-Guided Dynamic Token Pruning (MADTP) for accelerating various VLTs.
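For intuition, alignment-guided token pruning can be sketched as scoring vision tokens against a text representation and keeping the top-scoring ones; the scoring rule and keep ratio here are illustrative assumptions, not MADTP's actual mechanism:

```python
# Hedged sketch of alignment-guided vision-token pruning.
import torch

def prune_vision_tokens(vision_tokens, text_embedding, keep_ratio=0.5):
    """Keep the vision tokens best aligned with the text representation.
    vision_tokens: (b, n, d); text_embedding: (b, d)."""
    scores = torch.einsum("bnd,bd->bn", vision_tokens, text_embedding)
    k = max(1, int(vision_tokens.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices                       # (b, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, vision_tokens.shape[-1])
    return vision_tokens.gather(1, idx)                       # (b, k, d)
```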
arXiv Detail & Related papers (2024-03-05T14:13:50Z)
- Dense Matchers for Dense Tracking [0.0]
This paper extends the concept of combining multiple optical flows over logarithmically spaced intervals as proposed by MFT.
We demonstrate the compatibility of MFT with different optical flow networks, yielding results that surpass their individual performance.
This approach proves to be competitive with more sophisticated, non-causal methods in terms of position prediction accuracy.
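For reference, the logarithmically spaced intervals can be sketched in a few lines (a minimal illustration of the schedule, with the `max_gap` cutoff as an assumption): besides frame-to-frame flow, direct flows over gaps of 2, 4, 8, ... frames let the tracker re-acquire points after short occlusions without chaining through every intermediate frame.

```python
# Hedged sketch of the logarithmic interval schedule used by the MFT family.
def log_spaced_sources(t, max_gap=32):
    """Frame indices from which direct flow to frame t is computed."""
    gaps = [1]
    while gaps[-1] * 2 <= min(t, max_gap):
        gaps.append(gaps[-1] * 2)          # 1, 2, 4, 8, ...
    return [t - g for g in gaps if t - g >= 0]

print(log_spaced_sources(21))  # [20, 19, 17, 13, 5]
```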
arXiv Detail & Related papers (2024-02-17T14:16:14Z)
- MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning [28.12788291168137]
We present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
Experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks.
arXiv Detail & Related papers (2023-11-04T02:22:40Z)
- GAFlow: Incorporating Gaussian Attention into Optical Flow [62.646389181507764]
We push Gaussian Attention (GA) into the optical flow models to accentuate local properties during representation learning.
We introduce a novel Gaussian-Constrained Layer (GCL) which can be easily plugged into existing Transformer blocks.
For reliable motion analysis, we provide a new Gaussian-Guided Attention Module (GGAM).
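The general idea of Gaussian attention can be sketched as adding a fixed Gaussian penalty over spatial distance to the attention logits, so nearby positions are favored; this is a generic illustration, not GAFlow's exact GCL or GGAM:

```python
# Hedged sketch of Gaussian-biased dot-product attention.
import torch

def gaussian_attention(q, k, v, coords, sigma=4.0):
    """q, k, v: (n, d) token features; coords: (n, 2) float pixel positions."""
    d = q.shape[-1]
    logits = q @ k.t() / d ** 0.5                  # standard dot-product attention
    dist2 = torch.cdist(coords, coords).pow(2)     # squared spatial distances
    logits = logits - dist2 / (2 * sigma ** 2)     # Gaussian locality bias
    return torch.softmax(logits, dim=-1) @ v
```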
arXiv Detail & Related papers (2023-09-28T07:46:01Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model that combines the merits of deformable CNNs and query-based Transformers with shared gating for multi-task dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z)
- MFT: Long-Term Tracking of Every Pixel [0.36832029288386137]
MFT (Multi-Flow dense Tracker) is a novel method for dense, pixel-level, long-term tracking.
The method exploits optical flows estimated between consecutive frames.
It tracks densely, orders of magnitude faster than state-of-the-art point-tracking methods.
arXiv Detail & Related papers (2023-05-22T13:02:46Z)
- Automated Federated Learning in Mobile Edge Networks -- Fast Adaptation and Convergence [83.58839320635956]
Federated Learning (FL) can be used in mobile edge networks to train machine learning models in a distributed manner.
Recently, FL has been interpreted within a Model-Agnostic Meta-Learning (MAML) framework, which brings FL significant advantages in fast adaptation and convergence over heterogeneous datasets.
This paper addresses how much benefit MAML brings to FL and how to maximize such benefit over mobile edge networks.
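To ground the MAML connection, here is a minimal first-order MAML round in a federated setting: each client adapts a copy of the global model on local support data, and the server averages the clients' post-adaptation gradients. The aggregation scheme and `loss_fn` are illustrative assumptions, not the paper's algorithm:

```python
# Hedged sketch: one first-order MAML round over a set of clients.
import torch

def fomaml_round(global_params, clients, loss_fn, inner_lr=0.01, outer_lr=0.1):
    """global_params: list of tensors; clients: list of (support, query) batches."""
    outer_grads = [torch.zeros_like(p) for p in global_params]
    for support, query in clients:
        # inner loop: adapt a copy of the global model on the client's support set
        adapted = [p.detach().clone().requires_grad_(True) for p in global_params]
        grads = torch.autograd.grad(loss_fn(adapted, support), adapted)
        adapted = [(p - inner_lr * g).detach().requires_grad_(True)
                   for p, g in zip(adapted, grads)]
        # outer signal: gradient of the query loss at the adapted parameters
        for og, g in zip(outer_grads,
                         torch.autograd.grad(loss_fn(adapted, query), adapted)):
            og += g
    # server step: move the global model along the averaged client gradient
    return [p - outer_lr * og / len(clients)
            for p, og in zip(global_params, outer_grads)]

# Example loss_fn for a linear model: params = [w], batch = (X, y)
def mse_loss(params, batch):
    X, y = batch
    return ((X @ params[0] - y) ** 2).mean()
```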
arXiv Detail & Related papers (2023-03-23T02:42:10Z)
- A Mixed Quantization Network for Computationally Efficient Mobile Inverse Tone Mapping [8.277567852741242]
We propose combining efficient operations of deep neural networks with a novel mixed quantization scheme to construct a well-performing but computationally efficient mixed quantization network (MQN).
MQN provides up to a 10x improvement in latency and a 25x improvement in memory consumption.
arXiv Detail & Related papers (2022-03-12T19:40:01Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose GMFlow, a framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms the 32-iteration RAFT on the challenging Sintel benchmark.
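The correlation-and-softmax component named above can be illustrated in a few lines: correlate every source pixel with every target pixel, softmax over target positions, and read the flow off as the expected displacement. A simplified sketch (single scale, no feature enhancement or flow propagation):

```python
# Hedged sketch of global matching via correlation + softmax.
# Note the (hw x hw) correlation is quadratic in image size.
import torch

def global_match_flow(feat_src, feat_tgt):
    """feat_src, feat_tgt: (h, w, d) feature maps for the two frames."""
    h, w, d = feat_src.shape
    src = feat_src.reshape(h * w, d)
    tgt = feat_tgt.reshape(h * w, d)
    corr = src @ tgt.t() / d ** 0.5            # global correlation volume
    match = torch.softmax(corr, dim=-1)        # soft correspondence per source pixel
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack((xs, ys), dim=-1).reshape(h * w, 2).float()
    matched = match @ coords                   # expected target coordinates
    return (matched - coords).reshape(h, w, 2)  # flow = target - source
```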
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.