Motion Estimation for Multi-Object Tracking using KalmanNet with Semantic-Independent Encoding
- URL: http://arxiv.org/abs/2509.11323v1
- Date: Sun, 14 Sep 2025 15:57:46 GMT
- Title: Motion Estimation for Multi-Object Tracking using KalmanNet with Semantic-Independent Encoding
- Authors: Jian Song, Wei Mei, Yunfeng Xu, Qiang Fu, Renke Kou, Lina Bu, Yucheng Long
- Abstract summary: Motion estimation is a crucial component in multi-object tracking (MOT). In this work, we utilize the learning-aided filter to handle the motion estimation of MOT. We propose a novel method named Semantic-Independent KalmanNet (SIKNet).
- Score: 14.822887770402707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion estimation is a crucial component in multi-object tracking (MOT). It predicts the trajectory of objects by analyzing the changes in their positions in consecutive frames of images, reducing tracking failures and identity switches. The Kalman filter (KF) based on the linear constant-velocity model is one of the most commonly used methods in MOT. However, it may yield unsatisfactory results when the KF's parameters are mismatched and objects move in a non-stationary manner. In this work, we utilize the learning-aided filter to handle the motion estimation of MOT. In particular, we propose a novel method named Semantic-Independent KalmanNet (SIKNet), which encodes the state vector (the input feature) using a Semantic-Independent Encoder (SIE) in two steps. First, the SIE uses a 1D convolution with a kernel size of 1, which convolves along the dimension of homogeneous-semantic elements across different state vectors to encode independent semantic information. Then it employs a fully-connected layer and a nonlinear activation layer to encode nonlinear and cross-dependency information between heterogeneous-semantic elements. To independently evaluate the performance of the motion estimation module in MOT, we constructed a large-scale semi-simulated dataset from several open-source MOT datasets. Experimental results demonstrate that the proposed SIKNet outperforms the traditional KF and achieves better robustness and accuracy than existing learning-aided filters. The code is available at https://github.com/SongJgit/filternet and https://github.com/SongJgit/TBDTracker.
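The two encoding steps described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (see their repositories for that). The input layout (one channel per state vector, one position per state element), the layer sizes, the ReLU activation, the random weights, and all function names (`conv1d_k1`, `linear_relu`, `sie_encode`, `init_params`) are assumptions made for illustration only.

```python
import random


def conv1d_k1(x, weight, bias):
    """Kernel-size-1 1D convolution.

    x:      [c_in][length], one channel per state vector (assumed layout)
    weight: [c_out][c_in]
    bias:   [c_out]
    Returns [c_out][length]. Output position j depends only on input
    position j, so elements with different semantics are never mixed here.
    """
    length = len(x[0])
    return [
        [bias[o] + sum(weight[o][c] * x[c][j] for c in range(len(x)))
         for j in range(length)]
        for o in range(len(weight))
    ]


def linear_relu(v, weight, bias):
    """Fully connected layer followed by ReLU; mixes all elements of v."""
    return [
        max(0.0, bias[o] + sum(w * a for w, a in zip(weight[o], v)))
        for o in range(len(weight))
    ]


def sie_encode(x, params):
    """Step 1: per-element encoding. Step 2: cross-element encoding."""
    h = conv1d_k1(x, params["conv_w"], params["conv_b"])
    flat = [a for row in h for a in row]
    return linear_relu(flat, params["fc_w"], params["fc_b"])


def init_params(c_in, c_out, length, hidden, seed=0):
    """Random placeholder weights; a real model would learn these."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    return {
        "conv_w": [[rand() for _ in range(c_in)] for _ in range(c_out)],
        "conv_b": [rand() for _ in range(c_out)],
        "fc_w": [[rand() for _ in range(c_out * length)] for _ in range(hidden)],
        "fc_b": [rand() for _ in range(hidden)],
    }


# Two hypothetical state vectors with the same element order, e.g. [x, y, vx, vy].
states = [
    [10.0, 20.0, 1.0, -0.5],
    [10.9, 19.6, 0.9, -0.4],
]
params = init_params(c_in=2, c_out=3, length=4, hidden=8)
code = sie_encode(states, params)
print(len(code))  # length of the encoded feature vector
```

Because the convolution kernel has size 1, perturbing the first element of a state vector changes only the first column of the convolution output. The fully connected layer is the first place where elements with different semantics interact.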
Related papers
- DIMM: Decoupled Multi-hierarchy Kalman Filter for 3D Object Tracking [50.038098341549095]
State estimation is challenging for 3D object tracking with high maneuverability. We propose a novel framework, DIMM, to effectively combine estimates from different motion models in each direction. DIMM significantly improves the tracking accuracy of existing state estimation methods by 31.61%~99.23%.
arXiv Detail & Related papers (2025-05-18T10:12:41Z)
- Ego-Motion Aware Target Prediction Module for Robust Multi-Object Tracking [2.7898966850590625]
We introduce a novel KF-based prediction module called Ego-motion Aware Target Prediction (EMAP)
Our proposed method decouples the impact of camera rotational and translational velocity from the object trajectories by reformulating the Kalman Filter.
EMAP remarkably drops the number of identity switches (IDSW) of OC-SORT and Deep OC-SORT by 73% and 21%, respectively.
arXiv Detail & Related papers (2024-04-03T23:24:25Z)
- MotionTrack: End-to-End Transformer-based Multi-Object Tracing with LiDAR-Camera Fusion [13.125168307241765]
We propose an end-to-end transformer-based MOT algorithm (MotionTrack) with multi-modality sensor inputs to track objects with multiple classes.
The MotionTrack and its variations achieve better results (AMOTA score at 0.55) on the nuScenes dataset compared with other classical baseline models.
arXiv Detail & Related papers (2023-06-29T15:00:12Z)
- OST: Efficient One-stream Network for 3D Single Object Tracking in Point Clouds [6.661881950861012]
We propose a novel one-stream network with the strength of instance-level encoding, which avoids the correlation operations used in previous Siamese networks.
The proposed method has achieved considerable performance not only for class-specific tracking but also for class-agnostic tracking with less computation and higher efficiency.
arXiv Detail & Related papers (2022-10-16T12:31:59Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Object Tracking by Detection with Visual and Motion Cues [1.7818230914983044]
Self-driving cars need to detect and track objects in camera images.
We present a simple online tracking algorithm that is based on a constant velocity motion model with a Kalman filter.
We evaluate our approach on the challenging BDD100K dataset.
arXiv Detail & Related papers (2021-01-19T10:29:16Z)
- Learning to Generate Content-Aware Dynamic Detectors [62.74209921174237]
We introduce a new perspective on designing efficient detectors: automatically generating a sample-adaptive model architecture.
We introduce a coarse-to-fine strategy tailored for object detection to guide the learning of dynamic routing.
Experiments on MS-COCO dataset demonstrate that CADDet achieves 1.8 higher mAP with 10% fewer FLOPs compared with vanilla routing.
arXiv Detail & Related papers (2020-12-08T08:05:20Z)
- Online Multi-Object Tracking and Segmentation with GMPHD Filter and Mask-based Affinity Fusion [79.87371506464454]
We propose a fully online multi-object tracking and segmentation (MOTS) method that uses instance segmentation results as an input.
The proposed method is based on the Gaussian mixture probability hypothesis density (GMPHD) filter, a hierarchical data association (HDA), and a mask-based affinity fusion (MAF) model.
In the experiments on the two popular MOTS datasets, the key modules show some improvements.
arXiv Detail & Related papers (2020-08-31T21:06:22Z)
- Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association.
DMM-Net achieves PR-MOTA score of 12.80 @ 120+ fps for the popular UA-DETRAC challenge, which is better performance and orders of magnitude faster.
We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations.
arXiv Detail & Related papers (2020-08-20T08:05:33Z)
- Probabilistic 3D Multi-Object Tracking for Autonomous Driving [23.036619327925088]
We present our online tracking method, which won first place in the NuScenes Tracking Challenge.
Our method estimates the object states by adopting a Kalman Filter.
Our experimental results on the NuScenes validation and test set show that our method outperforms the AB3DMOT baseline method.
arXiv Detail & Related papers (2020-01-16T06:38:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.