STMDNet: A Lightweight Directional Framework for Motion Pattern Recognition of Tiny Targets
- URL: http://arxiv.org/abs/2501.13054v1
- Date: Wed, 22 Jan 2025 18:06:00 GMT
- Title: STMDNet: A Lightweight Directional Framework for Motion Pattern Recognition of Tiny Targets
- Authors: Mingshuo Xu, Hao Luan, Zhou Daniel Hao, Jigen Peng, Shigang Yue,
- Abstract summary: We propose STMDNet, a model-based computational framework to Recognize motions of tiny targets at variable velocities.
We develop the first collaborative directional encoding-decoding strategy that determines the motion direction from only one correlation per spatial location.
Across diverse speeds, STMDNet-F improves mF1 by 19%, 16%, and 8% at 240Hz, 120Hz, and 60Hz, respectively, while STMDNet achieves 87 FPS on a single CPU thread.
- Score: 6.331639152400688
- License:
- Abstract: Recognizing motions of tiny targets - only few dozen pixels - in cluttered backgrounds remains a fundamental challenge when standard feature-based or deep learning methods fail under scarce visual cues. We propose STMDNet, a model-based computational framework to Recognize motions of tiny targets at variable velocities under low-sampling frequency scenarios. STMDNet designs a novel dual-dynamics-and-correlation mechanism, harnessing ipsilateral excitation to integrate target cues and leakage-enhancing-type contralateral inhibition to suppress large-object and background motion interference. Moreover, we develop the first collaborative directional encoding-decoding strategy that determines the motion direction from only one correlation per spatial location, cutting computational costs to one-eighth of prior methods. Further, simply substituting the backbone of a strong STMD model with STMDNet raises AUC by 24%, yielding an enhanced STMDNet-F. Evaluations on real-world low sampling frequency datasets show state-of-the-art results, surpassing the deep learning baseline. Across diverse speeds, STMDNet-F improves mF1 by 19%, 16%, and 8% at 240Hz, 120Hz, and 60Hz, respectively, while STMDNet achieves 87 FPS on a single CPU thread. These advances highlight STMDNet as a next-generation backbone for tiny target motion pattern recognition and underscore its broader potential to revitalize model-based visual approaches in motion detection.
Related papers
- Neural Spatial-Temporal Tensor Representation for Infrared Small Target Detection [3.7038542578642724]
We introduce a Neural-represented spatial-temporal model (NeurSTT) for infrared small target detection.
NeurSTT enhances spatial-temporal correlations in background approximation, thereby supporting target detection in an unsupervised manner.
Visual and numerical results across various datasets demonstrate that our method outperforms the suboptimal method on $256 times 256$ sequences.
arXiv Detail & Related papers (2024-12-23T05:46:08Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - Enhancing Lightweight Neural Networks for Small Object Detection in IoT
Applications [1.6932009464531739]
The paper proposes a novel adaptive tiling method that can be used on top of any existing object detector.
Our experimental results show that the proposed tiling method can boost the F1-score by up to 225% while reducing the average object count error by up to 76%.
arXiv Detail & Related papers (2023-11-13T08:58:34Z) - Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM)
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Revisit Geophysical Imaging in A New View of Physics-informed Generative
Adversarial Learning [2.12121796606941]
Full waveform inversion produces high-resolution subsurface models.
FWI with least-squares function suffers from many drawbacks such as the local-minima problem.
Recent works relying on partial differential equations and neural networks show promising performance for two-dimensional FWI.
We propose an unsupervised learning paradigm that integrates wave equation with a discriminate network to accurately estimate the physically consistent models.
arXiv Detail & Related papers (2021-09-23T15:54:40Z) - MotionHint: Self-Supervised Monocular Visual Odometry with Motion
Constraints [70.76761166614511]
We present a novel self-supervised algorithm named MotionHint for monocular visual odometry (VO)
Our MotionHint algorithm can be easily applied to existing open-sourced state-of-the-art SSM-VO systems.
arXiv Detail & Related papers (2021-09-14T15:35:08Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z) - A Generative Learning Approach for Spatio-temporal Modeling in Connected
Vehicular Network [55.852401381113786]
This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive-temporal quality framework for wireless access latency of connected vehicles.
LaMI adopts the idea from image inpainting and synthesizing and can reconstruct the missing latency samples by a two-step procedure.
In particular, it first discovers the spatial correlation between samples collected in various regions using a patching-based approach and then feeds the original and highly correlated samples into a Varienational Autocoder (VAE)
arXiv Detail & Related papers (2020-03-16T03:43:59Z) - A Time-Delay Feedback Neural Network for Discriminating Small,
Fast-Moving Targets in Complex Dynamic Environments [8.645725394832969]
Discriminating small moving objects within complex visual environments is a significant challenge for autonomous micro robots.
We propose an STMD-based neural network with feedback connection (Feedback STMD), where the network output is temporally delayed, then fed back to the lower layers to mediate neural responses.
arXiv Detail & Related papers (2019-12-29T03:10:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.