TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection
- URL: http://arxiv.org/abs/2601.05446v1
- Date: Fri, 09 Jan 2026 00:27:18 GMT
- Title: TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection
- Authors: Hongyang Xie, Hongyang He, Victor Sanchez
- Abstract summary: Infrared small target detection (ISTD) remains a long-standing challenge due to weak signal contrast, limited spatial extent, and cluttered backgrounds. Current models lack a mechanism to trace how small targets trigger directional, layer-wise perturbations in the feature space. We propose the Trajectory-Aware Mamba Propagation Network (TAPM-Net), which explicitly models the spatial diffusion behavior of target-induced feature disturbances. Experiments on NUAA-SIRST and IRSTD-1K demonstrate that TAPM-Net achieves state-of-the-art performance in ISTD.
- Score: 12.326502890179107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Infrared small target detection (ISTD) remains a long-standing challenge due to weak signal contrast, limited spatial extent, and cluttered backgrounds. Despite performance improvements from convolutional neural networks (CNNs) and Vision Transformers (ViTs), current models lack a mechanism to trace how small targets trigger directional, layer-wise perturbations in the feature space, which is an essential cue for distinguishing signal from structured noise in infrared scenes. To address this limitation, we propose the Trajectory-Aware Mamba Propagation Network (TAPM-Net), which explicitly models the spatial diffusion behavior of target-induced feature disturbances. TAPM-Net is built upon two novel components: a Perturbation-guided Path Module (PGM) and a Trajectory-Aware State Block (TASB). The PGM constructs perturbation energy fields from multi-level features and extracts gradient-following feature trajectories that reflect the directionality of local responses. The resulting feature trajectories are fed into the TASB, a Mamba-based state-space unit that models dynamic propagation along each trajectory while incorporating velocity-constrained diffusion and semantically aligned feature fusion from word-level and sentence-level embeddings. Unlike existing attention-based methods, TAPM-Net enables anisotropic, context-sensitive state transitions along spatial trajectories while maintaining global coherence at low computational cost. Experiments on NUAA-SIRST and IRSTD-1K demonstrate that TAPM-Net achieves state-of-the-art performance in ISTD.
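The pipeline described in the abstract (an energy field built from multi-level features, a gradient-following trajectory, then a state update along that trajectory) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the energy definition, the greedy 8-neighbour ascent, and the fixed-decay linear recurrence are all assumptions, and the function names `perturbation_energy`, `follow_gradient`, and `scan_trajectory` are hypothetical. A plain linear recurrence stands in for the Mamba-based TASB, and the velocity constraint and embedding fusion are omitted.

```python
import numpy as np


def perturbation_energy(features):
    """Toy energy field (assumption): squared positive deviation of each
    channel from its spatial mean, summed over channels and levels.
    `features` is a list of (C, H, W) arrays at a common resolution."""
    energy = np.zeros(features[0].shape[1:])
    for f in features:
        dev = np.clip(f - f.mean(axis=(1, 2), keepdims=True), 0.0, None)
        energy += (dev ** 2).sum(axis=0)
    return energy


def follow_gradient(energy, start, max_steps=8):
    """Greedy ascent (assumption): step to the 8-neighbour with the highest
    energy and stop at a local maximum or after `max_steps` moves."""
    h, w = energy.shape
    y, x = start
    path = [(y, x)]
    for _ in range(max_steps):
        best = (y, x)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and energy[ny, nx] > energy[best]:
                    best = (ny, nx)
        if best == (y, x):
            break  # no neighbour is strictly higher: local maximum
        y, x = best
        path.append(best)
    return path


def scan_trajectory(feature, path, decay=0.9):
    """Stand-in for a state-space scan (assumption): the linear recurrence
    h_t = decay * h_{t-1} + (1 - decay) * x_t over features along `path`."""
    state = np.zeros(feature.shape[0])
    for y, x in path:
        state = decay * state + (1.0 - decay) * feature[:, y, x]
    return state


# Usage: a synthetic Gaussian "target" centred at (5, 5) on a 9x9 map.
yy, xx = np.mgrid[0:9, 0:9]
blob = np.exp(-((yy - 5) ** 2 + (xx - 5) ** 2) / 8.0)
levels = [np.stack([blob, 2 * blob]), 0.5 * np.stack([blob, 2 * blob])]

energy = perturbation_energy(levels)
path = follow_gradient(energy, start=(2, 2))
state = scan_trajectory(levels[0], path)
```

On this synthetic input the trajectory climbs from the starting pixel to the blob centre, and the scanned state is a decayed summary of the features met along the way. A learned, input-dependent transition, as in Mamba-style models, would replace the fixed `decay`.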
Related papers
- TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically rely on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z) - Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection [25.15274799496491]
Moving infrared small target detection (IRSTD) plays a critical role in practical applications, such as surveillance of unmanned aerial vehicles (UAVs) and infrared-based search systems. Accurate spatio-temporal feature modeling is crucial for moving target detection, typically achieved through either temporal differences or spatio-temporal (3D) convolutions. In this paper, we propose a novel network for moving infrared small target detection, which effectively extracts and enhances spatio-temporal features for accurate target detection.
arXiv Detail & Related papers (2025-11-11T09:46:43Z) - PHASE-Net: Physics-Grounded Harmonic Attention System for Efficient Remote Photoplethysmography Measurement [63.007237197267834]
Existing deep learning methods are mostly physiological monitoring and lack theoretical robustness. We propose a physics-informed rPPG paradigm derived from the Navier-Stokes equations of hemodynamics, showing that the pulse signal follows a second-order system. This provides a theoretical justification for using a Temporal Convolutional Network (TCN). PHASE-Net achieves state-of-the-art performance with strong efficiency, offering a theoretically grounded and deployment-ready rPPG solution.
arXiv Detail & Related papers (2025-09-29T14:36:45Z) - Graph Enhanced Trajectory Anomaly Detection [23.8160784400789]
Trajectory anomaly detection is essential for identifying unusual and unexpected movement patterns in applications ranging from intelligent transportation systems to urban safety and fraud prevention. Existing methods only consider limited aspects of the trajectory nature and its movement space by treating trajectories as sequences of sampled locations. The proposed Graph Enhanced Trajectory Anomaly Detection framework tightly integrates road network topology, segment semantics, and historical travel patterns to model trajectory data.
arXiv Detail & Related papers (2025-09-22T20:15:15Z) - Bidirectional Feature-aligned Motion Transformation for Efficient Dynamic Point Cloud Compression [97.66080040613726]
We propose a Bidirectional Feature-aligned Motion Transformation (Bi-FMT) framework that implicitly models motion in the feature space. Bi-FMT aligns features across both past and future frames to produce temporally consistent latent representations. We show Bi-FMT surpasses D-DPCC and AdaDPCC in both compression efficiency and runtime.
arXiv Detail & Related papers (2025-09-18T03:51:06Z) - Temporal Point-Supervised Signal Reconstruction: A Human-Annotation-Free Framework for Weak Moving Target Detection [1.187456026346823]
We propose a novel Temporal Point-Supervised (TPS) framework that enables high-performance detection of weak targets without any manual annotations. A Temporal Signal Reconstruction Network (TSRNet) is developed under the TPS paradigm to reconstruct these transient signals. Extensive experiments on a purpose-built low-SNR dataset demonstrate that our framework outperforms state-of-the-art methods while requiring no human annotations.
arXiv Detail & Related papers (2025-07-23T09:02:09Z) - Motion-Enhanced Nonlocal Similarity Implicit Neural Representation for Infrared Dim and Small Target Detection [15.114540113830522]
Infrared dim and small target detection presents a significant challenge due to dynamic multi-frame scenarios and weak target signatures. Traditional low-rank plus sparse models often fail to capture dynamic backgrounds and global spatial-temporal correlations. We propose a novel motion-enhanced nonlocal similarity implicit neural representation framework to address these challenges.
arXiv Detail & Related papers (2025-04-22T07:42:00Z) - DiffMOD: Progressive Diffusion Point Denoising for Moving Object Detection in Remote Sensing [40.607660968380394]
Moving object detection (MOD) in remote sensing is significantly challenged by low resolution, extremely small object sizes, and complex noise interference. Current deep learning-based MOD methods rely on probability density estimation, which restricts flexible information interaction between objects. We propose a point-based MOD in remote sensing that iteratively recovers moving object centers from sparse noisy points.
arXiv Detail & Related papers (2025-04-14T14:44:52Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction [60.60223171143206]
Trajectory prediction is a crucial undertaking in understanding entity movement or human behavior from observed sequences.
Current methods often assume that the observed sequences are complete while ignoring the potential for missing values.
This paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously.
arXiv Detail & Related papers (2023-03-28T14:27:27Z) - Neural Motion Fields: Encoding Grasp Trajectories as Implicit Value Functions [65.84090965167535]
We present Neural Motion Fields, a novel object representation which encodes both object point clouds and the relative task trajectories as an implicit value function parameterized by a neural network.
This object-centric representation models a continuous distribution over the SE(3) space and allows us to perform grasping reactively by leveraging sampling-based MPC to optimize this value function.
arXiv Detail & Related papers (2022-06-29T18:47:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.