Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization
- URL: http://arxiv.org/abs/2511.03943v3
- Date: Fri, 14 Nov 2025 01:33:42 GMT
- Title: Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization
- Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
- Abstract summary: Temporal action localization requires both precise boundary detection and computational efficiency. We address this through two complementary innovations: Boundary Distance Regression (BDR) and Adaptive Temporal Refinement (ATR). On THUMOS14, our method achieves 56.5% mAP@0.7 with 151G FLOPs, using 36% fewer FLOPs than ActionFormer++ (55.7% mAP@0.7 at 235G).
- Score: 6.908972852063454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action localization requires both precise boundary detection and computational efficiency. Current methods apply uniform computation across all temporal positions, wasting resources on easy boundaries while struggling with ambiguous ones. We address this through two complementary innovations: Boundary Distance Regression (BDR), which replaces classification-based boundary detection with signed-distance regression achieving 3.3--16.7$\times$ lower variance; and Adaptive Temporal Refinement (ATR), which allocates transformer depth continuously ($\tau \in [0,1]$) to concentrate computation near difficult boundaries. On THUMOS14, our method achieves 56.5\% mAP@0.7 and 58.2\% average mAP@[0.3:0.7] with 151G FLOPs, using 36\% fewer FLOPs than ActionFormer++ (55.7\% mAP@0.7 at 235G). Compared to uniform baselines, we achieve +2.9\% mAP@0.7 (+1.8\% avg mAP, 5.4\% relative) with 24\% fewer FLOPs and 29\% lower latency, with particularly strong gains on short actions (+4.2\%, 8.6\% relative). Training requires 1.29$\times$ baseline FLOPs, but this one-time cost is amortized over many inference runs; knowledge distillation further reduces this to 1.1$\times$ while retaining 99.5\% accuracy. Our contributions include: (i) a theoretically grounded distance formulation with information-theoretic analysis showing optimal variance scaling; (ii) a continuous depth allocation mechanism avoiding discrete routing complexity; and (iii) consistent improvements across four datasets with gains correlating with boundary heterogeneity.
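As a concrete illustration of what a BDR-style regression head would predict, the sketch below builds signed-distance targets: for each temporal position, the signed offset to its nearest annotated action boundary. This is a hypothetical target construction inferred from the abstract, not the authors' implementation.

```python
import numpy as np

def signed_boundary_distances(num_frames, boundaries):
    """For each temporal position t in [0, num_frames), return the
    signed offset (t - b) to the nearest annotated boundary b.
    Illustrative BDR-style target; not the paper's actual code."""
    t = np.arange(num_frames)[:, None]        # shape (T, 1)
    b = np.asarray(boundaries)[None, :]       # shape (1, B)
    diff = t - b                              # signed offsets to every boundary
    nearest = np.abs(diff).argmin(axis=1)     # index of nearest boundary per t
    return diff[np.arange(num_frames), nearest]
```

Regressing a smooth signed quantity like this (rather than a per-position boundary/non-boundary label) is one way to obtain the lower-variance boundary estimates the abstract describes.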
Related papers
- Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations [121.39938773554523]
The Area Under the ROC Curve (AUC) is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints. We present two simple instance-wise minimax reformulations to close the approximation gap of PAUC optimization. The resulting algorithms enjoy a linear per-iteration computational complexity w.r.t. the sample size and a convergence rate of $O(-2/3)$ for typical one-way and two-way PAUCs.
arXiv Detail & Related papers (2025-12-01T02:52:33Z) - Noise-Adaptive Quantum Circuit Mapping for Multi-Chip NISQ Systems via Deep Reinforcement Learning [0.0]
We present DeepQMap, a deep reinforcement learning framework that integrates a bidirectional Long Short-Term Memory based Dynamic Noise Adaptation network. Our method continuously adapts to hardware dynamics through learned temporal representations of quantum system behavior. DeepQMap achieves a mean circuit fidelity of $0.920 \pm 0.023$, a statistically significant 49.3% improvement over state-of-the-art QUBO methods.
arXiv Detail & Related papers (2025-11-22T14:27:55Z) - Identity-Link IRT for Label-Free LLM Evaluation: Preserving Additivity in TVD-MI Scores [3.959606869996232]
We show that averaging TVD-MI's binary trials yields centered-probability scores with additive structure suitable for item-response theory (IRT) without nonlinear link functions. We derive these clipped-linear evaluations from Gini entropy, yielding a box-constrained least-squares formulation that handles boundary saturation.
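The box-constrained least-squares formulation mentioned above can be illustrated with a minimal projected-gradient solver. The additive ability-plus-difficulty design matrix and the [-1, 1] bounds below are hypothetical stand-ins chosen for illustration, not the paper's actual model or solver.

```python
import numpy as np

def box_constrained_lstsq(A, y, lo=-1.0, hi=1.0, lr=0.01, steps=5000):
    """Minimize ||A x - y||^2 subject to lo <= x <= hi by projected
    gradient descent: take a gradient step, then clip back into the box.
    A minimal sketch of a box-constrained least-squares solver."""
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = 2.0 * A.T @ (A @ x - y)
        x = np.clip(x - lr * grad, lo, hi)
    return x
```

For example, with an additive IRT-style design (each observed score is one ability parameter plus one difficulty parameter), the solver recovers parameters whose predictions fit the data while staying inside the box.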
arXiv Detail & Related papers (2025-10-16T17:59:25Z) - Environment-Aware Indoor LoRaWAN Path Loss: Parametric Regression Comparisons, Shadow Fading, and Calibrated Fade Margins [3.776919981139063]
Indoor LoRaWAN propagation is shaped by structural and time-varying context factors. We present an environment-aware, statistically disciplined path loss framework evaluated using leakage-safe cross-validation.
arXiv Detail & Related papers (2025-10-05T20:14:48Z) - DiffusionNFT: Online Diffusion Reinforcement with Forward Process [99.94852379720153]
Diffusion Negative-aware FineTuning (DiffusionNFT) is a new online RL paradigm that optimizes diffusion models directly on the forward process via flow matching. DiffusionNFT is up to $25\times$ more efficient than FlowGRPO in head-to-head comparisons, while being CFG-free.
arXiv Detail & Related papers (2025-09-19T16:09:33Z) - EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration [17.190325630307097]
We propose an Exponential-Decay Free-Form Deformation Network (EDFFDNet), which employs free-form deformation with an exponential-decay basis function. By transforming dense interactions into sparse ones, ASMA reduces parameters and improves accuracy. Experiments demonstrate that EDFFDNet reduces parameters, memory, and total runtime by 70.5%, 32.6%, and 33.7%, respectively. EDFFDNet-2 further improves PSNR by 1.06 dB while maintaining lower computational costs.
arXiv Detail & Related papers (2025-09-09T12:30:51Z) - SGAD: Semantic and Geometric-aware Descriptor for Local Feature Matching [16.683203139962153]
We introduce the Semantic and Geometric-aware Descriptor Network (SGAD), which fundamentally rethinks area-based matching. SGAD generates highly discriminative area descriptors that enable direct matching without complex graph optimization. We further improve area matching through a novel supervision strategy that decomposes the task into classification and ranking subtasks.
arXiv Detail & Related papers (2025-08-04T10:46:53Z) - Advanced Deep Learning Techniques for Automated Segmentation of Type B Aortic Dissections [4.545298205355719]
We developed four deep learning-based pipelines for Type B aortic dissection segmentation. Our approach achieved superior segmentation accuracy, with Dice coefficients of $0.91 \pm 0.07$ for TL, $0.88 \pm 0.18$ for FL, and $0.47 \pm 0.25$ for.
arXiv Detail & Related papers (2025-06-27T13:38:33Z) - Learning Adaptive Parallel Reasoning with Language Models [70.1745752819628]
We propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations. A key innovation is our end-to-end reinforcement learning strategy, optimizing both parent and child inference threads to enhance task success rate without requiring predefined reasoning structures.
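A toy sketch of the spawn()/join()-style orchestration described above, using Python threads and a placeholder child computation in place of real model inference. All names here are hypothetical; this illustrates the control pattern only, not APR's implementation.

```python
import threading
import queue

def solve_subproblem(task, out_q):
    # Placeholder "child inference thread": in APR the model would
    # generate a reasoning trace here; we just square the input.
    out_q.put((task, task * task))

def spawn_and_join(tasks):
    """Parent thread spawns one child thread per subtask, then joins
    them and collects results. Illustrative sketch of spawn()/join()
    orchestration; not the APR implementation."""
    out_q = queue.Queue()
    threads = [threading.Thread(target=solve_subproblem, args=(t, out_q))
               for t in tasks]
    for th in threads:
        th.start()
    for th in threads:
        th.join()          # wait for all child threads to finish
    results = {}
    while not out_q.empty():
        key, value = out_q.get()
        results[key] = value
    return results
```

The parent blocks at join() until every child completes, mirroring how a parallel reasoning step would gather child traces before continuing.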
arXiv Detail & Related papers (2025-04-21T22:29:02Z) - AdaGC: Improving Training Stability for Large Language Model Pretraining [18.163318397205533]
Large Language Models (LLMs) face increasing loss spikes during scaling. While global clipping mitigates this, it handles parameter-specific gradient variations poorly. We show that AdaGC converges 25% faster than global clipping.
arXiv Detail & Related papers (2025-02-16T08:13:23Z) - KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving [2.382388777981433]
This paper introduces the KAN-RCBEVDepth method to enhance 3D object detection in autonomous driving.
Our unique Bird's Eye View-based approach significantly improves detection accuracy and efficiency.
The code will be released at https://www.laitiamo.com/laitiamo/RCBEVDepth-KAN.
arXiv Detail & Related papers (2024-08-04T16:54:49Z) - Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning [79.43940012723539]
ADCLR is a self-supervised learning framework for learning accurate and dense vision representation.
Our approach achieves new state-of-the-art performance for contrastive methods.
arXiv Detail & Related papers (2023-06-23T07:38:09Z) - Integral Continual Learning Along the Tangent Vector Field of Tasks [112.02761912526734]
We propose a lightweight continual learning method which incorporates information from specialized datasets incrementally.
It maintains a small fixed-size memory buffer, as low as 0.4% of the source datasets, which is updated by simple resampling.
Our method achieves strong performance across various buffer sizes for different datasets.
arXiv Detail & Related papers (2022-11-23T16:49:26Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z) - Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation [68.25872110275542]
We propose an efficient inference procedure for non-autoregressive machine translation.
It iteratively refines translation purely in the continuous space.
We evaluate our approach on WMT'14 En-De, WMT'16 Ro-En and IWSLT'16 De-En.
arXiv Detail & Related papers (2020-09-15T15:30:14Z) - Second-Order Provable Defenses against Adversarial Attacks [63.34032156196848]
We show that if the eigenvalues of the network are bounded, we can compute a certificate in the $\ell_2$ norm efficiently using convex optimization.
We achieve certified accuracy of 5.78%, and 44.96%, and 43.19% on 2,59% and 4BP-based methods respectively.
arXiv Detail & Related papers (2020-06-01T05:55:18Z) - ScopeFlow: Dynamic Scene Scoping for Optical Flow [94.42139459221784]
We propose to modify the common training protocols of optical flow.
The improvement is based on observing the bias in sampling challenging data.
We find that both regularization and augmentation should decrease during the training protocol.
arXiv Detail & Related papers (2020-02-25T09:58:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.