AdaSpot: Spend Resolution Where It Matters for Precise Event Spotting
- URL: http://arxiv.org/abs/2602.22073v1
- Date: Wed, 25 Feb 2026 16:24:48 GMT
- Title: AdaSpot: Spend Resolution Where It Matters for Precise Event Spotting
- Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés
- Abstract summary: Precise Event Spotting is a key task for applications in sports analytics, robotics, and autonomous systems. AdaSpot achieves state-of-the-art performance under strict evaluation metrics.
- Score: 59.31340724915079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precise Event Spotting aims to localize fast-paced actions or events in videos with high temporal precision, a key task for applications in sports analytics, robotics, and autonomous systems. Existing methods typically process all frames uniformly, overlooking the inherent spatio-temporal redundancy in video data. This leads to redundant computation on non-informative regions while limiting overall efficiency. To remain tractable, they often spatially downsample inputs, losing fine-grained details crucial for precise localization. To address these limitations, we propose AdaSpot, a simple yet effective framework that processes low-resolution videos to extract global task-relevant features while adaptively selecting the most informative region-of-interest in each frame for high-resolution processing. The selection is performed via an unsupervised, task-aware strategy that maintains spatio-temporal consistency across frames and avoids the training instability of learnable alternatives. This design preserves essential fine-grained visual cues with a marginal computational overhead compared to low-resolution-only baselines, while remaining far more efficient than uniform high-resolution processing. Experiments on standard PES benchmarks demonstrate that AdaSpot achieves state-of-the-art performance under strict evaluation metrics (e.g., +3.96 and +2.26 mAP@0 frames on Tennis and FineDiving), while also maintaining strong results under looser metrics. Code is available at: https://github.com/arturxe2/AdaSpot
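The core idea described in the abstract, selecting the most informative region of each low-resolution frame and mapping it back to a high-resolution crop, can be illustrated with a minimal sketch. The function name `select_roi`, the patch size, and the score-map input are illustrative assumptions, not the authors' actual API or selection strategy.

```python
def select_roi(score_map, hi_res_size, patch=2):
    """Pick the patch x patch window with the highest summed score
    and map it to a high-resolution crop box (illustrative sketch).

    score_map   -- 2D list of per-cell "informativeness" scores,
                   assumed to come from a low-resolution branch.
    hi_res_size -- (H, W) of the full-resolution frame.
    Returns the crop box (top, left, bottom, right) in hi-res pixels.
    """
    h, w = len(score_map), len(score_map[0])
    best, best_pos = float("-inf"), (0, 0)
    # Exhaustive scan over all patch x patch windows in the low-res grid.
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            s = sum(score_map[i + di][j + dj]
                    for di in range(patch) for dj in range(patch))
            if s > best:
                best, best_pos = s, (i, j)
    # Scale the winning low-res window to high-res pixel coordinates.
    sy, sx = hi_res_size[0] / h, hi_res_size[1] / w
    i, j = best_pos
    return (round(i * sy), round(j * sx),
            round((i + patch) * sy), round((j + patch) * sx))
```

A second, high-resolution branch would then process only this crop, which is what keeps the overhead marginal relative to a low-resolution-only baseline.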
Related papers
- CLIDD: Cross-Layer Independent Deformable Description for Efficient and Discriminative Local Feature Representation [6.478456907626643]
Cross-Layer Independent Deformable Description (CLIDD) is a method that achieves superior distinctiveness by sampling directly from independent feature hierarchies. To ensure real-time performance, we implement a hardware-aware kernel fusion strategy. We develop a scalable framework that integrates lightweight architectures with a training protocol.
arXiv Detail & Related papers (2026-01-14T07:03:01Z) - PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching [51.98089287914147]
Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM) construction module for dynamic stereo matching, dubbed PPMStereo.
arXiv Detail & Related papers (2025-10-23T03:52:39Z) - RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization [50.75654397516163]
We propose RelayFormer, a unified framework that adapts to varying resolutions and modalities. RelayFormer partitions inputs into fixed-size sub-images and introduces Global-Local Relay (GLR) tokens. This enables efficient exchange of global cues, such as semantic or temporal consistency, while preserving fine-grained manipulation artifacts.
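The fixed-size sub-image partitioning mentioned above can be sketched as a simple tiling step; the function name and the border-clipping behavior are assumptions for illustration, not RelayFormer's actual implementation.

```python
def partition(h, w, tile=256):
    """Enumerate fixed-size tile boxes covering an h x w image.

    Tiles at the right/bottom borders are clipped to the image edge;
    a real pipeline might instead pad them to a uniform size.
    Returns a list of (top, left, bottom, right) boxes.
    """
    boxes = []
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            boxes.append((top, left,
                          min(top + tile, h), min(left + tile, w)))
    return boxes
```

Each tile would then be processed locally, with relay tokens carrying global context between tiles.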
arXiv Detail & Related papers (2025-08-13T03:35:28Z) - AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity [9.63873831179673]
Large Language Models (LLMs) with extended context lengths face significant computational challenges during the pre-filling phase. We propose AnchorAttention, a difference-aware, dynamic sparse attention mechanism that efficiently identifies critical attention regions. With its finer-grained sparsity strategy, AnchorAttention achieves higher sparsity rates at the same recall level, significantly reducing computation time.
arXiv Detail & Related papers (2025-05-29T14:59:06Z) - Making Every Event Count: Balancing Data Efficiency and Accuracy in Event Camera Subsampling [13.283434521851998]
Event cameras offer high temporal resolution and power efficiency, making them well-suited for edge AI applications. Subsampling methods provide a practical solution, but their effect on downstream visual tasks remains underexplored. We evaluate six hardware-friendly subsampling methods for event video classification on various benchmark datasets.
arXiv Detail & Related papers (2025-05-27T13:37:08Z) - Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach [32.91982063297922]
We propose a novel Slow-Fast Tracking paradigm that flexibly adapts to different operational requirements, termed SFTrack. The proposed framework supports two complementary modes, i.e., a high-precision slow tracker for scenarios with sufficient computational resources, and an efficient fast tracker tailored for latency-aware, resource-constrained environments. Our framework first performs graph-based representation learning from high-temporal-resolution event streams, and then integrates the learned graph-structured information into two FlashAttention-based vision backbones.
arXiv Detail & Related papers (2025-05-19T09:37:23Z) - Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition [82.75714185083383]
This paper investigates the phenomenon of data redundancy in video understanding, with the aim of improving computational efficiency. Motivated by this phenomenon, we introduce a spatially adaptive video recognition approach, termed AdaFocus. Our resulting framework, Uni-AdaFocus, seamlessly integrates spatial, temporal, and sample-wise dynamic computation.
arXiv Detail & Related papers (2024-12-15T15:51:44Z) - Learning to Estimate Hidden Motions with Global Motion Aggregation [71.12650817490318]
Occlusions pose a significant challenge to optical flow algorithms that rely on local evidences.
We introduce a global motion aggregation module to find long-range dependencies between pixels in the first image.
We demonstrate that the optical flow estimates in the occluded regions can be significantly improved without damaging the performance in non-occluded regions.
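The global motion aggregation idea above can be sketched as attention over all pixels, where similarity is computed on context features of the first image and used to aggregate motion features. This toy version (dense lists instead of tensors, illustrative function name) only conveys the shape of the computation, not the paper's actual module.

```python
import math

def aggregate_motion(context, motion):
    """Toy global motion aggregation: each pixel's motion feature is
    replaced by a softmax-attention-weighted average of all pixels'
    motion features, with similarity scored on context features.

    context, motion -- lists of per-pixel feature vectors (equal count).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    out = []
    for q in context:
        # Scaled dot-product similarity against every pixel's context.
        logits = [dot(q, k) / math.sqrt(len(q)) for k in context]
        m = max(logits)  # subtract max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        # Weighted average of motion features (long-range aggregation).
        out.append([sum(w * mv[d] for w, mv in zip(weights, motion)) / z
                    for d in range(len(motion[0]))])
    return out
```

Pixels with similar context (e.g. the same object, one part occluded) end up sharing motion information, which is the intuition behind improving flow in occluded regions.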
arXiv Detail & Related papers (2021-04-06T10:32:03Z) - Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z) - Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
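A common way to make self-attention fast, in the spirit of the modification described above, is to drop the softmax in favour of normalised queries and keys so the product can be reassociated from (Q K^T) V to Q (K^T V), reducing complexity from quadratic to linear in sequence length. The sketch below is a generic linear-attention illustration under that assumption, not the paper's exact formulation.

```python
def fast_attention(Q, K, V):
    """Toy linear-complexity attention: L2-normalise queries and keys,
    skip the softmax, and compute Q @ (K^T @ V) instead of
    (Q @ K^T) @ V. Inputs are lists of row vectors."""
    def l2norm(rows):
        out = []
        for r in rows:
            n = sum(x * x for x in r) ** 0.5 or 1.0
            out.append([x / n for x in r])
        return out

    Qn, Kn = l2norm(Q), l2norm(K)
    n, d, dv = len(K), len(K[0]), len(V[0])
    # S = K^T @ V has shape (d x dv): cost is linear in n.
    S = [[sum(Kn[i][a] * V[i][b] for i in range(n)) for b in range(dv)]
         for a in range(d)]
    # Output = (Q @ S) / n: again linear in n.
    return [[sum(q[a] * S[a][b] for a in range(d)) / n for b in range(dv)]
            for q in Qn]
```

Because S is only d x dv, the memory and compute no longer grow with the square of the number of pixels, which is what makes real-time high-resolution processing feasible.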
arXiv Detail & Related papers (2020-07-07T22:37:16Z) - FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP).
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.