AdaSpot: Spend Resolution Where It Matters for Precise Event Spotting
- URL: http://arxiv.org/abs/2602.22073v1
- Date: Wed, 25 Feb 2026 16:24:48 GMT
- Title: AdaSpot: Spend Resolution Where It Matters for Precise Event Spotting
- Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés
- Abstract summary: Precise Event Spotting is a key task for applications in sports analytics, robotics, and autonomous systems. AdaSpot achieves state-of-the-art performance under strict evaluation metrics.
- Score: 59.31340724915079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precise Event Spotting aims to localize fast-paced actions or events in videos with high temporal precision, a key task for applications in sports analytics, robotics, and autonomous systems. Existing methods typically process all frames uniformly, overlooking the inherent spatio-temporal redundancy in video data. This leads to redundant computation on non-informative regions while limiting overall efficiency. To remain tractable, they often spatially downsample inputs, losing fine-grained details crucial for precise localization. To address these limitations, we propose AdaSpot, a simple yet effective framework that processes low-resolution videos to extract global task-relevant features while adaptively selecting the most informative region-of-interest in each frame for high-resolution processing. The selection is performed via an unsupervised, task-aware strategy that maintains spatio-temporal consistency across frames and avoids the training instability of learnable alternatives. This design preserves essential fine-grained visual cues with a marginal computational overhead compared to low-resolution-only baselines, while remaining far more efficient than uniform high-resolution processing. Experiments on standard PES benchmarks demonstrate that AdaSpot achieves state-of-the-art performance under strict evaluation metrics (e.g., +3.96 and +2.26 mAP@0 frames on Tennis and FineDiving), while also maintaining strong results under looser metrics. Code is available at: https://github.com/arturxe2/AdaSpot
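The core idea described in the abstract, selecting the most informative region of each low-resolution frame and mapping it back to a high-resolution crop, can be illustrated with a minimal sketch. The function name `select_roi`, the patch size, and the score-map input are illustrative assumptions, not the authors' actual API or selection strategy.

```python
def select_roi(score_map, hi_res_size, patch=2):
    """Pick the patch x patch window with the highest summed score
    and map it to a high-resolution crop box (illustrative sketch).

    score_map   -- 2D list of per-cell "informativeness" scores,
                   assumed to come from a low-resolution branch.
    hi_res_size -- (H, W) of the full-resolution frame.
    Returns the crop box (top, left, bottom, right) in hi-res pixels.
    """
    h, w = len(score_map), len(score_map[0])
    best, best_pos = float("-inf"), (0, 0)
    # Exhaustive scan over all patch x patch windows in the low-res grid.
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            s = sum(score_map[i + di][j + dj]
                    for di in range(patch) for dj in range(patch))
            if s > best:
                best, best_pos = s, (i, j)
    # Scale the winning low-res window to high-res pixel coordinates.
    sy, sx = hi_res_size[0] / h, hi_res_size[1] / w
    i, j = best_pos
    return (round(i * sy), round(j * sx),
            round((i + patch) * sy), round((j + patch) * sx))
```

A second, high-resolution branch would then process only this crop, which is what keeps the overhead marginal relative to a low-resolution-only baseline.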
Related papers
- CLIDD: Cross-Layer Independent Deformable Description for Efficient and Discriminative Local Feature Representation [6.478456907626643]
Cross-Layer Independent Deformable Description (CLIDD) is a method that achieves superior distinctiveness by sampling directly from independent feature hierarchies. To ensure real-time performance, we implement a hardware-aware kernel fusion strategy. We develop a scalable framework that integrates lightweight architectures with a training protocol.
arXiv Detail & Related papers (2026-01-14T07:03:01Z) - PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching [51.98089287914147]
Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM) construction module for dynamic stereo matching, dubbed PPMStereo.
arXiv Detail & Related papers (2025-10-23T03:52:39Z) - RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization [50.75654397516163]
We propose RelayFormer, a unified framework that adapts to varying resolutions and modalities. RelayFormer partitions inputs into fixed-size sub-images and introduces Global-Local Relay (GLR) tokens. This enables efficient exchange of global cues, such as semantic or temporal consistency, while preserving fine-grained manipulation artifacts.
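The fixed-size sub-image partitioning mentioned above can be sketched as a simple tiling step; the function name and the border-clipping behavior are assumptions for illustration, not RelayFormer's actual implementation.

```python
def partition(h, w, tile=256):
    """Enumerate fixed-size tile boxes covering an h x w image.

    Tiles at the right/bottom borders are clipped to the image edge;
    a real pipeline might instead pad them to a uniform size.
    Returns a list of (top, left, bottom, right) boxes.
    """
    boxes = []
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            boxes.append((top, left,
                          min(top + tile, h), min(left + tile, w)))
    return boxes
```

Each tile would then be processed locally, with relay tokens carrying global context between tiles.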
arXiv Detail & Related papers (2025-08-13T03:35:28Z) - AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity [9.63873831179673]
Large Language Models (LLMs) with extended context lengths face significant computational challenges during the pre-filling phase. We propose AnchorAttention, a difference-aware, dynamic sparse attention mechanism that efficiently identifies critical attention regions. With its finer-grained sparsity strategy, AnchorAttention achieves higher sparsity rates at the same recall level, significantly reducing computation time.
arXiv Detail & Related papers (2025-05-29T14:59:06Z) - Making Every Event Count: Balancing Data Efficiency and Accuracy in Event Camera Subsampling [13.283434521851998]
Event cameras offer high temporal resolution and power efficiency, making them well-suited for edge AI applications. Subsampling methods provide a practical solution, but their effect on downstream visual tasks remains underexplored. We evaluate six hardware-friendly subsampling methods for event video classification on various benchmark datasets.
arXiv Detail & Related papers (2025-05-27T13:37:08Z) - Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach [32.91982063297922]
We propose a novel Slow-Fast Tracking paradigm that flexibly adapts to different operational requirements, termed SFTrack. The proposed framework supports two complementary modes, i.e., a high-precision slow tracker for scenarios with sufficient computational resources, and an efficient fast tracker tailored for latency-aware, resource-constrained environments. Our framework first performs graph-based representation learning from high-temporal-resolution event streams, and then integrates the learned graph-structured information into two FlashAttention-based vision backbones.
arXiv Detail & Related papers (2025-05-19T09:37:23Z) - Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition [82.75714185083383]
This paper investigates the phenomenon of data redundancy in video understanding, with the aim of improving computational efficiency. Motivated by this phenomenon, we introduce a spatially adaptive video recognition approach, termed AdaFocus. Our resulting framework, Uni-AdaFocus, seamlessly integrates spatial, temporal, and sample-wise dynamic computation.
arXiv Detail & Related papers (2024-12-15T15:51:44Z) - Learning to Estimate Hidden Motions with Global Motion Aggregation [71.12650817490318]
Occlusions pose a significant challenge to optical flow algorithms that rely on local evidences.
We introduce a global motion aggregation module to find long-range dependencies between pixels in the first image.
We demonstrate that the optical flow estimates in the occluded regions can be significantly improved without damaging the performance in non-occluded regions.
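The global motion aggregation idea above can be sketched as attention over all pixels, where similarity is computed on context features of the first image and used to aggregate motion features. This toy version (dense lists instead of tensors, illustrative function name) only conveys the shape of the computation, not the paper's actual module.

```python
import math

def aggregate_motion(context, motion):
    """Toy global motion aggregation: each pixel's motion feature is
    replaced by a softmax-attention-weighted average of all pixels'
    motion features, with similarity scored on context features.

    context, motion -- lists of per-pixel feature vectors (equal count).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    out = []
    for q in context:
        # Scaled dot-product similarity against every pixel's context.
        logits = [dot(q, k) / math.sqrt(len(q)) for k in context]
        m = max(logits)  # subtract max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        # Weighted average of motion features (long-range aggregation).
        out.append([sum(w * mv[d] for w, mv in zip(weights, motion)) / z
                    for d in range(len(motion[0]))])
    return out
```

Pixels with similar context (e.g. the same object, one part occluded) end up sharing motion information, which is the intuition behind improving flow in occluded regions.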
arXiv Detail & Related papers (2021-04-06T10:32:03Z) - Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z) - Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
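A common way to make self-attention fast, in the spirit of the modification described above, is to drop the softmax in favour of normalised queries and keys so the product can be reassociated from (Q K^T) V to Q (K^T V), reducing complexity from quadratic to linear in sequence length. The sketch below is a generic linear-attention illustration under that assumption, not the paper's exact formulation.

```python
def fast_attention(Q, K, V):
    """Toy linear-complexity attention: L2-normalise queries and keys,
    skip the softmax, and compute Q @ (K^T @ V) instead of
    (Q @ K^T) @ V. Inputs are lists of row vectors."""
    def l2norm(rows):
        out = []
        for r in rows:
            n = sum(x * x for x in r) ** 0.5 or 1.0
            out.append([x / n for x in r])
        return out

    Qn, Kn = l2norm(Q), l2norm(K)
    n, d, dv = len(K), len(K[0]), len(V[0])
    # S = K^T @ V has shape (d x dv): cost is linear in n.
    S = [[sum(Kn[i][a] * V[i][b] for i in range(n)) for b in range(dv)]
         for a in range(d)]
    # Output = (Q @ S) / n: again linear in n.
    return [[sum(q[a] * S[a][b] for a in range(d)) / n for b in range(dv)]
            for q in Qn]
```

Because S is only d x dv, the memory and compute no longer grow with the square of the number of pixels, which is what makes real-time high-resolution processing feasible.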
arXiv Detail & Related papers (2020-07-07T22:37:16Z) - FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP).
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.