Efficient On-Board Processing of Oblique UAV Video for Rapid Flood Extent Mapping
- URL: http://arxiv.org/abs/2601.11290v1
- Date: Fri, 16 Jan 2026 13:41:56 GMT
- Title: Efficient On-Board Processing of Oblique UAV Video for Rapid Flood Extent Mapping
- Authors: Vishisht Sharma, Sam Leroux, Lisa Landuyt, Nick Witvrouwen, Pieter Simoens
- Abstract summary: Temporal Token Reuse (TTR) is an adaptive inference framework that accelerates video segmentation on embedded devices. We show that TTR achieves a 30% reduction in inference latency with negligible degradation in segmentation accuracy (< 0.5% mIoU). These findings confirm that TTR effectively shifts the operational frontier, enabling high-fidelity, real-time oblique video understanding for time-critical remote sensing missions.
- Score: 7.460695517551536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective disaster response relies on rapid flood extent mapping, where oblique aerial video is the primary modality for initial scouting due to its ability to maximize spatial coverage and situational awareness in limited flight time. However, on-board processing of high-resolution oblique streams is severely bottlenecked by the strict Size, Weight, and Power (SWaP) constraints of Unmanned Aerial Vehicles (UAVs). The computational density required to process these wide-field-of-view streams precludes low-latency inference on standard edge hardware. To address this, we propose Temporal Token Reuse (TTR), an adaptive inference framework that accelerates video segmentation on embedded devices. TTR exploits the intrinsic spatiotemporal redundancy of aerial video by formulating image patches as tokens; it uses a lightweight similarity metric to dynamically identify static regions and propagate their precomputed deep features, thereby bypassing redundant backbone computations. We validate the framework on standard benchmarks and a newly curated Oblique Floodwater Dataset designed for hydrological monitoring. Experimental results on edge-grade hardware demonstrate that TTR achieves a 30% reduction in inference latency with negligible degradation in segmentation accuracy (< 0.5% mIoU). These findings confirm that TTR effectively shifts the operational Pareto frontier, enabling high-fidelity, real-time oblique video understanding for time-critical remote sensing missions.
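The abstract describes the TTR mechanism in enough detail to sketch its core loop. The following is a minimal, hypothetical PyTorch illustration of the token-reuse idea, assuming a per-token mean-absolute-difference similarity gate, a 16-pixel patch size, and a generic token-wise backbone; `TokenReuseCache`, `patchify`, and the `THRESHOLD` value are illustrative names and numbers, not the paper's implementation.

```python
# Minimal sketch of temporal token reuse for video segmentation.
# Hypothetical illustration: patch size, threshold, and the backbone
# are placeholder assumptions, not the paper's actual architecture.
import torch

PATCH = 16          # patch (token) side length, assumed
THRESHOLD = 0.02    # mean-absolute-difference gate, assumed

def patchify(frame):
    """Split a (C, H, W) frame into (N, C*PATCH*PATCH) patch tokens."""
    c, h, w = frame.shape
    patches = frame.unfold(1, PATCH, PATCH).unfold(2, PATCH, PATCH)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * PATCH * PATCH)

class TokenReuseCache:
    """Caches per-token backbone features and reuses them for static tokens."""
    def __init__(self, backbone):
        self.backbone = backbone       # maps (N, D_in) tokens to (N, D_out)
        self.prev_tokens = None
        self.prev_feats = None

    @torch.no_grad()
    def __call__(self, frame):
        tokens = patchify(frame)
        if self.prev_tokens is None:
            feats = self.backbone(tokens)        # first frame: full compute
        else:
            # Lightweight similarity: mean absolute difference per token.
            diff = (tokens - self.prev_tokens).abs().mean(dim=1)
            changed = diff > THRESHOLD
            feats = self.prev_feats.clone()      # start from cached features
            if changed.any():
                # Recompute the backbone only for dynamic tokens.
                feats[changed] = self.backbone(tokens[changed])
        self.prev_tokens, self.prev_feats = tokens, feats
        return feats

# Usage with a toy stand-in for a deep encoder.
backbone = torch.nn.Linear(3 * PATCH * PATCH, 256)
ttr = TokenReuseCache(backbone)
feats = ttr(torch.rand(3, 224, 224))             # 196 tokens, all computed
feats = ttr(torch.rand(3, 224, 224))             # only changed tokens recomputed
```

On later frames, only tokens whose pixel content drifts past the threshold are re-encoded; every static token is served from the cache, which is where the claimed latency reduction would come from under these assumptions.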
Related papers
- Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning [14.799267729619428]
In streaming Reinforcement Learning (RL), transitions are observed and discarded immediately after a single update. We propose extending Self-Predictive Representations (SPR) to the streaming pipeline to maximize the utility of every observed frame. We show that our method learns significantly richer representations, bridging the performance gap caused by the absence of a replay buffer.
arXiv Detail & Related papers (2026-02-10T04:06:32Z)
- Video Depth Propagation [54.523028170425256]
Existing methods rely on simple frame-by-frame monocular models, leading to temporal inconsistencies and inaccuracies. We propose VeloDepth, which effectively leverages an online video pipeline and performs deep feature propagation. Our design structurally enforces temporal consistency, resulting in stable depth predictions across consecutive frames with improved efficiency.
arXiv Detail & Related papers (2025-12-11T15:08:37Z)
- Rethinking Diffusion Model-Based Video Super-Resolution: Leveraging Dense Guidance from Aligned Features [51.5076190312734]
Video Super-Resolution approaches suffer from error accumulation, spatial artifacts, and a trade-off between perceptual quality and fidelity. We propose a novel Densely Guided diffusion model with Aligned Features for Video Super-Resolution (DGAF-VSR). Experiments on synthetic and real-world datasets demonstrate that DGAF-VSR surpasses state-of-the-art methods in key aspects of VSR.
arXiv Detail & Related papers (2025-11-21T03:40:45Z)
- Self-Supervised Compression and Artifact Correction for Streaming Underwater Imaging Sonar [14.023965177100239]
Real-time imaging sonar has become an important tool for underwater monitoring in environments where optical sensing is unreliable. We present SCOPE, a self-supervised framework that jointly performs compression and artifact correction without clean-noise pairs or synthetic assumptions. SCOPE has been deployed for months in three Pacific Northwest rivers to support real-time salmon enumeration and environmental monitoring in the wild.
arXiv Detail & Related papers (2025-11-17T21:19:15Z)
- ResidualViT for Efficient Temporally Dense Video Encoding [66.57779133786131]
We make three contributions to reduce the cost of computing features for temporally dense tasks. First, we introduce a vision transformer (ViT) architecture, dubbed ResidualViT, that leverages the large temporal redundancy in videos. Second, we propose a lightweight distillation strategy to approximate the frame-level features of the original foundation model.
arXiv Detail & Related papers (2025-09-16T17:12:23Z)
- Lightweight CNNs for Embedded SAR Ship Target Detection and Classification [0.0]
On-board processing to generate higher-level products could reduce the data volume that needs to be downlinked. This work proposes and evaluates neural networks designed for real-time inference on unfocused SAR data.
arXiv Detail & Related papers (2025-08-14T14:55:19Z)
- FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution [68.77813885751308]
State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information. We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data. Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
arXiv Detail & Related papers (2025-06-13T07:59:52Z)
- Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and of pixels in each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct missing information.
Experimental results show that ViT_STAE can compress the HMDB51 video dataset by 104x with only a 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
- Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)