Related papers: Learning on the Fly: Replay-Based Continual Object Perception for Indoor Drones

URL: http://arxiv.org/abs/2602.13440v1
Date: Fri, 13 Feb 2026 20:34:01 GMT
Title: Learning on the Fly: Replay-Based Continual Object Perception for Indoor Drones
Authors: Sebastian-Ion Nae, Mihai-Eugen Barbu, Sebastian Mocanu, Marius Leordeanu
Abstract summary: We benchmark 3 replay-based CIL strategies: Experience Replay (ER), Maximally Interfered Retrieval (MIR), and Forgetting-Aware Replay (FAR). The experiments further demonstrate that replay-based continual learning can be effectively applied to edge aerial systems.
Score: 4.473167683810348
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous agents such as indoor drones must learn new object classes in real-time while limiting catastrophic forgetting, motivating Class-Incremental Learning (CIL). However, most unmanned aerial vehicle (UAV) datasets focus on outdoor scenes and offer limited temporally coherent indoor videos. We introduce an indoor dataset of $14,400$ frames capturing inter-drone and ground vehicle footage, annotated via a semi-automatic workflow with a $98.6\%$ first-pass labeling agreement before final manual verification. Using this dataset, we benchmark 3 replay-based CIL strategies: Experience Replay (ER), Maximally Interfered Retrieval (MIR), and Forgetting-Aware Replay (FAR), using YOLOv11-nano as a resource-efficient detector for deployment-constrained UAV platforms. Under tight memory budgets ($5-10\%$ replay), FAR performs better than the rest, achieving an average accuracy (ACC, $mAP_{50-95}$ across increments) of $82.96\%$ with $5\%$ replay. Gradient-weighted class activation mapping (Grad-CAM) analysis shows attention shifts across classes in mixed scenes, which is associated with reduced localization quality for drones. The experiments further demonstrate that replay-based continual learning can be effectively applied to edge aerial systems. Overall, this work contributes an indoor UAV video dataset with preserved temporal coherence and an evaluation of replay-based CIL under limited replay budgets. Project page: https://spacetime-vision-robotics-laboratory.github.io/learning-on-the-fly-cl
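To make the replay budget in the abstract concrete, the following is a minimal sketch of a fixed-capacity replay memory in Python. It is not the authors' implementation: the class `ReplayBuffer`, the helper `replay_capacity`, and the use of reservoir sampling are assumptions chosen for illustration, and the sketch further assumes that the 5% budget is measured against the full set of 14,400 frames, which the abstract does not specify.

```python
import random


class ReplayBuffer:
    """Fixed-capacity memory filled by reservoir sampling (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []          # stored samples, e.g. frame identifiers
        self.seen = 0            # number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, item):
        """Offer one sample; every sample seen so far is kept with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        """Draw up to k distinct stored samples to mix into a training batch."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def replay_capacity(num_frames, replay_fraction):
    """Translate a replay fraction into a number of stored frames (assumed definition)."""
    return max(1, round(num_frames * replay_fraction))


# Assumption: the budget is a fraction of all 14,400 frames.
capacity = replay_capacity(14_400, 0.05)
buffer = ReplayBuffer(capacity)
for frame_id in range(14_400):
    buffer.add(frame_id)
replayed = buffer.sample(32)  # old frames to interleave with new-class frames
```

Under these assumptions a 5% budget corresponds to 720 stored frames. Strategies such as MIR or FAR would replace the uniform `sample` step with a selection rule based on interference or forgetting; that selection logic is not sketched here.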

Related papers

KV-Tracker: Real-Time Pose Tracking with Transformers [30.32327636560028]
Multi-view 3D geometry networks offer a powerful prior but are prohibitively slow for real-time applications. We propose a novel way to adapt them for online use, enabling real-time 6-DoF pose tracking and online reconstruction of objects and scenes from monocular RGB videos.
arXiv Detail & Related papers (2025-12-27T13:02:30Z)
Open-World Drone Active Tracking with Goal-Centered Rewards [62.21394499788672]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations. We propose DAT, the first open-world drone active air-to-ground tracking benchmark. We also propose GC-VAT, which aims to improve the performance of drone tracking targets in complex scenarios.
arXiv Detail & Related papers (2024-12-01T09:37:46Z)
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining [65.9024395309316]
We introduce a novel self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs). We incorporate human object knowledge throughout the pretraining process to enhance UAV video pretraining efficiency and downstream action recognition performance.
arXiv Detail & Related papers (2024-09-26T21:15:22Z)
Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives. MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes. This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint. Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance. We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z)
MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition [59.905048445296906]
We present a novel approach for action recognition in UAV videos. We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain. In practice, we achieve 18.9% improvement in Top-1 accuracy over current state-of-the-art methods.
arXiv Detail & Related papers (2023-03-05T04:05:17Z)
AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning [63.628195002143734]
We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
arXiv Detail & Related papers (2023-03-02T21:24:19Z)
Aerial View Goal Localization with Reinforcement Learning [6.165163123577484]
We present a framework that emulates a search-and-rescue (SAR)-like setup without requiring access to actual UAVs. In this framework, an agent operates on top of an aerial image (proxy for a search area) and is tasked with localizing a goal that is described in terms of visual cues. We propose AiRLoc, a reinforcement learning (RL)-based model that decouples exploration (searching for distant goals) and exploitation (localizing nearby goals).
arXiv Detail & Related papers (2022-09-08T10:27:53Z)
Motion Planning by Reinforcement Learning for an Unmanned Aerial Vehicle in Virtual Open Space with Static Obstacles [3.5356468463540214]
We applied reinforcement learning to perform motion planning for an unmanned aerial vehicle (UAV) in an open space with static obstacles. As the reinforcement learning progressed, the mean reward and goal rate of the model increased.
arXiv Detail & Related papers (2020-09-24T16:42:56Z)
Spatiotemporal Contrastive Video Representation Learning [87.56145031149869]
We present a self-supervised Contrastive Video Representation Learning (CVRL) method to learn visual representations from unlabeled videos. Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space. We study what makes for good data augmentations for video self-supervised learning and find that both spatial and temporal information are crucial.
arXiv Detail & Related papers (2020-08-09T19:58:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.