Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-time Quadrotor Control
- URL: http://arxiv.org/abs/2507.19878v1
- Date: Sat, 26 Jul 2025 09:17:38 GMT
- Title: Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-time Quadrotor Control
- Authors: Sebastian Mocanu, Sebastian-Ion Nae, Mihai-Eugen Barbu, Marius Leordeanu,
- Abstract summary: This work introduces a self-supervised neuro-analytical, cost efficient, model for visual-based quadrotor control in which a small 1.7M parameters student ConvNet learns automatically from an analytical teacher.<n>Our vision-only self-supervised neuro-analytic control, enables quadrotor orientation and movement without requiring explicit geometric models or fiducial markers.
- Score: 7.791675745811072
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work introduces a self-supervised neuro-analytical, cost efficient, model for visual-based quadrotor control in which a small 1.7M parameters student ConvNet learns automatically from an analytical teacher, an improved image-based visual servoing (IBVS) controller. Our IBVS system solves numerical instabilities by reducing the classical visual servoing equations and enabling efficient stable image feature detection. Through knowledge distillation, the student model achieves 11x faster inference compared to the teacher IBVS pipeline, while demonstrating similar control accuracy at a significantly lower computational and memory cost. Our vision-only self-supervised neuro-analytic control, enables quadrotor orientation and movement without requiring explicit geometric models or fiducial markers. The proposed methodology leverages simulation-to-reality transfer learning and is validated on a small drone platform in GPS-denied indoor environments. Our key contributions include: (1) an analytical IBVS teacher that solves numerical instabilities inherent in classical approaches, (2) a two-stage segmentation pipeline combining YOLOv11 with a U-Net-based mask splitter for robust anterior-posterior vehicle segmentation to correctly estimate the orientation of the target, and (3) an efficient knowledge distillation dual-path system, which transfers geometric visual servoing capabilities from the analytical IBVS teacher to a compact and small student neural network that outperforms the teacher, while being suitable for real-time onboard deployment.
Related papers
- Advancing from Automated to Autonomous Beamline by Leveraging Computer Vision [16.747469612768917]
Current state-of-the-art synchrotron beamlines still heavily rely on human safety oversight.<n>A computer vision-based system is proposed, integrating deep learning and multiview cameras for real-time collision detection.<n> Experiments on a real beamline dataset demonstrate high accuracy, real-time performance, and strong potential for autonomous synchrotron beamline operations.
arXiv Detail & Related papers (2025-06-01T04:53:55Z) - Inverse RL Scene Dynamics Learning for Nonlinear Predictive Control in Autonomous Vehicles [0.0]
This paper introduces the Deep Learning-based Model Predictive Controller with Scene Dynamics (DL-NMPC-SD) method for autonomous navigation.<n>DL-NMPC-SD uses an a-priori nominal vehicle model in combination with a scene dynamics model learned from temporal range sensing information.
arXiv Detail & Related papers (2025-04-02T03:46:37Z) - LEGO-Motion: Learning-Enhanced Grids with Occupancy Instance Modeling for Class-Agnostic Motion Prediction [12.071846486955627]
We introduce a novel occupancy-instance modeling framework for class-agnostic motion prediction tasks, named LEGO-Motion.<n>Our model comprises (1) a BEV encoder, (2) an Interaction-Augmented Instance, and (3) an Instance-Enhanced BEV.<n>Our method achieves state-of-the-art performance, outperforming existing approaches.
arXiv Detail & Related papers (2025-03-10T14:26:21Z) - An Interpretable Neural Control Network with Adaptable Online Learning for Sample Efficient Robot Locomotion Learning [7.6119527195998]
Sequential Motion Executor (SME) is a three-layer interpretable neural network.<n> Adaptable Gradient-weighting Online Learning (AGOL) algorithm prioritizes the update of the parameters with high relevance score.<n> SME-AGOL requires 40% fewer samples and receives 150% higher final reward/locomotion performance on a simulated hexapod robot.
arXiv Detail & Related papers (2025-01-18T08:37:33Z) - TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies [95.30717188630432]
We introduce visual trace prompting to facilitate VLA models' spatial-temporal awareness for action prediction.<n>We develop a new TraceVLA model by finetuning OpenVLA on our own collected dataset of 150K robot manipulation trajectories.<n>We present a compact VLA model based on 4B Phi-3-Vision, pretrained on the Open-X-Embodiment and finetuned on our dataset.
arXiv Detail & Related papers (2024-12-13T18:40:51Z) - CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new detextbfCoupled dutextbfAl-interactive lineatextbfR atttextbfEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning way and fully exploiting complementarity across features, our method can achieve both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z) - Model-free tracking control of complex dynamical trajectories with
machine learning [0.2356141385409842]
We develop a model-free, machine-learning framework to control a two-arm robotic manipulator.
We demonstrate the effectiveness of the control framework using a variety of periodic and chaotic signals.
arXiv Detail & Related papers (2023-09-20T17:10:10Z) - A Light-weight Deep Learning Model for Remote Sensing Image
Classification [70.66164876551674]
We present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC)
By conducting extensive experiments on the NWPU-RESISC45 benchmark, our proposed teacher-student models outperforms the state-of-the-art systems.
arXiv Detail & Related papers (2023-02-25T09:02:01Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate
Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.