Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control
of PTZ Cameras
- URL: http://arxiv.org/abs/2304.04356v1
- Date: Mon, 10 Apr 2023 02:41:56 GMT
- Title: Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control
of PTZ Cameras
- Authors: Sandeep Singh Sandha, Bharathan Balaji, Luis Garcia, Mani Srivastava
- Abstract summary: We present an end-to-end deep reinforcement learning (RL) solution for autonomous control of pan-tilt-zoom (PTZ) cameras.
Eagle achieves superior camera control by maintaining the object of interest close to the center of captured images at high resolution, and delivers up to 17% longer tracking duration than the state-of-the-art.
- Score: 4.8020206717026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use
multiple stages where object detection and localization are performed
separately from the control of the PTZ mechanisms. These approaches require
manual labels and suffer from performance bottlenecks due to error propagation
across the multi-stage flow of information. The large size of object detection
neural networks also makes prior solutions infeasible for real-time deployment
in resource-constrained devices. We present an end-to-end deep reinforcement
learning (RL) solution called Eagle to train a neural network policy that
directly takes images as input to control the PTZ camera. Training
reinforcement learning is cumbersome in the real world due to labeling effort,
runtime environment stochasticity, and fragile experimental setups. We
introduce a photo-realistic simulation framework for training and evaluation of
PTZ camera control policies. Eagle achieves superior camera control performance
by maintaining the object of interest close to the center of captured images at
high resolution and has up to 17% more tracking duration than the
state-of-the-art. Eagle policies are lightweight (90x fewer parameters than
Yolo5s) and can run on embedded camera platforms such as Raspberry Pi (33 FPS)
and Jetson Nano (38 FPS), facilitating real-time PTZ tracking for
resource-constrained environments. With domain randomization, Eagle policies
trained in our simulator can be transferred directly to real-world scenarios.
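To make the end-to-end control idea concrete, below is a minimal sketch (assuming PyTorch) of a lightweight image-to-PTZ policy together with a hypothetical centering reward. The network sizes, action convention, and reward weights here are illustrative assumptions, not the paper's actual Eagle implementation.

```python
import torch
import torch.nn as nn

class PTZPolicy(nn.Module):
    """Small CNN that maps a camera frame directly to pan/tilt/zoom commands."""
    def __init__(self, channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Three continuous actions in [-1, 1]: pan rate, tilt rate, zoom rate.
        self.head = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3), nn.Tanh()
        )

    def forward(self, frame):
        # frame: (batch, 3, H, W) image tensor normalized to [0, 1]
        return self.head(self.encoder(frame))


def centering_reward(bbox_center, frame_size, bbox_area, lost):
    """Hypothetical reward: favor targets near the image center at large apparent size."""
    if lost:
        return -1.0  # penalty when the target leaves the frame
    cx, cy = bbox_center
    w, h = frame_size
    # Normalized distance of the target from the image center (0 = centered).
    dist = (((cx - w / 2) / w) ** 2 + ((cy - h / 2) / h) ** 2) ** 0.5
    size = bbox_area / (w * h)  # fraction of the frame covered by the target
    return (1.0 - dist) + 0.5 * size
```

In an actor-critic setup, a policy of this shape would typically be trained against such a reward inside the photo-realistic simulator, with domain randomization applied before transferring the policy to a real camera.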
Related papers
- A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.
We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT.
We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Efficient Camera Exposure Control for Visual Odometry via Deep Reinforcement Learning [10.886819238167286]
This study employs a deep reinforcement learning framework to train agents for exposure control.
A lightweight image simulator is developed to facilitate the training process.
Different levels of reward functions are crafted to enhance the VO systems.
arXiv Detail & Related papers (2024-08-30T04:37:52Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z)
- Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion [83.88829943619656]
We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals.
Our guided diffusion model allows users to constrain trajectories through target waypoints, speed, and specified social groups.
We propose utilizing the value function learned during RL training of the animation controller to guide diffusion to produce trajectories better suited for particular scenarios.
arXiv Detail & Related papers (2023-04-04T15:46:42Z)
- A Flexible Framework for Virtual Omnidirectional Vision to Improve Operator Situation Awareness [2.817412580574242]
We present a flexible framework for virtual projections to increase situation awareness based on a novel method to fuse multiple cameras mounted anywhere on the robot.
We propose a complementary approach to improve scene understanding by fusing camera images and geometric 3D Lidar data to obtain a colorized point cloud.
arXiv Detail & Related papers (2023-02-01T10:40:05Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- Learning Perception-Aware Agile Flight in Cluttered Environments [38.59659342532348]
We propose a method to learn neural network policies that achieve perception-aware, minimum-time flight in cluttered environments.
Our approach tightly couples perception and control, showing a significant advantage in computation speed (10x faster) and success rate.
We demonstrate the closed-loop control performance using a physical quadrotor and hardware-in-the-loop simulation at speeds up to 50km/h.
arXiv Detail & Related papers (2022-10-04T18:18:58Z)
- C^3Net: End-to-End deep learning for efficient real-time visual active camera control [4.09920839425892]
The need for automated real-time visual systems in applications such as smart camera surveillance, smart environments, and drones necessitates the improvement of methods for visual active monitoring and control.
In this paper, a deep Convolutional Camera Controller Neural Network is proposed to go directly from visual information to camera movement.
It is trained end-to-end without bounding box annotations to control a camera and follow multiple targets from raw pixel values.
arXiv Detail & Related papers (2021-07-28T09:31:46Z)
- Learning a State Representation and Navigation in Cluttered and Dynamic Environments [6.909283975004628]
We present a learning-based pipeline to realise local navigation with a quadrupedal robot in cluttered environments.
The robot is able to safely locomote to a target location based on frames from a depth camera without any explicit mapping of the environment.
We show that our system can handle noisy depth images, avoid dynamic obstacles unseen during training, and is endowed with local spatial awareness.
arXiv Detail & Related papers (2021-03-07T13:19:06Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)