Imitation-Based Active Camera Control with Deep Convolutional Neural Network
- URL: http://arxiv.org/abs/2012.06428v1
- Date: Fri, 11 Dec 2020 15:37:33 GMT
- Title: Imitation-Based Active Camera Control with Deep Convolutional Neural Network
- Authors: Christos Kyrkou
- Abstract summary: In this paper we frame active visual monitoring as an imitation learning problem to be solved in a supervised manner using deep learning.
A deep convolutional neural network is trained end-to-end as the camera controller that learns the entire processing pipeline needed to control a camera to follow multiple targets.
Experimental results indicate that the proposed solution is robust to varying conditions and achieves better monitoring performance than traditional approaches.
- Score: 4.09920839425892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing need for automated visual monitoring and control in
applications such as smart camera surveillance, traffic monitoring, and
intelligent environments necessitates improved methods for visual active
monitoring. Traditionally, the active monitoring task has been handled through
a pipeline of modules such as detection, filtering, and control. In this paper
we frame active visual monitoring as an imitation learning problem to be
solved in a supervised manner using deep learning, going directly from visual
information to camera movement and thereby combining computer vision and
control in a single model. A deep convolutional neural network is trained
end-to-end as the camera controller, learning the entire processing pipeline
needed to control a camera to follow multiple targets and also estimate their
density from a single image. Experimental results indicate that the proposed
solution is robust to varying conditions and achieves better monitoring
performance than traditional approaches, both in the number of targets
monitored and in monitoring time, while running at up to 25 FPS. This makes it
a practical and affordable solution for multi-target active monitoring in
surveillance and smart-environment applications.
Related papers
- RoboKoop: Efficient Control Conditioned Representations from Visual Input in Robotics using Koopman Operator [14.77553682217217]
We introduce a Contrastive Spectral Koopman Embedding network that allows us to learn efficient linearized visual representations from the agent's visual data in a high-dimensional latent space.
Our method enhances stability and control in gradient dynamics over time, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-09-04T22:14:59Z)
- Realtime Dynamic Gaze Target Tracking and Depth-Level Estimation [6.435984242701043]
The use of Transparent Displays (TDs) in various applications, such as Heads-Up Displays (HUDs) in vehicles, is a burgeoning field poised to revolutionize user experiences.
This innovation brings forth significant challenges in realtime human-device interaction, particularly in accurately identifying and tracking a user's gaze on dynamically changing TDs.
We present a robust and efficient two-part solution for realtime gaze monitoring, comprising: (1) a tree-based algorithm for identifying and dynamically tracking gaze targets; and (2) a multi-stream self-attention architecture that estimates the depth level of human gaze from eye-tracking data.
arXiv Detail & Related papers (2024-06-09T20:52:47Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous drone racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z)
- Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals [85.76513755331318]
Argus++ is a robust real-time activity detection system for analyzing unconstrained video streams.
The overall system is optimized for real-time processing on standalone consumer-level hardware.
arXiv Detail & Related papers (2022-01-14T03:35:22Z)
- C^3Net: End-to-End deep learning for efficient real-time visual active camera control [4.09920839425892]
The need for automated real-time visual systems in applications such as smart camera surveillance, smart environments, and drones necessitates the improvement of methods for visual active monitoring and control.
In this paper a deep Convolutional Camera Controller Neural Network is proposed to go directly from visual information to camera movement.
It is trained end-to-end without bounding box annotations to control a camera and follow multiple targets from raw pixel values.
arXiv Detail & Related papers (2021-07-28T09:31:46Z)
- Scalable Perception-Action-Communication Loops with Convolutional and Graph Neural Networks [208.15591625749272]
We present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI).
Our framework is implemented as a cascade of a convolutional and a graph neural network (CNN/GNN), addressing agent-level visual perception and feature learning.
We demonstrate that VGAI yields performance comparable to or better than other decentralized controllers.
arXiv Detail & Related papers (2021-06-24T23:57:21Z)
- Artificial Intelligence Enabled Traffic Monitoring System [3.085453921856008]
This article presents a novel approach to automatically monitoring real-time traffic footage using deep convolutional neural networks.
The proposed system deploys several state-of-the-art deep learning algorithms to automate different traffic monitoring needs.
arXiv Detail & Related papers (2020-10-02T22:28:02Z)
- Neuromorphic Eye-in-Hand Visual Servoing [0.9949801888214528]
Event cameras give human-like vision capabilities with low latency and wide dynamic range.
We present a visual servoing method using an event camera and a switching control strategy to explore, reach and grasp.
Experiments prove the effectiveness of the method to track and grasp objects of different shapes without the need for re-tuning.
arXiv Detail & Related papers (2020-04-15T23:57:54Z)
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)
- Training-free Monocular 3D Event Detection System for Traffic Surveillance [93.65240041833319]
Existing event detection systems are mostly learning-based and have achieved convincing performance when a large amount of training data is available.
In real-world scenarios, collecting sufficient labeled training data is expensive and sometimes impossible.
We propose a training-free monocular 3D event detection system for traffic surveillance.
arXiv Detail & Related papers (2020-02-01T04:42:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.