C^3Net: End-to-End deep learning for efficient real-time visual active
camera control
- URL: http://arxiv.org/abs/2107.13233v1
- Date: Wed, 28 Jul 2021 09:31:46 GMT
- Title: C^3Net: End-to-End deep learning for efficient real-time visual active
camera control
- Authors: Christos Kyrkou
- Abstract summary: The need for automated real-time visual systems in applications such as smart camera surveillance, smart environments, and drones necessitates the improvement of methods for visual active monitoring and control.
In this paper, a deep Convolutional Camera Controller Neural Network is proposed that goes directly from visual information to camera movement.
It is trained end-to-end without bounding box annotations to control a camera and follow multiple targets from raw pixel values.
- Score: 4.09920839425892
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The need for automated real-time visual systems in applications such as smart
camera surveillance, smart environments, and drones necessitates the
improvement of methods for visual active monitoring and control. Traditionally,
the active monitoring task has been handled through a pipeline of modules such
as detection, filtering, and control. However, such pipelines are difficult to
jointly optimize, and tuning their various parameters for real-time processing
on resource-constrained systems is challenging. In this paper, a deep
Convolutional Camera Controller Neural Network is proposed that goes directly
from visual information to camera movement, providing an efficient solution to
the active vision problem. It is trained end-to-end, without bounding box
annotations, to control a camera and follow multiple targets from raw pixel
values. Evaluation through both a simulation framework and a real experimental
setup indicates that the proposed solution is robust to varying conditions and
achieves better monitoring performance than traditional approaches, both in
the number of targets monitored and in effective monitoring time. The
advantage of the proposed approach is that it is computationally less
demanding and can run at over 10 FPS (~4x speedup) on an embedded smart
camera, providing a practical and affordable solution to real-time active
monitoring.
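
To make the end-to-end idea concrete, here is a minimal PyTorch sketch of such a controller: a small CNN maps a raw frame to a discrete camera-movement command and is trained directly against demonstrated commands rather than bounding boxes. The layer sizes, input resolution, and five-way action space (left/right/up/down/stay) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CameraControllerNet(nn.Module):
    """Hypothetical end-to-end controller: raw pixels -> movement logits."""
    def __init__(self, num_actions: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling keeps the head small
        )
        self.head = nn.Linear(64, num_actions)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (batch, 3, H, W) raw pixel values
        return self.head(self.features(frame).flatten(1))

net = CameraControllerNet()

# One supervised training step against demonstrated camera commands;
# random tensors stand in for a real frame/command dataset.
frames = torch.rand(8, 3, 120, 120)
commands = torch.randint(0, 5, (8,))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss = nn.CrossEntropyLoss()(net(frames), commands)
optimizer.zero_grad(); loss.backward(); optimizer.step()

# At run time a single forward pass selects the next camera movement,
# which is what keeps the approach cheap enough for embedded hardware.
action = net(frames[:1]).argmax(dim=1)
```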
Related papers
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals [85.76513755331318]
Argus++ is a robust real-time activity detection system for analyzing unconstrained video streams.
The overall system is optimized for real-time processing on standalone consumer-level hardware.
arXiv Detail & Related papers (2022-01-14T03:35:22Z)
- CANS: Communication Limited Camera Network Self-Configuration for Intelligent Industrial Surveillance [8.360870648463653]
Real-time, intelligent video surveillance via camera networks involves computation-intensive vision detection tasks over massive video data.
Multiple video streams compete for limited communication resources on the link between edge devices and camera networks.
An adaptive camera network self-configuration method (CANS) for video surveillance is proposed to cope with multiple video streams of heterogeneous quality of service.
arXiv Detail & Related papers (2021-09-13T01:54:33Z)
- Imitation-Based Active Camera Control with Deep Convolutional Neural Network [4.09920839425892]
In this paper, we frame active visual monitoring as an imitation learning problem to be solved in a supervised manner using deep learning.
A deep convolutional neural network is trained end-to-end as the camera controller that learns the entire processing pipeline needed to control a camera to follow multiple targets.
Experimental results indicate that the proposed solution is robust to varying conditions and is able to achieve better monitoring performance.
arXiv Detail & Related papers (2020-12-11T15:37:33Z)
- Artificial Intelligence Enabled Traffic Monitoring System [3.085453921856008]
This article presents a novel approach to automatically monitoring real-time traffic footage using deep convolutional neural networks.
The proposed system deploys several state-of-the-art deep learning algorithms to automate different traffic monitoring needs.
arXiv Detail & Related papers (2020-10-02T22:28:02Z)
- YOLOpeds: Efficient Real-Time Single-Shot Pedestrian Detection for Smart Camera Applications [2.588973722689844]
This work addresses the challenge of achieving a good trade-off between accuracy and speed for efficient deployment of deep-learning-based pedestrian detection in smart camera applications.
A computationally efficient architecture is introduced, based on separable convolutions and integrating dense connections across layers with multi-scale feature fusion (a sketch of a separable-convolution block appears after this list).
Overall, YOLOpeds sustains real-time operation at over 30 frames per second with detection rates of around 86%, outperforming existing deep learning models.
arXiv Detail & Related papers (2020-07-27T09:50:11Z)
- Neuromorphic Eye-in-Hand Visual Servoing [0.9949801888214528]
Event cameras give human-like vision capabilities with low latency and wide dynamic range.
We present a visual servoing method using an event camera and a switching control strategy to explore, reach and grasp.
Experiments demonstrate the method's effectiveness in tracking and grasping objects of different shapes without re-tuning.
arXiv Detail & Related papers (2020-04-15T23:57:54Z)
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)
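
As a side note on the YOLOpeds entry above: depthwise-separable convolutions are the main source of the efficiency it reports. Below is a minimal, hypothetical PyTorch sketch of such a block; the channel counts and BatchNorm/ReLU ordering are common defaults, not taken from that paper.

```python
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    """Hypothetical depthwise-separable 3x3 convolution block."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_ch, bias=False)
        # Pointwise: a 1x1 convolution mixes the channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A separable 3x3 conv needs roughly an order of magnitude fewer
# multiply-adds than a full 3x3 conv at the same width, which is
# where the speedup on embedded hardware comes from.
block = SeparableConv(32, 64)
out = block(torch.rand(1, 32, 56, 56))  # -> (1, 64, 56, 56)
```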
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.