C^3Net: End-to-End deep learning for efficient real-time visual active
camera control
- URL: http://arxiv.org/abs/2107.13233v1
- Date: Wed, 28 Jul 2021 09:31:46 GMT
- Title: C^3Net: End-to-End deep learning for efficient real-time visual active
camera control
- Authors: Christos Kyrkou
- Abstract summary: The need for automated real-time visual systems in applications such as smart camera surveillance, smart environments, and drones necessitates the improvement of methods for visual active monitoring and control.
In this paper, a deep Convolutional Camera Controller Neural Network is proposed that goes directly from visual information to camera movement.
It is trained end-to-end without bounding box annotations to control a camera and follow multiple targets from raw pixel values.
- Score: 4.09920839425892
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The need for automated real-time visual systems in applications such as smart
camera surveillance, smart environments, and drones necessitates the
improvement of methods for visual active monitoring and control. Traditionally,
the active monitoring task has been handled through a pipeline of modules such
as detection, filtering, and control. However, such pipelines are difficult to
jointly optimize, and tuning their various parameters for real-time processing
on resource-constrained systems is challenging. In this paper, a deep
Convolutional Camera Controller Neural Network is proposed that goes directly
from visual information to camera movement, providing an efficient solution to
the active vision problem. It is trained end-to-end, without bounding box
annotations, to control a camera and follow multiple targets from raw pixel
values. Evaluation through both a simulation framework and a real experimental
setup indicates that the proposed solution is robust to varying conditions and
achieves better monitoring performance than traditional approaches, both in
the number of targets monitored and in effective monitoring time. The
advantage of the proposed approach is that it is computationally less
demanding and can run at over 10 FPS (~4x speedup) on an embedded smart
camera, providing a practical and affordable solution to real-time active
monitoring.
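
To make the end-to-end idea concrete, here is a minimal PyTorch sketch of such a controller: a small CNN maps a raw frame to a discrete camera-movement command and is trained directly against demonstrated commands rather than bounding boxes. The layer sizes, input resolution, and five-way action space (left/right/up/down/stay) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CameraControllerNet(nn.Module):
    """Hypothetical end-to-end controller: raw pixels -> movement logits."""
    def __init__(self, num_actions: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling keeps the head small
        )
        self.head = nn.Linear(64, num_actions)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (batch, 3, H, W) raw pixel values
        return self.head(self.features(frame).flatten(1))

net = CameraControllerNet()

# One supervised training step against demonstrated camera commands;
# random tensors stand in for a real frame/command dataset.
frames = torch.rand(8, 3, 120, 120)
commands = torch.randint(0, 5, (8,))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss = nn.CrossEntropyLoss()(net(frames), commands)
optimizer.zero_grad(); loss.backward(); optimizer.step()

# At run time a single forward pass selects the next camera movement,
# which is what keeps the approach cheap enough for embedded hardware.
action = net(frames[:1]).argmax(dim=1)
```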
Related papers
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals [85.76513755331318]
Argus++ is a robust real-time activity detection system for analyzing unconstrained video streams.
The overall system is optimized for real-time processing on standalone consumer-level hardware.
arXiv Detail & Related papers (2022-01-14T03:35:22Z)
- CANS: Communication Limited Camera Network Self-Configuration for Intelligent Industrial Surveillance [8.360870648463653]
Real-time, intelligent video surveillance via camera networks involves computation-intensive vision detection tasks over massive video data.
Multiple video streams compete for limited communication resources on the link between edge devices and camera networks.
An adaptive camera network self-configuration method (CANS) for video surveillance is proposed to cope with multiple video streams of heterogeneous quality of service.
arXiv Detail & Related papers (2021-09-13T01:54:33Z)
- Imitation-Based Active Camera Control with Deep Convolutional Neural Network [4.09920839425892]
In this paper, we frame active visual monitoring as an imitation learning problem to be solved in a supervised manner using deep learning.
A deep convolutional neural network is trained end-to-end as the camera controller that learns the entire processing pipeline needed to control a camera to follow multiple targets.
Experimental results indicate that the proposed solution is robust to varying conditions and is able to achieve better monitoring performance.
arXiv Detail & Related papers (2020-12-11T15:37:33Z)
- Artificial Intelligence Enabled Traffic Monitoring System [3.085453921856008]
This article presents a novel approach to automatically monitoring real-time traffic footage using deep convolutional neural networks.
The proposed system deploys several state-of-the-art deep learning algorithms to automate different traffic monitoring needs.
arXiv Detail & Related papers (2020-10-02T22:28:02Z)
- YOLOpeds: Efficient Real-Time Single-Shot Pedestrian Detection for Smart Camera Applications [2.588973722689844]
This work addresses the challenge of achieving a good trade-off between accuracy and speed for efficient deployment of deep-learning-based pedestrian detection in smart camera applications.
A computationally efficient architecture is introduced, based on separable convolutions and integrating dense connections across layers with multi-scale feature fusion (a sketch of a separable-convolution block appears after this list).
Overall, YOLOpeds sustains real-time operation at over 30 frames per second with detection rates of around 86%, outperforming existing deep learning models.
arXiv Detail & Related papers (2020-07-27T09:50:11Z)
- Neuromorphic Eye-in-Hand Visual Servoing [0.9949801888214528]
Event cameras give human-like vision capabilities with low latency and wide dynamic range.
We present a visual servoing method using an event camera and a switching control strategy to explore, reach and grasp.
Experiments demonstrate the method's effectiveness in tracking and grasping objects of different shapes without re-tuning.
arXiv Detail & Related papers (2020-04-15T23:57:54Z)
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)
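
As a side note on the YOLOpeds entry above: depthwise-separable convolutions are the main source of the efficiency it reports. Below is a minimal, hypothetical PyTorch sketch of such a block; the channel counts and BatchNorm/ReLU ordering are common defaults, not taken from that paper.

```python
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    """Hypothetical depthwise-separable 3x3 convolution block."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_ch, bias=False)
        # Pointwise: a 1x1 convolution mixes the channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A separable 3x3 conv needs roughly an order of magnitude fewer
# multiply-adds than a full 3x3 conv at the same width, which is
# where the speedup on embedded hardware comes from.
block = SeparableConv(32, 64)
out = block(torch.rand(1, 32, 56, 56))  # -> (1, 64, 56, 56)
```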
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.