Surround-View Cameras based Holistic Visual Perception for Automated
Driving
- URL: http://arxiv.org/abs/2206.05542v1
- Date: Sat, 11 Jun 2022 14:51:30 GMT
- Title: Surround-View Cameras based Holistic Visual Perception for Automated
Driving
- Authors: Varun Ravi Kumar
- Abstract summary: We focus on developing near-field perception algorithms with high performance and low computational complexity.
Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying.
- Score: 0.6091702876917281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The formation of eyes led to the big bang of evolution. The dynamics changed
from a primitive organism waiting for food to come into contact with it, to
food being actively sought out by means of visual sensors. The human eye is one
of the most sophisticated developments of evolution, but it still has defects.
Over millions of years, humans have evolved a biological perception algorithm
capable of driving cars, operating machinery, piloting aircraft, and navigating ships.
Automating these capabilities for computers is critical for various
applications, including self-driving cars, augmented reality, and architectural
surveying. Near-field visual perception in the context of self-driving cars can
perceive the environment in a range of 0-10 meters with 360° coverage
around the vehicle. It is a critical decision-making component in the
development of safer automated driving. Recent advances in computer vision and
deep learning, in conjunction with high-quality sensors such as cameras and
LiDARs, have fueled mature visual perception solutions. Until now, far-field
perception has been the primary focus. Another significant issue is the limited
processing power available for developing real-time applications. Because of
this bottleneck, there is frequently a trade-off between performance and
run-time efficiency. We concentrate on the following issues in order to address
them: 1) Developing near-field perception algorithms with high performance and
low computational complexity for various visual perception tasks such as
geometric and semantic tasks using convolutional neural networks. 2) Using
Multi-Task Learning to overcome computational bottlenecks by sharing initial
convolutional layers between tasks and developing optimization strategies that
balance tasks.
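As a rough illustration of the second point, the sketch below shows a shared convolutional encoder feeding separate geometric (depth) and semantic (segmentation) heads, with a learned uncertainty-based weighting of the task losses. The layer sizes, task pair, and weighting scheme are illustrative assumptions, not the exact configuration of the thesis.

import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """Toy multi-task network: one shared encoder, two task heads."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared initial convolutional layers (computed once per frame).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Geometric head: per-pixel depth (here at 1/4 resolution).
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)
        # Semantic head: per-pixel class logits.
        self.seg_head = nn.Conv2d(64, num_classes, 3, padding=1)
        # Learned log-variances for uncertainty-based task weighting,
        # one scalar per task.
        self.log_vars = nn.Parameter(torch.zeros(2))

    def forward(self, image):
        feats = self.encoder(image)          # shared computation
        return self.depth_head(feats), self.seg_head(feats)

    def balanced_loss(self, depth_loss, seg_loss):
        # Each task loss is scaled by exp(-log_var) and regularised by log_var,
        # so the network learns how strongly to weight each task.
        losses = torch.stack([depth_loss, seg_loss])
        return (torch.exp(-self.log_vars) * losses + self.log_vars).sum()

# Minimal usage with dummy per-task losses:
model = SharedEncoderMultiTask()
depth, seg = model(torch.randn(2, 3, 128, 256))
loss = model.balanced_loss(depth.abs().mean(), seg.mean())
loss.backward()

Because the encoder is evaluated once and reused by every head, the marginal cost of an additional task is only its small decoder, which is the computational argument for sharing the initial convolutional layers.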
Related papers
- Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference [43.474068248379815]
We propose a shared encoder trained on multiple computer vision tasks critical for urban navigation.
We introduce a multi-scale feature network for pose estimation to improve depth learning.
Our findings demonstrate that a shared backbone trained on diverse visual tasks is capable of providing overall perception capabilities.
arXiv Detail & Related papers (2024-09-16T08:54:03Z)
- Improving automatic detection of driver fatigue and distraction using machine learning [0.0]
Driver fatigue and distracted driving are important factors in traffic accidents.
We present techniques for simultaneously detecting fatigue and distracted driving behaviors using vision-based and machine learning-based approaches.
arXiv Detail & Related papers (2024-01-04T06:33:46Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
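For intuition, the sketch below shows a simplified version of the photometric objective that drives this kind of self-supervised depth-and-pose learning: the source frame is warped into the target view using the predicted depth and relative pose, and the intensity difference is penalized. This is a generic formulation with assumed inputs (intrinsics K, a 4x4 relative pose), not PPGeo's actual implementation.

import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """Warp `source` into the `target` view using predicted depth and pose,
    then compare pixel intensities (L1). Shapes:
      target, source: (B, 3, H, W); depth: (B, 1, H, W)
      pose: (B, 4, 4) target->source transform; K: (B, 3, 3) intrinsics.
    """
    B, _, H, W = target.shape
    dev = target.device
    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32, device=dev),
        torch.arange(W, dtype=torch.float32, device=dev),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0)
    pix = pix.reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D camera points, transform into the source frame.
    cam = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=dev)], dim=1)
    src_cam = (pose @ cam_h)[:, :3]                                # (B, 3, H*W)

    # Project into the source image and normalise to [-1, 1] for grid_sample.
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:3].clamp(min=1e-6)
    grid_x = 2.0 * src_pix[:, 0] / (W - 1) - 1.0
    grid_y = 2.0 * src_pix[:, 1] / (H - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1).reshape(B, H, W, 2)

    warped = F.grid_sample(source, grid, align_corners=True, padding_mode="border")
    return (warped - target).abs().mean()

In such a setup, a depth network and a pose network are trained jointly by minimizing this loss over consecutive frames, and the resulting visual encoder is later reused as the driving-policy representation.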
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird's-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
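A minimal sketch of such a fusion scheme is given below: the three camera views are encoded with a shared backbone, the top-down BEV semantic grid is encoded separately, and the concatenated features are decoded into future waypoints. Module sizes, the BEV class count, and the waypoint output are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class TriCameraBEVFusion(nn.Module):
    """Fuse left/front/right camera features with a BEV semantic map."""

    def __init__(self, bev_classes: int = 8, num_waypoints: int = 4):
        super().__init__()
        # Shared image encoder applied to each of the three cameras.
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Encoder for the top-down BEV semantic grid (one channel per class).
        self.bev_encoder = nn.Sequential(
            nn.Conv2d(bev_classes, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Decode the fused context vector into future (x, y) waypoints.
        self.head = nn.Sequential(
            nn.Linear(3 * 64 + 32, 128), nn.ReLU(inplace=True),
            nn.Linear(128, num_waypoints * 2),
        )

    def forward(self, left, front, right, bev):
        cams = [self.img_encoder(v) for v in (left, front, right)]
        fused = torch.cat(cams + [self.bev_encoder(bev)], dim=1)
        return self.head(fused).view(left.shape[0], -1, 2)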
arXiv Detail & Related papers (2022-10-13T05:56:20Z)
- An Embarrassingly Pragmatic Introduction to Vision-based Autonomous Robots [0.0]
We develop a small-scale autonomous vehicle capable of understanding the scene using only visual information.
We discuss the current state of robotics and autonomous driving, and the technological and ethical limitations of this field.
arXiv Detail & Related papers (2021-11-15T01:31:28Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Computer Stereo Vision for Autonomous Driving [31.517828028200682]
Computer stereo vision has been widely applied in autonomous cars for depth perception.
In this chapter, we introduce both the hardware and software aspects of computer stereo vision for autonomous car systems.
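The core relation behind stereo depth perception is Z = f * B / d: for a rectified camera pair, depth equals focal length (in pixels) times baseline (in meters) divided by disparity (in pixels). A small illustrative helper with assumed calibration values is shown below.

import numpy as np

def disparity_to_depth(disparity_px: np.ndarray,
                       focal_px: float = 720.0,    # assumed focal length in pixels
                       baseline_m: float = 0.54) -> np.ndarray:
    """Convert a disparity map (pixels) to metric depth (meters): Z = f * B / d."""
    depth = np.full_like(disparity_px, np.inf, dtype=np.float64)
    valid = disparity_px > 0                        # zero disparity -> infinitely far
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# Example: with the assumed calibration, a 10-pixel disparity is ~38.9 m away.
print(disparity_to_depth(np.array([10.0]))[0])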
arXiv Detail & Related papers (2020-12-06T06:54:03Z)
- Task-relevant Representation Learning for Networked Robotic Perception [74.0215744125845]
This paper presents an algorithm to learn task-relevant representations of sensory data that are co-designed with a pre-trained robotic perception model's ultimate objective.
Our algorithm aggressively compresses robotic sensory data by up to 11x more than competing methods.
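As a rough sketch of the general idea (not the paper's actual algorithm), an encoder-decoder can be trained to squeeze each observation into a small code while a frozen, pre-trained perception model judges how much task-relevant information survives; the bottleneck size and loss below are assumptions.

import torch
import torch.nn as nn

class TaskRelevantCompressor(nn.Module):
    """Compress observations into a small code, judged by a frozen task model."""

    def __init__(self, obs_dim: int = 1024, code_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, obs_dim))

    def forward(self, obs):
        return self.decoder(self.encoder(obs))

def task_relevant_loss(compressor, task_model, obs):
    # The reconstruction is scored by the downstream (frozen) task model, so
    # the bottleneck keeps what the task needs rather than all raw detail.
    recon = compressor(obs)
    with torch.no_grad():
        target = task_model(obs)          # frozen perception model's output
    return nn.functional.mse_loss(task_model(recon), target)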
arXiv Detail & Related papers (2020-11-06T07:39:08Z)
- End-to-end Autonomous Driving Perception with Sequential Latent Representation Learning [34.61415516112297]
An end-to-end approach can simplify the system and avoid the substantial effort of manual engineering.
A latent space is introduced to capture all relevant features useful for perception, which is learned through sequential latent representation learning.
The learned end-to-end perception model is able to solve the detection, tracking, localization, and mapping problems together with only minimal human engineering effort.
arXiv Detail & Related papers (2020-03-21T05:37:44Z)