Neural Camera Models
- URL: http://arxiv.org/abs/2208.12903v1
- Date: Sat, 27 Aug 2022 01:28:46 GMT
- Title: Neural Camera Models
- Authors: Igor Vasiljevic
- Abstract summary: Machine-learning-aided depth perception, or depth estimation, predicts for each pixel in an image the distance to the imaged scene point.
In this thesis, we focus on relaxing these assumptions, and describe contributions toward the ultimate goal of turning cameras into truly generic depth sensors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern computer vision has moved beyond the domain of internet photo
collections and into the physical world, guiding camera-equipped robots and
autonomous cars through unstructured environments. To enable these embodied
agents to interact with real-world objects, cameras are increasingly being used
as depth sensors, reconstructing the environment for a variety of downstream
reasoning tasks. Machine-learning-aided depth perception, or depth estimation,
predicts for each pixel in an image the distance to the imaged scene point.
While impressive strides have been made in depth estimation, significant
challenges remain: (1) ground truth depth labels are difficult and expensive to
collect at scale, (2) camera information is typically assumed to be known, but
is often unreliable and (3) restrictive camera assumptions are common, even
though a great variety of camera types and lenses are used in practice. In this
thesis, we focus on relaxing these assumptions, and describe contributions
toward the ultimate goal of turning cameras into truly generic depth sensors.
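Per-pixel depth has a simple closed form in the calibrated stereo case, which helps ground what a depth estimator predicts: depth is focal length times baseline divided by disparity. A minimal sketch (the focal length, baseline, and disparity values below are illustrative, not taken from the thesis):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a stereo disparity map (in pixels) to metric depth: z = f * b / d.

    eps guards against division by zero where disparity vanishes (infinite depth).
    """
    return focal_px * baseline_m / np.maximum(disparity, eps)

# Illustrative values: 700 px focal length, 12 cm baseline, 8.4 px disparity.
depth = disparity_to_depth(np.array([[8.4]]), focal_px=700.0, baseline_m=0.12)
print(round(float(depth[0, 0]), 2))  # -> 10.0 (metres)
```

Learned monocular depth estimators replace this geometric computation with a network, but the output has the same shape: one distance per pixel.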
Related papers
- Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation [53.27191803311681]
We rigorously analyze the properties of wrist-mounted fisheye cameras for imitation learning.
Fisheye-trained policies unlock superior scene generalization when trained with sufficient environmental diversity.
Our findings provide concrete, actionable guidance for the large-scale collection and effective use of fisheye datasets in robotic learning.
arXiv Detail & Related papers (2026-03-02T18:00:37Z)
- Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots [55.43376513158555]
Camera Depth Models (CDMs) are a simple plugin on daily-use depth cameras.
We develop a neural data engine that generates high-quality paired data from simulation by modeling a depth camera's noise pattern.
Our experiments demonstrate, for the first time, that a policy trained on raw simulated depth, without added noise or real-world fine-tuning, generalizes seamlessly to real-world robots.
arXiv Detail & Related papers (2025-09-02T17:29:38Z)
- 360 in the Wild: Dataset for Depth Prediction and View Synthesis [66.58513725342125]
We introduce a large-scale 360° video dataset captured in the wild.
This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide.
Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map.
arXiv Detail & Related papers (2024-06-27T05:26:38Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- Prototype of an Automatic Bidirectional People Counter Based on 3D Vision Sensors [39.58317527488534]
3D sensors, also known as RGB-D sensors, utilize depth images where each pixel measures the distance from the camera to objects.
The described prototype uses RGB-D sensors for bidirectional people counting in venues, aiding security and surveillance in spaces like stadiums or airports.
The system includes a RealSense D415 depth camera, a mini-computer running object-detection algorithms to count people, and a 2D camera for identity verification.
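The core of bidirectional counting is deciding, per tracked person, when their centroid crosses a virtual line and in which direction. A minimal sketch of that logic (the line position and track values are hypothetical; a real system would feed in centroids from the detector):

```python
def update_counts(prev_y, curr_y, line_y, counts):
    """Update entry/exit counts when a tracked centroid crosses a horizontal line.

    Crossing downward (increasing y) counts as an entry; upward as an exit.
    """
    if prev_y < line_y <= curr_y:      # moved downward across the line -> entering
        counts["in"] += 1
    elif curr_y < line_y <= prev_y:    # moved upward across the line -> exiting
        counts["out"] += 1
    return counts

counts = {"in": 0, "out": 0}
track = [100, 180, 260, 240, 150, 90]  # one person's centroid y over frames; line at y=200
for prev, curr in zip(track, track[1:]):
    update_counts(prev, curr, line_y=200, counts=counts)
print(counts)  # -> {'in': 1, 'out': 1}
```

Comparing consecutive positions rather than instantaneous ones avoids double-counting a person who lingers on the line.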
arXiv Detail & Related papers (2024-03-18T23:18:40Z)
- Applications of Deep Learning for Top-View Omnidirectional Imaging: A Survey [2.1485350418225244]
A large field-of-view fisheye camera allows capturing a large area with a minimal number of cameras when mounted in a high position facing downwards.
This top-view omnidirectional setup greatly reduces the work and cost for deployment compared to traditional solutions with multiple perspective cameras.
Deep learning has been widely employed for vision related tasks, including for such omnidirectional settings.
arXiv Detail & Related papers (2023-04-17T12:06:41Z)
- Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously.
Deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential.
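An event camera's output is a stream of asynchronous tuples (x, y, timestamp, polarity) rather than frames; a common preprocessing step for deep learning is to accumulate events into a dense grid. A minimal sketch of one such representation, a signed event-count histogram (the event values are illustrative):

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate asynchronous events (x, y, timestamp, polarity in {-1, +1})
    into a signed 2D histogram, a common dense input representation for DL models."""
    frame = np.zeros((height, width), dtype=np.int32)
    for x, y, _t, polarity in events:
        frame[y, x] += polarity
    return frame

# Three illustrative events: two brightness increases at (x=2, y=1), one decrease at (0, 0).
events = [(2, 1, 0.001, +1), (2, 1, 0.004, +1), (0, 0, 0.007, -1)]
frame = events_to_frame(events, height=3, width=4)
print(frame[1, 2], frame[0, 0])  # -> 2 -1
```

Richer representations (voxel grids over time bins, time surfaces) follow the same accumulation pattern with an extra temporal axis.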
arXiv Detail & Related papers (2023-02-17T14:19:28Z)
- Learning Active Camera for Multi-Object Navigation [94.89618442412247]
Getting robots to navigate to multiple objects autonomously is essential yet difficult in robot applications.
Existing navigation methods mainly focus on fixed cameras and few attempts have been made to navigate with active cameras.
In this paper, we consider navigating to multiple objects more efficiently with active cameras.
arXiv Detail & Related papers (2022-10-14T04:17:30Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Full Surround Monodepth from Multiple Cameras [31.145598985137468]
We extend self-supervised monocular depth and ego-motion estimation to large-baseline multi-camera rigs.
We learn a single network generating dense, consistent, and scale-aware point clouds that cover the same full surround 360 degree field of view as a typical LiDAR scanner.
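Turning per-camera depth maps into a LiDAR-like point cloud is a standard pinhole back-projection: X = (u - cx)·z/fx, Y = (v - cy)·z/fy, Z = z per pixel. A minimal single-camera sketch (the intrinsics and depth values are illustrative; the paper's multi-camera setup would additionally transform each cloud into a shared rig frame):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into a 3D point cloud (H, W, 3)
    using pinhole intrinsics: X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel column/row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

depth = np.full((2, 2), 2.0)  # toy depth map: a flat surface 2 m away
pts = depth_to_points(depth, fx=100.0, fy=100.0, cx=0.5, cy=0.5)
print(pts.shape)  # -> (2, 2, 3)
```

Because depth here is metric ("scale-aware"), the resulting points are directly comparable to a LiDAR sweep.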
arXiv Detail & Related papers (2021-03-31T22:52:04Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.