Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation
- URL: http://arxiv.org/abs/2603.02139v1
- Date: Mon, 02 Mar 2026 18:00:37 GMT
- Title: Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation
- Authors: Han Xue, Nan Min, Xiaotong Liu, Wendi Chen, Yuan Fang, Jun Lv, Cewu Lu, Chuan Wen
- Abstract summary: We rigorously analyze the properties of wrist-mounted fisheye cameras for imitation learning. Fisheye-trained policies unlock superior scene generalization when trained with sufficient environmental diversity. Our findings provide concrete, actionable guidance for the large-scale collection and effective use of fisheye datasets in robotic learning.
- Score: 53.27191803311681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The adoption of fisheye cameras in robotic manipulation, driven by their exceptionally wide Field of View (FoV), is rapidly outpacing a systematic understanding of their downstream effects on policy learning. This paper presents the first comprehensive empirical study to bridge this gap, rigorously analyzing the properties of wrist-mounted fisheye cameras for imitation learning. Through extensive experiments in both simulation and the real world, we investigate three critical research questions: spatial localization, scene generalization, and hardware generalization. Our investigation reveals that: (1) The wide FoV significantly enhances spatial localization, but this benefit is critically contingent on the visual complexity of the environment. (2) Fisheye-trained policies, while prone to overfitting in simple scenes, unlock superior scene generalization when trained with sufficient environmental diversity. (3) While naive cross-camera transfer leads to failures, we identify the root cause as scale overfitting and demonstrate that hardware generalization performance can be improved with a simple Random Scale Augmentation (RSA) strategy. Collectively, our findings provide concrete, actionable guidance for the large-scale collection and effective use of fisheye datasets in robotic learning. More results and videos are available on https://robo-fisheye.github.io/
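The abstract names Random Scale Augmentation (RSA) as the remedy for scale overfitting but does not spell out the procedure here. Below is a minimal sketch of one plausible form, assuming RSA simply rescales the wrist-camera image about its center during training while keeping the output resolution fixed; the scale range, interpolation, and padding choices are illustrative assumptions, not the authors' exact recipe.

```python
import numpy as np
import cv2


def random_scale_augmentation(image, scale_range=(0.7, 1.3), rng=None):
    """Randomly rescale an image about its center, keeping the output size fixed.

    A sketch of one plausible RSA variant: zooming in crops the center,
    zooming out pads the border, so apparent object scale varies while the
    frame size stays constant for the policy network.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    s = rng.uniform(*scale_range)

    # Resize by the sampled factor (dsize is given as (width, height)).
    scaled = cv2.resize(image, (int(round(w * s)), int(round(h * s))),
                        interpolation=cv2.INTER_LINEAR)
    sh, sw = scaled.shape[:2]

    if s >= 1.0:  # zoom in: center-crop back to (h, w)
        top, left = (sh - h) // 2, (sw - w) // 2
        return scaled[top:top + h, left:left + w]

    # zoom out: pad back to (h, w) with edge replication
    top, left = (h - sh) // 2, (w - sw) // 2
    return cv2.copyMakeBorder(scaled, top, h - sh - top, left, w - sw - left,
                              borderType=cv2.BORDER_REPLICATE)
```

In this form, zooming in mimics a camera mounted closer to the scene and zooming out mimics a wider or more distant view, which is exactly the kind of scale variation a cross-camera transfer exposes the policy to.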
Related papers
- R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation [74.41728218960465]
We propose a real-to-real 3D data generation framework (R2RGen) that directly augments point-cloud observation-action pairs to generate real-world data. In extensive experiments, R2RGen substantially enhances data efficiency and demonstrates strong potential for scaling and for application to mobile manipulation.
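R2RGen augments point-cloud observation-action pairs directly; as a rough illustration only (not the authors' pipeline), the sketch below applies one shared rigid perturbation to both the observed scene points and the associated end-effector waypoints, so the augmented pair stays geometrically consistent.

```python
import numpy as np


def augment_pointcloud_action_pair(points, waypoints, rng=None,
                                   max_shift=0.05, max_yaw=np.pi / 6):
    """Apply one shared yaw + translation perturbation to an observation-action
    pair so the relabelled actions remain consistent with the transformed scene.
    Illustrative only; parameter ranges are assumptions.
    """
    rng = rng or np.random.default_rng()
    yaw = rng.uniform(-max_yaw, max_yaw)
    t = rng.uniform(-max_shift, max_shift, size=3)

    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])

    aug_points = points @ R.T + t          # (N, 3) scene points
    aug_waypoints = waypoints @ R.T + t    # (K, 3) gripper positions
    return aug_points, aug_waypoints
```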
arXiv Detail & Related papers (2025-10-09T17:55:44Z) - Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots [55.43376513158555]
Camera Depth Models (CDMs) are a simple plugin on daily-use depth cameras. We develop a neural data engine that generates high-quality paired data from simulation by modeling a depth camera's noise pattern. For the first time, our experiments demonstrate that a policy trained on raw simulated depth, without the need for adding noise or real-world fine-tuning, generalizes seamlessly to real-world robots.
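The "neural data engine" idea, generating paired data by modeling a depth camera's noise pattern, can be illustrated with a toy sensor model; the depth-dependent noise and dropout below are placeholder assumptions, not the paper's learned noise model.

```python
import numpy as np


def simulate_depth_noise(clean_depth, rng=None, axial_coeff=0.0012,
                         dropout_prob=0.02):
    """Create a (noisy, clean) depth pair from simulated ground-truth depth.

    A toy sensor model: depth-dependent axial noise plus random pixel dropout
    (holes). Real stereo/ToF cameras have richer noise patterns; the
    coefficients here are placeholders.
    """
    rng = rng or np.random.default_rng()
    noisy = clean_depth + rng.normal(0.0, axial_coeff * clean_depth**2)
    holes = rng.random(clean_depth.shape) < dropout_prob
    noisy[holes] = 0.0                      # missing-depth pixels
    return noisy, clean_depth
```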
arXiv Detail & Related papers (2025-09-02T17:29:38Z) - FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera [8.502741852406904]
We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras. We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions. We also incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network.
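The key ingredient is swapping the pinhole projection for a fisheye one inside the self-supervised reprojection step. The snippet below sketches this with a simple equidistant fisheye model; the actual distortion model used by FisheyeDepth (e.g. a Kannala-Brandt polynomial) is not specified in this summary, so the choice here is an assumption.

```python
import numpy as np


def equidistant_fisheye_project(points_cam, fx, fy, cx, cy):
    """Project 3D camera-frame points with an equidistant fisheye model:
    image radius is proportional to theta, the angle from the optical axis.
    A pinhole model (radius proportional to tan(theta)) breaks down at the
    wide angles a fisheye lens covers.
    """
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    r_xy = np.sqrt(X**2 + Y**2)
    theta = np.arctan2(r_xy, Z)                 # angle from optical axis
    scale = theta / np.maximum(r_xy, 1e-12)     # maps (X, Y) onto the image plane
    u = fx * X * scale + cx
    v = fy * Y * scale + cy
    return np.stack([u, v], axis=-1)
```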
arXiv Detail & Related papers (2024-09-23T14:31:42Z) - The Treachery of Images: Bayesian Scene Keypoints for Deep Policy Learning in Robotic Manipulation [28.30126109684119]
We present BASK, a Bayesian approach to tracking scale-invariant keypoints over time.
We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations.
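As a schematic stand-in for the Bayesian tracking idea (not BASK's actual formulation), the sketch below maintains a Gaussian belief over a keypoint location and fuses each new noisy observation with a precision-weighted update.

```python
import numpy as np


class KeypointBelief:
    """Minimal Gaussian belief over a 3D keypoint location, updated by fusing
    independent noisy observations (a toy stand-in for BASK)."""

    def __init__(self, prior_mean, prior_var=1.0):
        self.mean = np.asarray(prior_mean, dtype=float)
        self.var = float(prior_var)             # isotropic variance

    def update(self, measurement, meas_var):
        """Standard Gaussian fusion: precision-weighted average."""
        k = self.var / (self.var + meas_var)    # Kalman-style gain
        self.mean = self.mean + k * (np.asarray(measurement) - self.mean)
        self.var = (1.0 - k) * self.var
        return self.mean
```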
arXiv Detail & Related papers (2023-05-08T14:05:38Z) - Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks [4.820787231200527]
We present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks.
Our proposed method involves utilizing a technique known as knowledge distillation, in which a pre-trained "teacher" policy trained with multiple camera viewpoints guides a "student" policy in learning from a single camera viewpoint.
The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone.
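A minimal sketch of such a distillation step is given below; the assumed interface (policies returning an action and an intermediate feature), the MSE losses, and the weighting are illustrative choices rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student, teacher, single_view, multi_view, alpha=0.5):
    """One training step's loss for view-based policy distillation.

    The frozen teacher consumes all camera views; the student sees only the
    single view. Action matching plus feature matching is one common recipe;
    the paper's exact formulation is not reproduced here.
    """
    with torch.no_grad():
        t_action, t_feat = teacher(multi_view)

    s_action, s_feat = student(single_view)

    action_loss = F.mse_loss(s_action, t_action)   # imitate teacher actions
    feat_loss = F.mse_loss(s_feat, t_feat)         # match intermediate features
    return action_loss + alpha * feat_loss
```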
arXiv Detail & Related papers (2023-03-13T11:42:38Z) - Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously.
Deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential.
arXiv Detail & Related papers (2023-02-17T14:19:28Z) - Learning Active Camera for Multi-Object Navigation [94.89618442412247]
Getting robots to navigate to multiple objects autonomously is essential yet difficult in robot applications.
Existing navigation methods mainly focus on fixed cameras and few attempts have been made to navigate with active cameras.
In this paper, we consider navigating to multiple objects more efficiently with active cameras.
arXiv Detail & Related papers (2022-10-14T04:17:30Z) - Neural Camera Models [0.0]
Machine-learning-aided depth estimation predicts, for each pixel in an image, the distance to the imaged scene point.
In this thesis, we focus on relaxing these assumptions, and describe contributions toward the ultimate goal of turning cameras into truly generic depth sensors.
arXiv Detail & Related papers (2022-08-27T01:28:46Z) - Exploiting Raw Images for Real-Scene Super-Resolution [105.18021110372133]
We study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
We propose a method to generate more realistic training data by mimicking the imaging process of digital cameras.
We also develop a two-branch convolutional neural network to exploit the radiance information originally recorded in raw images.
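The two-branch idea, one branch on the raw sensor data and one on the processed color image, can be sketched as a toy PyTorch module; the layer widths and the fusion/upsampling scheme below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn


class TwoBranchSR(nn.Module):
    """Toy two-branch super-resolution network: one branch restores detail from
    the raw (single-channel, Bayer-like) input, the other extracts color cues
    from the processed RGB image; features are fused before upsampling.
    Purely illustrative.
    """

    def __init__(self, feat=32, scale=2):
        super().__init__()
        self.raw_branch = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feat, feat * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),              # learned upsampling
            nn.Conv2d(feat, 3, 3, padding=1),
        )

    def forward(self, raw, rgb):
        f = torch.cat([self.raw_branch(raw), self.rgb_branch(rgb)], dim=1)
        return self.fuse(f)
```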
arXiv Detail & Related papers (2021-02-02T16:10:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.