Robustness Certification of Visual Perception Models via Camera Motion Smoothing
- URL: http://arxiv.org/abs/2210.04625v1
- Date: Tue, 4 Oct 2022 15:31:57 GMT
- Title: Robustness Certification of Visual Perception Models via Camera Motion Smoothing
- Authors: Hanjiang Hu, Zuxin Liu, Linyi Li, Jiacheng Zhu, Ding Zhao
- Abstract summary: We study the robustness of visual perception models under camera motion perturbations to investigate the influence of camera motion on robotic perception.
We propose a motion smoothing technique for arbitrary image classification models whose robustness under camera motion perturbations can be certified.
We conduct extensive experiments to validate the certification approach via motion smoothing against camera motion perturbations.
- Score: 23.5329905995857
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A vast literature shows that learning-based visual perception models are
sensitive to adversarial noise, but few works consider the robustness of
robotic perception models under widely existing camera motion perturbations. To
this end, we study the robustness of visual perception models under camera
motion perturbations to investigate the influence of camera motion on robotic
perception. Specifically, we propose a motion smoothing technique for arbitrary
image classification models whose robustness under camera motion perturbations
can then be certified. The proposed robustness certification framework based on
camera motion smoothing provides tight and scalable robustness guarantees for
visual perception modules, making them applicable to a wide range of robotic
applications. To the best of our knowledge, this is the first work to provide
robustness certification for deep perception modules against camera motion,
which improves the trustworthiness of robotic perception. A realistic indoor
robotic dataset with a dense point cloud map of the entire room, MetaRoom,
is introduced for the challenging certifiable robust perception task. We
conduct extensive experiments to validate the certification approach via motion
smoothing against camera motion perturbations. Our framework guarantees a
certified accuracy of 81.7% against camera translation perturbations along the
depth direction within -0.1m ~ 0.1m. We also validate the effectiveness of our
method on a real-world robot by conducting a hardware experiment on a robotic
arm with an eye-in-hand camera. The code is available at
https://github.com/HanjiangHu/camera-motion-smoothing.
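To make the certification recipe concrete, here is a minimal Python sketch of the smoothing-and-voting loop that underlies this kind of guarantee; it is an illustration under stated assumptions, not the authors' implementation. The `render` and `classifier` callables are hypothetical stand-ins (re-projecting a dense point cloud for a camera displaced along the depth axis, and an arbitrary image classifier), and the final p > 1/2 test is the generic randomized-smoothing condition, whereas the paper's certified radius follows from its camera motion smoothing theory.

```python
# Sketch of camera-motion smoothing: classify many re-rendered views
# sampled over the motion interval, then take a majority vote with a
# confidence lower bound. `render` and `classifier` are hypothetical.
import numpy as np
from scipy.stats import beta


def smoothed_predict(render, classifier, num_classes,
                     radius=0.1, n_samples=1000, alpha=0.001, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        z = rng.uniform(-radius, radius)  # camera translation (metres)
        image = render(z)                 # re-project point cloud at z
        counts[classifier(image)] += 1
    top = int(np.argmax(counts))
    k = int(counts[top])
    # One-sided (1 - alpha) Clopper-Pearson lower bound on P(f = top).
    p_lower = beta.ppf(alpha, k, n_samples - k + 1) if k > 0 else 0.0
    # Generic smoothing condition; the paper derives a tighter,
    # motion-specific certificate from this lower bound.
    return top, p_lower, p_lower > 0.5
```

A dense point cloud map of the scene, such as the one shipped with MetaRoom, is what makes this kind of re-rendering under perturbed camera poses feasible in the first place.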
Related papers
- Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image [14.485182089870928]
We propose a novel framework that leverages motion blur as a rich cue for motion estimation.
Our approach works by predicting a dense motion flow field and a monocular depth map directly from a single motion-blurred image.
Our method produces an IMU-like measurement that robustly captures fast and aggressive camera movements.
arXiv Detail & Related papers (2025-03-21T17:58:56Z)
- MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos [104.1338295060383]
We present a system that allows for accurate, fast, and robust estimation of camera parameters and depth maps from casual monocular videos of dynamic scenes.
Our system is significantly more accurate and robust at camera pose and depth estimation when compared with prior and concurrent work.
arXiv Detail & Related papers (2024-12-05T18:59:42Z)
- Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos [64.48857272250446]
We introduce Moto, which converts video content into latent Motion Token sequences by a Latent Motion Tokenizer.
We pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge.
To transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control.
arXiv Detail & Related papers (2024-12-05T18:57:04Z)
- AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first principles perspective, uncovering insights that enable precise 3D camera manipulation.
We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- Microsaccade-inspired Event Camera for Robotics [42.27082276343167]
We design an event-based perception system capable of simultaneously maintaining low reaction time and stable texture.
The geometrical optics of the rotating wedge prism allows for algorithmic compensation of the additional rotational motion.
Various real-world experiments demonstrate the potential of the system to facilitate robotics perception both for low-level and high-level vision tasks.
arXiv Detail & Related papers (2024-05-28T02:49:46Z)
- Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots [22.63980025871784]
The proposed framework integrates heterogeneous data streams collected from various robot sensors, including RGB and depth cameras, a microphone, and a force-torque sensor.
The integrated data is used to train a deep autoencoder to construct latent representations of the multisensory data that indicate the normal status.
Anomalies are then identified by error scores measuring the difference between the encoder's latent values for the original input and for its reconstruction (a minimal sketch of this scoring scheme appears after this list).
arXiv Detail & Related papers (2024-03-06T09:15:53Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Pixel-wise Smoothing for Certified Robustness against Camera Motion Perturbations [45.576866560987405]
We present a framework for certifying the robustness of 3D-2D projective transformations against camera motion perturbations.
Our approach leverages a smoothing distribution over the 2D pixel space instead of in the 3D physical space.
Our approach achieves approximately 80% certified accuracy while utilizing only 30% of the projected image frames.
arXiv Detail & Related papers (2023-09-22T19:15:49Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- Performance Evaluation of Low-Cost Machine Vision Cameras for Image-Based Grasp Verification [0.0]
In this paper, we propose a vision-based grasp verification system using machine vision cameras.
Our experiments demonstrate that the selected machine vision camera and the deep learning models can robustly verify grasps with 97% per-frame accuracy.
arXiv Detail & Related papers (2020-03-23T10:34:27Z)
- GhostImage: Remote Perception Attacks against Camera-based Image Classification Systems [6.637193297008101]
In vision-based object classification systems, imaging sensors perceive the environment, and machine learning is then used to detect and classify objects for decision-making purposes.
We demonstrate how the perception domain can be remotely and unobtrusively exploited to enable an attacker to create spurious objects or alter an existing object.
arXiv Detail & Related papers (2020-01-21T21:58:45Z)
- Morphology-Agnostic Visual Robotic Control [76.44045983428701]
MAVRIC is an approach that works with minimal prior knowledge of the robot's morphology.
We demonstrate our method on visually-guided 3D point reaching, trajectory following, and robot-to-robot imitation.
arXiv Detail & Related papers (2019-12-31T15:45:10Z)
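As referenced in the slip-perception entry above, here is a minimal sketch of its latent-difference anomaly score, assuming a single fused feature vector per time step; the layer sizes and the fusion of RGB-D, microphone, and force-torque streams are illustrative assumptions rather than the paper's architecture.

```python
# Sketch of autoencoder-based anomaly scoring: the score compares the
# latent code of an input with the latent code of its reconstruction.
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    def __init__(self, in_dim=256, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


def anomaly_score(model, x):
    # Trained on normal-status data only, so the latent codes of input
    # and reconstruction diverge for unseen (slip) inputs.
    model.eval()
    with torch.no_grad():
        recon, z = model(x)
        z_recon = model.encoder(recon)
    return torch.norm(z - z_recon, dim=-1)
```

Thresholding this score against values observed on held-out normal data then flags slip events, which is the standard deployment of such a detector.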
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.