Unadversarial Examples: Designing Objects for Robust Vision
- URL: http://arxiv.org/abs/2012.12235v1
- Date: Tue, 22 Dec 2020 18:26:07 GMT
- Title: Unadversarial Examples: Designing Objects for Robust Vision
- Authors: Hadi Salman, Andrew Ilyas, Logan Engstrom, Sai Vemprala, Aleksander
Madry, Ashish Kapoor
- Abstract summary: We develop a framework that exploits the sensitivity of modern machine learning algorithms to input perturbations in order to design "robust objects".
We demonstrate the efficacy of the framework on a wide variety of vision-based tasks ranging from standard benchmarks to (in-simulation) robotics.
- Score: 100.4627585672469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a class of realistic computer vision settings wherein one can
influence the design of the objects being recognized. We develop a framework
that leverages this capability to significantly improve vision models'
performance and robustness. This framework exploits the sensitivity of modern
machine learning algorithms to input perturbations in order to design "robust
objects," i.e., objects that are explicitly optimized to be confidently
detected or classified. We demonstrate the efficacy of the framework on a wide
variety of vision-based tasks ranging from standard benchmarks, to
(in-simulation) robotics, to real-world experiments. Our code can be found at
https://git.io/unadversarial .
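The mechanism behind these "unadversarial" objects is the mirror image of an adversarial attack: instead of perturbing an input to fool a fixed model, one optimizes a patch or texture by gradient descent on the model's loss so that a fixed classifier predicts the object's true label with high confidence. The sketch below illustrates this idea in PyTorch; the function name, the fixed top-left patch placement, and the hyperparameters are illustrative assumptions, not the released implementation linked above.

```python
import torch
import torch.nn.functional as F

def make_unadversarial_patch(model, images, labels, patch_size=64,
                             steps=200, lr=0.05):
    """Optimize a patch so that, when pasted onto `images`, the frozen
    classifier `model` predicts `labels` with high confidence.

    Minimal sketch of the unadversarial-example idea: the same gradient
    machinery used for adversarial attacks, run in the model's favor.
    Patch placement and hyperparameters are illustrative assumptions.
    """
    model.eval()
    patch = torch.rand(1, 3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)

    for _ in range(steps):
        patched = images.clone()
        # Paste the (shared) patch into the top-left corner of every image.
        patched[:, :, :patch_size, :patch_size] = patch
        # Minimizing cross-entropy w.r.t. the patch *increases* the model's
        # confidence in the true labels -- the reverse of an attack.
        loss = F.cross_entropy(model(patched), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            patch.clamp_(0.0, 1.0)  # keep the patch a valid image
    return patch.detach()

# Hypothetical usage with a torchvision classifier:
#   from torchvision import models
#   model = models.resnet18(weights="IMAGENET1K_V1")
#   images = torch.rand(8, 3, 224, 224)               # views of one object
#   labels = torch.full((8,), 207, dtype=torch.long)  # the object's class
#   patch = make_unadversarial_patch(model, images, labels)
```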
Related papers
- Spatially Visual Perception for End-to-End Robotic Learning [33.490603706207075]
We introduce a video-based spatial perception framework that leverages 3D spatial representations to address environmental variability.
Our approach integrates a novel image augmentation technique, AugBlender, with a state-of-the-art monocular depth estimation model trained on internet-scale data.
arXiv Detail & Related papers (2024-11-26T14:23:42Z)
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm outperforms the state-of-the-art model-free actor-critic algorithm in a visually complex 3D robotic environment and in a 2D environment with compositional structure.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning [8.626019848533707]
This paper focuses on evaluating and benchmarking the robustness of visual representations in the context of object assembly tasks.
We employ a general framework in visuomotor policy learning that utilizes visual pretraining models as vision encoders.
Our study investigates the robustness of this framework when applied to a dual-arm manipulation setup, specifically to grasp variations.
arXiv Detail & Related papers (2023-10-15T20:41:07Z)
- Tuning computer vision models with task rewards [88.45787930908102]
Misalignment between model predictions and intended usage can be detrimental to the deployment of computer vision models.
In natural language processing, this is often addressed using reinforcement learning techniques that align models with a task reward.
We adopt this approach and show its surprising effectiveness across multiple computer vision tasks, such as object detection, panoptic segmentation, colorization and image captioning.
arXiv Detail & Related papers (2023-02-16T11:49:48Z)
- Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers [52.30336730712544]
We introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance.
We propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation.
We demonstrate empirically that this architecture improves sample efficiency in several Atari environments, while also achieving better performance in some of the games.
arXiv Detail & Related papers (2022-02-01T19:03:03Z)
- Object Pursuit: Building a Space of Objects via Discriminative Weight Generation [23.85039747700698]
We propose a framework to continuously learn object-centric representations for visual learning and understanding.
We leverage interactions to sample diverse variations of an object and the corresponding training signals while learning the object-centric representations.
We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations.
arXiv Detail & Related papers (2021-12-15T08:25:30Z)
- Maintaining a Reliable World Model using Action-aware Perceptual Anchoring [4.971403153199917]
Robots need to maintain a model of their surroundings even when objects go out of view and are no longer visible.
This requires anchoring perceptual information onto symbols that represent the objects in the environment.
We present a model for action-aware perceptual anchoring that enables robots to track objects in a persistent manner.
arXiv Detail & Related papers (2021-07-07T06:35:14Z)
- Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation [6.853826783413853]
Humans have impressive generalization capabilities when it comes to manipulating objects in novel environments.
How to learn such body schemas for robots remains an open problem.
We develop a self-supervised approach that extends a robot's kinematic model from visual latent representations when grasping an object.
arXiv Detail & Related papers (2020-11-08T01:04:59Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft)
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.