Unsupervised Object Keypoint Learning using Local Spatial Predictability
- URL: http://arxiv.org/abs/2011.12930v2
- Date: Mon, 8 Mar 2021 15:10:29 GMT
- Title: Unsupervised Object Keypoint Learning using Local Spatial Predictability
- Authors: Anand Gopalakrishnan, Sjoerd van Steenkiste, Jürgen Schmidhuber
- Abstract summary: We propose PermaKey, a novel approach to representation learning based on object keypoints.
We demonstrate the efficacy of PermaKey on Atari where it learns keypoints corresponding to the most salient object parts and is robust to certain visual distractors.
- Score: 10.862430265350804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose PermaKey, a novel approach to representation learning based on
object keypoints. It leverages the predictability of local image regions from
spatial neighborhoods to identify salient regions that correspond to object
parts, which are then converted to keypoints. Unlike prior approaches, it
utilizes predictability to discover object keypoints, an intrinsic property of
objects. This ensures that it does not overly bias keypoints to focus on
characteristics that are not unique to objects, such as movement, shape, colour
etc. We demonstrate the efficacy of PermaKey on Atari where it learns keypoints
corresponding to the most salient object parts and is robust to certain visual
distractors. Further, on downstream RL tasks in the Atari domain we demonstrate
how agents equipped with our keypoints outperform those using competing
alternatives, even on challenging environments with moving backgrounds or
distractor objects.
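The core idea in the abstract, that regions which are hard to predict from their spatial neighbourhood are candidate keypoints, can be illustrated with a toy sketch. This is not the PermaKey implementation: the abstract does not specify the predictor, so the code below assumes a fixed neighbour-mean predictor on raw pixels as a stand-in, and the function names `local_prediction_error` and `top_k_keypoints` are hypothetical.

```python
import numpy as np


def local_prediction_error(image, radius=1):
    """Absolute error of predicting each pixel from the mean of its neighbours.

    Assumption: a fixed neighbour-mean predictor stands in for whatever
    predictor the paper actually uses.
    """
    image = np.asarray(image, dtype=float)
    padded = np.pad(image, radius, mode="edge")
    size = 2 * radius + 1
    h, w = image.shape
    total = np.zeros_like(image)
    # Sum every pixel of the (size x size) window centred on each location.
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + h, dx:dx + w]
    # Exclude the centre pixel so the prediction uses neighbours only.
    neighbour_mean = (total - image) / (size * size - 1)
    return np.abs(image - neighbour_mean)


def top_k_keypoints(error_map, k=1):
    """Return the (row, col) positions of the k largest prediction errors."""
    flat = np.argsort(error_map, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, error_map.shape)
    return list(zip(rows.tolist(), cols.tolist()))


# Toy frame: uniform background with a single bright "object" pixel.
frame = np.zeros((8, 8))
frame[3, 5] = 1.0

error_map = local_prediction_error(frame)
keypoints = top_k_keypoints(error_map, k=1)
print(keypoints)  # [(3, 5)]
```

In this toy frame the background is perfectly predictable from its neighbours, so the only large error, and hence the selected keypoint, sits on the isolated bright pixel.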
Related papers
- Keypoint Abstraction using Large Models for Object-Relative Imitation Learning [78.92043196054071]
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics.
Keypoint-based representations have proven effective as a succinct representation for capturing essential object features.
We propose KALM, a framework that leverages large pre-trained vision-language models to automatically generate task-relevant and cross-instance consistent keypoints.
arXiv Detail & Related papers (2024-10-30T17:37:31Z)
- Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation [56.07676459156789]
Localizing 3D keypoints in a 2D image is an effective way to establish 3D-2D correspondences for 6DoF object pose estimation.
In this paper, we address this issue by localizing the important keypoints in terms of visibility.
We construct VAPO (Visibility-Aware POse estimator) by integrating the visibility-aware importance with a state-of-the-art pose estimation algorithm.
arXiv Detail & Related papers (2024-03-21T16:59:45Z)
- OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments [20.034972354302788]
We extend the Atari Learning Environments, the most-used evaluation framework for deep RL approaches, by introducing OCAtari.
Our framework allows for object discovery, object representation learning, as well as object-centric RL.
arXiv Detail & Related papers (2023-06-14T17:28:46Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings [17.04471874483516]
Existing approaches either compute dense keypoint embeddings in a single forward pass, or allocate their full capacity to a sparse set of points.
In this paper we explore a middle ground based on the observation that the number of relevant points at a given time is typically small.
Our main contribution is a novel architecture, inspired by few-shot task adaptation, which allows a sparse-style network to condition on a keypoint embedding.
arXiv Detail & Related papers (2021-12-09T13:25:42Z)
- Weakly Supervised Keypoint Discovery [27.750244813890262]
We propose a method for keypoint discovery from a 2D image using image-level supervision.
Motivated by the weakly-supervised learning approach, our method exploits image-level supervision to identify discriminative parts.
Our approach achieves state-of-the-art performance on keypoint estimation in limited-supervision scenarios.
arXiv Detail & Related papers (2021-09-28T01:26:53Z)
- End-to-End Learning of Keypoint Representations for Continuous Control from Images [84.8536730437934]
We show that it is possible to learn efficient keypoint representations end-to-end, without the need for unsupervised pre-training, decoders, or additional losses.
Our proposed architecture consists of a differentiable keypoint extractor that feeds the coordinates directly to a soft actor-critic agent.
arXiv Detail & Related papers (2021-06-15T09:17:06Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by a relative 50% (and 33%).
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
- Semi-supervised Keypoint Localization [12.37129078618206]
We propose to simultaneously learn keypoint heatmaps and pose-invariant keypoint representations in a semi-supervised manner.
Our approach significantly outperforms previous methods on several benchmarks for human and animal body landmark localization.
arXiv Detail & Related papers (2021-01-20T06:23:08Z)
- UKPGAN: A General Self-Supervised Keypoint Detector [43.35270822722044]
UKPGAN is a general self-supervised 3D keypoint detector.
Our keypoints align well with human-annotated keypoint labels.
Our model is stable under both rigid and non-rigid transformations.
arXiv Detail & Related papers (2020-11-24T09:08:21Z)
- Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)