End-to-End Learning of Keypoint Representations for Continuous Control
from Images
- URL: http://arxiv.org/abs/2106.07995v1
- Date: Tue, 15 Jun 2021 09:17:06 GMT
- Title: End-to-End Learning of Keypoint Representations for Continuous Control
from Images
- Authors: Rinu Boney, Alexander Ilin, Juho Kannala
- Abstract summary: We show that it is possible to learn efficient keypoint representations end-to-end, without the need for unsupervised pre-training, decoders, or additional losses.
Our proposed architecture consists of a differentiable keypoint extractor that feeds the coordinates directly to a soft actor-critic agent.
- Score: 84.8536730437934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many control problems that include vision, optimal controls can be
inferred from the locations of the objects in the scene. This information can be
represented using keypoints: a list of spatial locations in the input
image. Previous works show that keypoint representations learned during
unsupervised pre-training using encoder-decoder architectures can provide good
features for control tasks. In this paper, we show that it is possible to learn
efficient keypoint representations end-to-end, without the need for
unsupervised pre-training, decoders, or additional losses. Our proposed
architecture consists of a differentiable keypoint extractor that feeds the
coordinates of the estimated keypoints directly to a soft actor-critic agent.
The proposed algorithm yields performance competitive with the state-of-the-art
on DeepMind Control Suite tasks.
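The differentiable keypoint extractor described in the abstract can be illustrated with a spatial soft-argmax: each channel of a convolutional feature map is turned into one (x, y) coordinate by a softmax-weighted average over pixel positions, so gradients from the actor-critic losses can flow back through the keypoint coordinates. The sketch below is a minimal NumPy illustration of this standard operation, not the authors' actual implementation; the function name and temperature parameter are illustrative assumptions.

```python
import numpy as np

def spatial_soft_argmax(feature_maps, temperature=1.0):
    """Differentiable keypoint extraction: each channel of `feature_maps`
    (shape (K, H, W)) yields one (x, y) keypoint as the softmax-weighted
    average of normalized pixel coordinates."""
    K, H, W = feature_maps.shape
    # Normalized pixel coordinate grids in [-1, 1]
    ys = np.linspace(-1.0, 1.0, H)
    xs = np.linspace(-1.0, 1.0, W)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")

    keypoints = np.zeros((K, 2))
    for k in range(K):
        logits = feature_maps[k] / temperature
        logits = logits - logits.max()       # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum()             # softmax over all H*W pixels
        keypoints[k, 0] = (weights * grid_x).sum()  # expected x coordinate
        keypoints[k, 1] = (weights * grid_y).sum()  # expected y coordinate
    return keypoints  # shape (K, 2), coordinates in [-1, 1]

# A sharply peaked activation should produce a keypoint at that pixel:
fm = np.zeros((1, 5, 5))
fm[0, 1, 3] = 10.0                           # strong response at row 1, col 3
kp = spatial_soft_argmax(fm, temperature=0.1)
```

In an end-to-end setup of the kind the paper describes, the resulting (x, y) coordinates would be fed directly to the soft actor-critic networks, with the extractor trained by the RL objective alone rather than a reconstruction loss.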
Related papers
- Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural
Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z) - Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z) - Weakly Supervised Keypoint Discovery [27.750244813890262]
We propose a method for keypoint discovery from a 2D image using image-level supervision.
Motivated by the weakly-supervised learning approach, our method exploits image-level supervision to identify discriminative parts.
Our approach achieves state-of-the-art performance on keypoint estimation in limited-supervision scenarios.
arXiv Detail & Related papers (2021-09-28T01:26:53Z) - Accurate Grid Keypoint Learning for Efficient Video Prediction [87.71109421608232]
Keypoint-based video prediction methods can consume substantial computing resources in training and deployment.
In this paper, we design a new grid keypoint learning framework, aiming at a robust and explainable intermediate keypoint representation for long-term efficient video prediction.
Our method outperforms state-of-the-art video prediction methods while saving more than 98% of computing resources.
arXiv Detail & Related papers (2021-07-28T05:04:30Z) - Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
arXiv Detail & Related papers (2021-06-14T17:59:59Z) - Semi-supervised Keypoint Localization [12.37129078618206]
We propose to simultaneously learn keypoint heatmaps and pose-invariant keypoint representations in a semi-supervised manner.
Our approach significantly outperforms previous methods on several benchmarks for human and animal body landmark localization.
arXiv Detail & Related papers (2021-01-20T06:23:08Z) - Unsupervised Object Keypoint Learning using Local Spatial Predictability [10.862430265350804]
We propose PermaKey, a novel approach to representation learning based on object keypoints.
We demonstrate the efficacy of PermaKey on Atari where it learns keypoints corresponding to the most salient object parts and is robust to certain visual distractors.
arXiv Detail & Related papers (2020-11-25T18:27:05Z) - Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We introduce a self-supervised loss function that we use to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z) - S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via
Multi-View Consistency [11.357804868755155]
We advocate semantic 3D keypoints as a visual representation, and present a semi-supervised training objective.
Unlike local texture-based approaches, our model integrates contextual information from a large area.
We demonstrate that this ability to locate semantic keypoints enables high level scripting of human understandable behaviours.
arXiv Detail & Related papers (2020-09-30T14:44:54Z) - CoKe: Localized Contrastive Learning for Robust Keypoint Detection [24.167397429511915]
We show that keypoint kernels can be chosen to optimize three types of distances in the feature space.
We formulate this optimization process within a framework, which includes supervised contrastive learning.
CoKe achieves state-of-the-art results compared to approaches that jointly represent all keypoints holistically.
arXiv Detail & Related papers (2020-09-29T16:00:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.