Learning to Explore Informative Trajectories and Samples for Embodied
Perception
- URL: http://arxiv.org/abs/2303.10936v1
- Date: Mon, 20 Mar 2023 08:20:04 GMT
- Title: Learning to Explore Informative Trajectories and Samples for Embodied
Perception
- Authors: Ya Jing, Tao Kong
- Abstract summary: Generalizing perception models to unseen embodied tasks is insufficiently studied.
We build a 3D semantic distribution map to train the exploration policy in a self-supervised manner.
With the explored informative trajectories, we propose to select hard samples on trajectories based on the semantic distribution uncertainty.
Experiments show that the perception model fine-tuned with our method outperforms the baselines trained with other exploration policies.
- Score: 24.006056116516618
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We are witnessing significant progress on perception models, specifically
those trained on large-scale internet images. However, efficiently generalizing
these perception models to unseen embodied tasks remains insufficiently studied,
even though it would benefit various relevant applications (e.g., home robots). Unlike
static perception methods trained on pre-collected images, an embodied agent
can move around in the environment and obtain images of objects from arbitrary
viewpoints. Therefore, efficiently learning the exploration policy and
collection method to gather informative training samples is the key to this
task. To this end, we first build a 3D semantic distribution map and use it to train the
exploration policy in a self-supervised manner by introducing semantic distribution
disagreement and semantic distribution uncertainty rewards. Note that the
map is generated from multi-view observations and therefore mitigates the impact of
misidentifications from unfamiliar viewpoints. Our agent is then encouraged to
explore objects whose semantic distributions disagree across viewpoints or
remain uncertain. Along the explored informative trajectories, we then select
hard samples based on the semantic distribution uncertainty, reducing
unnecessary observations that the model can already identify correctly.
Experiments show that the perception model fine-tuned
with our method outperforms the baselines trained with other exploration
policies. Further, we demonstrate the robustness of our method in real-robot
experiments.
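To make the abstract's two mechanisms concrete, below is a minimal, illustrative sketch (not the authors' released code) of (1) fusing per-view semantic predictions into a per-voxel semantic distribution map and deriving disagreement and uncertainty rewards from it, and (2) selecting hard samples along a trajectory by the entropy of the fused distribution. All names, shapes, and thresholds are assumptions made for illustration.

```python
import numpy as np

NUM_CLASSES = 21  # assumed number of semantic categories


def entropy(p, eps=1e-8):
    """Shannon entropy of a categorical distribution (higher = more uncertain)."""
    return -np.sum(p * np.log(p + eps), axis=-1)


class SemanticDistributionMap:
    """Per-voxel categorical class distributions fused over viewpoints."""

    def __init__(self, grid_shape=(64, 64, 32)):
        # Start every voxel at a uniform distribution over the classes.
        self.probs = np.full(grid_shape + (NUM_CLASSES,), 1.0 / NUM_CLASSES)
        self.counts = np.zeros(grid_shape)

    def update(self, voxel, obs_probs):
        """Fuse one view's predicted class distribution into a voxel (running average)."""
        n = self.counts[voxel]
        self.probs[voxel] = (self.probs[voxel] * n + obs_probs) / (n + 1)
        self.counts[voxel] += 1
        return self.probs[voxel]


def step_rewards(sem_map, voxel, obs_probs):
    """Disagreement and uncertainty rewards for observing one voxel from a new view."""
    prior = sem_map.probs[voxel].copy()
    posterior = sem_map.update(voxel, obs_probs)
    # Disagreement: how much the new view's prediction differs from the fused map
    # (total variation distance between the two categorical distributions).
    disagreement = 0.5 * np.abs(obs_probs - prior).sum()
    # Uncertainty: how unsure the fused map still is about this voxel.
    uncertainty = entropy(posterior)
    return disagreement, uncertainty


def select_hard_samples(frames, uncertainties, threshold=1.0):
    """Keep only observations whose map uncertainty exceeds a threshold,
    dropping frames the perception model already identifies confidently."""
    return [f for f, u in zip(frames, uncertainties) if u > threshold]
```

In the paper, rewards of this kind drive the learning of the exploration policy and the uncertainty criterion filters which observations are used to fine-tune the perception model; the running-average fusion above is only one simple way to realize the multi-view aggregation the abstract describes.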
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Self-Supervised Representation Learning for Adversarial Attack Detection [6.528181610035978]
Supervised learning-based adversarial attack detection methods rely on large amounts of labeled data.
We propose a self-supervised representation learning framework for the adversarial attack detection task to address this drawback.
arXiv Detail & Related papers (2024-07-05T09:37:16Z)
- Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look at existing self-supervised methods for speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z)
- Uncovering Unique Concept Vectors through Latent Space Decomposition [0.0]
Concept-based explanations have emerged as a more interpretable alternative to feature attribution estimates.
We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training.
Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
arXiv Detail & Related papers (2023-07-13T17:21:54Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
- From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations [2.363388546004777]
We propose a novel auxiliary pretraining method that is based on spatial reasoning.
Our proposed method takes advantage of a more flexible formulation of contrastive learning by introducing spatial reasoning as an auxiliary task for discriminative self-supervised methods.
arXiv Detail & Related papers (2023-05-21T07:46:46Z)
- Imitation from Observation With Bootstrapped Contrastive Learning [12.048166025000976]
Imitation from observation (IfO) is a learning paradigm that consists of training autonomous agents in a Markov Decision Process.
We present BootIfOL, an IfO algorithm that aims to learn a reward function that takes an agent trajectory and compares it to an expert trajectory.
We evaluate our approach on a variety of control tasks showing that we can train effective policies using a limited number of demonstrative trajectories.
arXiv Detail & Related papers (2023-02-13T17:32:17Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration [47.01485765231528]
Active visual exploration aims to assist an agent with a limited field of view to understand its environment based on partial observations.
We propose the Glimpse-Attend-and-Explore model which employs self-attention to guide the visual exploration instead of task-specific uncertainty maps.
Our model provides encouraging results while being less dependent on dataset bias in driving the exploration.
arXiv Detail & Related papers (2021-08-26T11:41:03Z)
- Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We exploit a self-supervised loss function to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.