Invariance is Key to Generalization: Examining the Role of
Representation in Sim-to-Real Transfer for Visual Navigation
- URL: http://arxiv.org/abs/2310.15020v2
- Date: Mon, 4 Dec 2023 03:48:06 GMT
- Title: Invariance is Key to Generalization: Examining the Role of
Representation in Sim-to-Real Transfer for Visual Navigation
- Authors: Bo Ai, Zhanxin Wu, David Hsu
- Abstract summary: Key to generalization is representations that are rich enough to capture all task-relevant information.
We experimentally study such a representation for visual navigation.
We show that our representation reduces the A-distance between the training and test domains.
- Score: 35.01394611106655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The data-driven approach to robot control has been gathering pace rapidly,
yet generalization to unseen task domains remains a critical challenge. We
argue that the key to generalization is representations that are (i) rich
enough to capture all task-relevant information and (ii) invariant to
superfluous variability between the training and the test domains. We
experimentally study such a representation -- containing both depth and
semantic information -- for visual navigation and show that it enables a
control policy trained entirely in simulated indoor scenes to generalize to
diverse real-world environments, both indoors and outdoors. Further, we show
that our representation reduces the A-distance between the training and test
domains, improving the generalization error bound as a result. Our proposed
approach is scalable: the learned policy improves continuously, as the
foundation models that it exploits absorb more diverse data during
pre-training.
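To make the A-distance claim concrete, below is a minimal sketch (not the authors' code) of the standard proxy A-distance estimate from the domain adaptation literature: a classifier is trained to distinguish source (simulation) features from target (real-world) features, and its held-out error err is converted to d_A ≈ 2(1 − 2·err). The more domain-invariant the representation, the closer the classifier is to chance and the smaller the distance. The feature arrays and dimensions here are placeholders.

```python
# Minimal sketch (assumptions, not the paper's released code): proxy A-distance
# between source (simulation) and target (real-world) feature distributions,
# following the classic d_A ~= 2 * (1 - 2 * err) estimate, where err is the
# held-out error of a domain classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(source_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Train a domain classifier and convert its error into a proxy A-distance."""
    X = np.vstack([source_feats, target_feats])
    y = np.concatenate([np.zeros(len(source_feats)), np.ones(len(target_feats))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    err = 1.0 - clf.score(X_te, y_te)   # domain-classification error on held-out data
    return 2.0 * (1.0 - 2.0 * err)      # lower = more domain-invariant representation

# Example with random stand-in features: an invariant representation should drive
# the domain classifier toward chance (err ~ 0.5, d_A ~ 0).
sim_feats = np.random.randn(500, 128)
real_feats = np.random.randn(500, 128)
print(proxy_a_distance(sim_feats, real_feats))
```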
Related papers
- Learning Fair Invariant Representations under Covariate and Correlation Shifts Simultaneously [10.450977234741524]
We introduce a novel approach that focuses on learning a fairness-aware domain-invariant predictor.
Our approach surpasses state-of-the-art methods with respect to model accuracy as well as both group and individual fairness.
arXiv Detail & Related papers (2024-08-18T00:01:04Z)
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
- Generalizable Imitation Learning Through Pre-Trained Representations [19.98418419179064]
We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations.
Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types.
arXiv Detail & Related papers (2023-11-15T20:15:51Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pre-training achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks; a toy sketch of the inverse-dynamics objective appears after this list.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Generalization Across Observation Shifts in Reinforcement Learning [13.136140831757189]
We extend the bisimulation framework to account for context-dependent observation shifts.
Specifically, we focus on the simulator-based learning setting and use alternate observations to learn a representation space.
This allows us to deploy the agent to varying observation settings during test time and generalize to unseen scenarios.
arXiv Detail & Related papers (2023-06-07T16:49:03Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real-world distribution shift benchmarks and on different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Learning with Style: Continual Semantic Segmentation Across Tasks and Domains [25.137859989323537]
Domain adaptation and class-incremental learning deal with domain and task variability separately, while a unified solution remains an open problem.
We tackle both facets of the problem together, taking into account the semantic shift within both input and label spaces.
We show how the proposed method outperforms existing approaches, which prove to be ill-equipped to deal with continual semantic segmentation under both task and domain shift.
arXiv Detail & Related papers (2022-10-13T13:24:34Z)
- Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning [7.972204774778987]
In real-world robotics applications, Reinforcement Learning (RL) agents are often unable to generalise to environment variations that were not observed during training.
We introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled representations using the sequential nature of RL observations.
We find empirically that RL algorithms with TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods.
arXiv Detail & Related papers (2022-07-12T11:46:49Z)
- Fishr: Invariant Gradient Variances for Out-of-distribution Generalization [98.40583494166314]
Fishr is a learning scheme to enforce domain invariance in the space of the gradients of the loss function.
Fishr exhibits close relations with the Fisher Information and the Hessian of the loss.
In particular, Fishr improves the state of the art on the DomainBed benchmark and performs significantly better than Empirical Risk Minimization; a toy sketch of the gradient-variance penalty appears after this list.
arXiv Detail & Related papers (2021-09-07T08:36:09Z)
- From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning [69.23334811890919]
Deep Reinforcement Learning has proven able to solve many control tasks in different fields, but these systems do not always behave as expected when deployed in real-world scenarios.
This is mainly due to the lack of domain adaptation between simulated and real-world data together with the absence of distinction between train and test datasets.
We present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios.
arXiv Detail & Related papers (2020-05-13T14:22:20Z)
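As referenced in the ALP entry above, the following toy sketch shows an inverse-dynamics auxiliary head that predicts the action taken between consecutive observations. It is an illustrative stand-in, not ALP's released code; the feature dimension, discrete action space, and random stand-in features are placeholder assumptions. In ALP this term is optimized jointly with a reinforcement learning policy loss so that action information shapes the visual representation.

```python
# Toy sketch (assumptions, not ALP's released code): an inverse-dynamics head
# that predicts the action taken between consecutive observation features.
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    def __init__(self, feat_dim: int, action_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, feat_t: torch.Tensor, feat_tp1: torch.Tensor) -> torch.Tensor:
        # Concatenate features of consecutive observations and predict action logits.
        return self.mlp(torch.cat([feat_t, feat_tp1], dim=-1))

# Usage with stand-in features from any visual encoder; the cross-entropy loss
# on discrete actions would be backpropagated into the encoder alongside the RL loss.
head = InverseDynamicsHead(feat_dim=128, action_dim=4)
feat_t, feat_tp1 = torch.randn(32, 128), torch.randn(32, 128)
actions = torch.randint(0, 4, (32,))
loss = nn.functional.cross_entropy(head(feat_t, feat_tp1), actions)
loss.backward()
```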
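As referenced in the Fishr entry above, the following toy sketch illustrates the gradient-variance idea on a linear least-squares model, where per-sample gradients have a closed form: each domain's variance of per-sample gradients is pulled toward the cross-domain mean. This is a simplified stand-in for the official Fishr implementation; the synthetic domains and model are placeholder assumptions.

```python
# Toy sketch (assumptions, not the official Fishr code): penalize mismatch between
# per-domain variances of per-sample gradients. For a linear model with squared
# error, the per-sample gradient w.r.t. the weights is 2 * (w @ x_i - y_i) * x_i.
import numpy as np

def per_sample_grads(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    residuals = X @ w - y                 # shape (n,)
    return 2.0 * residuals[:, None] * X   # shape (n, d): one gradient per sample

def fishr_penalty(w: np.ndarray, domains) -> float:
    """Squared distance between each domain's gradient variance and the mean variance."""
    variances = [per_sample_grads(w, X, y).var(axis=0) for X, y in domains]
    mean_var = np.mean(variances, axis=0)
    return sum(((v - mean_var) ** 2).sum() for v in variances) / len(variances)

# Two synthetic domains; in training this penalty would be added to the average
# empirical risk to encourage domain-invariant gradient statistics.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
domains = [(rng.normal(size=(64, 8)), rng.normal(size=64)) for _ in range(2)]
print(fishr_penalty(w, domains))
```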
This list is automatically generated from the titles and abstracts of the papers in this site.