On Assessing the Usefulness of Proxy Domains for Developing and
Evaluating Embodied Agents
- URL: http://arxiv.org/abs/2109.14516v1
- Date: Wed, 29 Sep 2021 16:04:39 GMT
- Title: On Assessing the Usefulness of Proxy Domains for Developing and
Evaluating Embodied Agents
- Authors: Anthony Courchesne (1 and 2), Andrea Censi (3) and Liam Paull (1 and
2) ((1) Mila, (2) Université de Montréal, (3) ETH Zürich)
- Abstract summary: We argue that the value of a proxy is conditioned on the task that it is being used to help solve.
We establish new proxy usefulness (PU) metrics to compare the usefulness of different proxy domains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In many situations it is either impossible or impractical to develop and
evaluate agents entirely on the target domain on which they will be deployed.
This is particularly true in robotics, where doing experiments on hardware is
much more arduous than in simulation, and arguably more so still for
learning-based agents. To this end, considerable recent effort has been
devoted to developing increasingly realistic, higher-fidelity simulators.
However, we lack any principled way to evaluate how good a "proxy domain" is,
specifically in terms of how useful it is in helping us achieve our end
objective of building an agent that performs well in the target domain. In this
work, we investigate methods to address this need. We begin by clearly
separating two uses of proxy domains that are often conflated: 1) their ability
to be a faithful predictor of agent performance and 2) their ability to be a
useful tool for learning. In this paper, we attempt to clarify the role of
proxy domains and establish new proxy usefulness (PU) metrics to compare the
usefulness of different proxy domains. We propose the relative predictive PU to
assess the predictive ability of a proxy domain and the learning PU to quantify
the usefulness of a proxy as a tool to generate learning data. Furthermore, we
argue that the value of a proxy is conditioned on the task that it is being
used to help solve. We demonstrate how these new metrics can be used to
optimize parameters of the proxy domain for which obtaining ground truth via
system identification is not trivial.
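To make the two metrics concrete, here is a minimal Python sketch. The paper's exact definitions are not reproduced here; the function names and the rank-correlation and ratio forms below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def relative_predictive_pu(perf_proxy, perf_target):
    """Illustrative predictive PU: how well does evaluating a set of agents
    in the proxy domain predict their ranking in the target domain?
    Measured here as Spearman rank correlation (an assumption, not the
    paper's exact definition)."""
    rho, _ = spearmanr(perf_proxy, perf_target)
    return rho

def learning_pu(target_perf_proxy_trained, target_perf_target_trained):
    """Illustrative learning PU: target-domain performance of an agent
    trained in the proxy, relative to one trained directly on the target."""
    return target_perf_proxy_trained / target_perf_target_trained

# Example: proxy scores of four agents vs. their target-domain scores.
print(relative_predictive_pu([0.9, 0.7, 0.5, 0.2], [0.85, 0.6, 0.55, 0.1]))
```

Under this reading, a proxy can score high on one metric and low on the other, which is exactly the conflation the paper warns against.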
Related papers
- Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents [64.75036903373712]
Proposer-Agent-Evaluator is a learning system that enables foundation model agents to autonomously discover and practice skills in the wild.
At the heart of PAE is a context-aware task proposer that, conditioned on context information, autonomously proposes tasks for the agent to practice.
The success evaluation serves as the reward signal for the agent to refine its policies through RL.
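A schematic of that propose-practice-evaluate loop; all names (propose_task, run, evaluate_success, update_policy) are hypothetical stand-ins, not PAE's actual API.

```python
def pae_loop(proposer, agent, evaluator, context, num_rounds=100):
    # Schematic only: a context-aware proposer generates tasks, the agent
    # practices them, and the evaluator's success judgment is the RL reward.
    for _ in range(num_rounds):
        task = proposer.propose_task(context)
        trajectory = agent.run(task)
        reward = evaluator.evaluate_success(task, trajectory)
        agent.update_policy(trajectory, reward)
```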
arXiv Detail & Related papers (2024-12-17T18:59:50Z)
- Cross-Domain Policy Adaptation by Capturing Representation Mismatch [53.087413751430255]
In reinforcement learning (RL), it is vital to learn effective policies that can be transferred to domains with dynamics discrepancies.
In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain.
We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain.
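A rough sketch of the idea, assuming an encoder and latent dynamics model trained only on target-domain data (hypothetical names; the paper's estimator and penalty form may differ):

```python
import torch

def penalized_reward(r, encoder, latent_dynamics, s, a, s_next, beta=1.0):
    # On a source-domain transition (s, a, s'), measure how far the encoded
    # next state deviates from the target-trained latent prediction, and
    # subtract that deviation from the reward as a dynamics-mismatch penalty.
    with torch.no_grad():
        predicted = latent_dynamics(encoder(s), a)
        deviation = torch.norm(encoder(s_next) - predicted, dim=-1)
    return r - beta * deviation
```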
arXiv Detail & Related papers (2024-05-24T09:06:12Z)
- Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation [65.13362950067744]
Crowd localization aims to predict the precise location of each instance within an image.
Current advanced methods pose this as pixel-wise binary classification to cope with congested scenes.
We propose a Dynamic Proxy Domain (DPD) method to generalize the learner under domain shift.
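The pixel-wise binary classification view reduces to a per-pixel loss like the following sketch; DPD's dynamic proxy-domain construction itself is not shown here.

```python
import torch.nn.functional as F

def localization_bce(logits, point_map):
    # logits:    (B, 1, H, W) raw network outputs
    # point_map: (B, 1, H, W) binary ground truth marking annotated head points
    return F.binary_cross_entropy_with_logits(logits, point_map)
```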
arXiv Detail & Related papers (2024-04-22T08:58:57Z)
- Cross Domain Policy Transfer with Effect Cycle-Consistency [3.3213136251955815]
Training a robotic policy from scratch using deep reinforcement learning methods can be prohibitively expensive due to sample inefficiency.
We propose a novel approach for learning the mapping functions between state and action spaces across domains using unpaired data.
Our approach has been tested on three locomotion tasks and two robotic manipulation tasks.
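As a loose illustration, a plain cycle-consistency objective on unpaired state batches looks like the sketch below; the paper's effect cycle-consistency additionally aligns the effects of actions, which is not shown.

```python
import torch

def cycle_consistency_loss(f_src2tgt, g_tgt2src, src_states, tgt_states):
    # f_src2tgt and g_tgt2src are learned cross-domain state mappings;
    # mapping across domains and back should recover the input.
    cycle_src = g_tgt2src(f_src2tgt(src_states))
    cycle_tgt = f_src2tgt(g_tgt2src(tgt_states))
    return ((cycle_src - src_states).pow(2).mean()
            + (cycle_tgt - tgt_states).pow(2).mean())
```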
arXiv Detail & Related papers (2024-03-04T13:20:07Z)
- Towards Improved Proxy-based Deep Metric Learning via Data-Augmented Domain Adaptation [15.254782791542329]
We present a novel proxy-based Deep Metric Learning framework.
We propose the Data-Augmented Domain Adaptation (DADA) method to bridge the domain gap between groups of samples and proxies.
Our experiments on benchmarks, including the popular CUB-200-2011, show that our learning algorithm significantly improves the existing proxy losses.
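For context, a standard proxy-based loss of the kind DADA builds on (a normalized-softmax form; DADA's data-augmented adaptation between samples and proxies is layered on top and not shown):

```python
import torch
import torch.nn.functional as F

def proxy_softmax_loss(embeddings, labels, proxies):
    x = F.normalize(embeddings, dim=-1)   # unit-norm sample embeddings
    p = F.normalize(proxies, dim=-1)      # one learnable proxy per class
    logits = x @ p.t()                    # cosine similarity to each proxy
    return F.cross_entropy(logits, labels)
```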
arXiv Detail & Related papers (2024-01-01T00:10:58Z)
- Non-isotropy Regularization for Proxy-based Deep Metric Learning [78.18860829585182]
We propose non-isotropy regularization ($\mathbb{NIR}$) for proxy-based Deep Metric Learning.
This allows us to explicitly induce a non-isotropic distribution of samples around a proxy to optimize for.
Experiments highlight consistent generalization benefits of $\mathbb{NIR}$ while achieving competitive and state-of-the-art performance.
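As a loose illustration only (not the paper's $\mathbb{NIR}$ objective), one way to quantify non-isotropy of samples around a proxy is via their residual covariance:

```python
import torch

def anisotropy_score(embeddings, proxy):
    # Residuals of samples around their proxy: a perfectly isotropic cloud
    # has covariance proportional to the identity, so the distance from that
    # scaled identity measures non-isotropy. Illustrative assumption only.
    r = embeddings - proxy
    cov = (r.t() @ r) / max(r.shape[0] - 1, 1)
    iso = torch.eye(cov.shape[0]) * cov.diagonal().mean()
    return torch.norm(cov - iso)
```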
arXiv Detail & Related papers (2022-03-16T11:13:20Z)
- How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
We also show a separation result: fine-tuning-based methods, such as MAML, can outperform methods with "frozen representation" objectives in few-shot learning.
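The fine-tuning step being analyzed is essentially a MAML-style inner loop; a minimal sketch follows (the loss and names are assumptions):

```python
import torch
from torch.func import functional_call

def few_shot_finetune(model, support_x, support_y, steps=5, lr=1e-2):
    # Adapt a meta-learned initialization to a new task by a few gradient
    # steps on its support set; the cited bounds concern the risk of the
    # predictor this procedure returns.
    params = {k: v.detach().clone().requires_grad_(True)
              for k, v in model.named_parameters()}
    for _ in range(steps):
        preds = functional_call(model, params, (support_x,))
        loss = torch.nn.functional.mse_loss(preds, support_y)
        grads = torch.autograd.grad(loss, list(params.values()))
        params = {k: v - lr * g for (k, v), g in zip(params.items(), grads)}
    return params
```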
arXiv Detail & Related papers (2021-05-05T17:56:00Z)
- Consequences of Misaligned AI [12.879600368339393]
This paper argues that we should view the design of reward functions as an interactive and dynamic process.
We show that allowing reward functions to reference the full state, or letting the principal update the proxy objective over time, can lead to higher-utility solutions.
arXiv Detail & Related papers (2021-02-07T19:34:04Z)
- What can I do here? A Theory of Affordances in Reinforcement Learning [65.70524105802156]
We develop a theory of affordances for agents who learn and plan in Markov Decision Processes.
Affordances play a dual role here: they allow faster planning by reducing the number of actions available in a given situation, and they enable more efficient learning of transition models.
We propose an approach to learn affordances and use it to estimate transition models that are simpler and generalize better.
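The action-reduction role can be made concrete with a simple mask over the action set (illustrative only; the paper's formal treatment of affordances in MDPs is richer):

```python
import numpy as np

def afforded_greedy_action(q_values, affordance_mask):
    # q_values: (num_actions,) value estimates in the current state.
    # affordance_mask: boolean (num_actions,), True where the action is afforded.
    q = np.where(affordance_mask, q_values, -np.inf)
    return int(np.argmax(q))
```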
arXiv Detail & Related papers (2020-06-26T16:34:53Z)
- Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers [138.68213707587822]
We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning.
We show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.
Our approach is applicable to domains with continuous states and actions and does not require learning an explicit model of the dynamics.
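In the spirit of that approach, the reward correction can be computed from the log-odds of two domain classifiers; a minimal sketch, assuming classifiers that output target-vs-source logits:

```python
import torch

def dynamics_reward_correction(clf_sas, clf_sa, s, a, s_next):
    # For a binary classifier with sigmoid output, the logit equals
    # log p(target|.) - log p(source|.), so the correction
    #   dr = log p(target|s,a,s') - log p(source|s,a,s')
    #      - log p(target|s,a)   + log p(source|s,a)
    # is just the difference of the two classifiers' logits.
    with torch.no_grad():
        return clf_sas(s, a, s_next) - clf_sa(s, a)
```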
arXiv Detail & Related papers (2020-06-24T17:47:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.