ALP: Action-Aware Embodied Learning for Perception
- URL: http://arxiv.org/abs/2306.10190v2
- Date: Tue, 17 Oct 2023 15:44:32 GMT
- Title: ALP: Action-Aware Embodied Learning for Perception
- Authors: Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter
Abbeel
- Abstract summary: We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
- Score: 60.64801970249279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current methods in training and benchmarking vision models exhibit an
over-reliance on passive, curated datasets. Although models trained on these
datasets have shown strong performance in a wide variety of tasks such as
classification, detection, and segmentation, they are fundamentally unable to
generalize to an ever-evolving world due to constant out-of-distribution shifts
of input data. Therefore, instead of training on fixed datasets, can we
approach learning in a more human-centric and adaptive manner? In this paper,
we introduce Action-Aware Embodied Learning for Perception (ALP), an embodied
learning framework that incorporates action information into representation
learning through a combination of optimizing a reinforcement learning policy
and an inverse dynamics prediction objective. Our method actively explores
complex 3D environments both to learn generalizable, task-agnostic visual
representations and to collect downstream training data. We show that ALP
outperforms existing baselines in several downstream perception tasks. In
addition, we show that by training on actively collected data more relevant to
the environment and task, our method generalizes more robustly to downstream
tasks compared to models pre-trained on fixed datasets such as ImageNet.
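As a rough illustration of the training signal described in the abstract, the sketch below pairs a shared visual encoder with an inverse dynamics head that predicts the action taken between two consecutive observations; in ALP this objective is optimized alongside a reinforcement learning exploration policy that gathers the trajectories. This is not the authors' implementation: all module names, network sizes, and the four-action space are hypothetical placeholders.

```python
# Minimal sketch of an inverse dynamics prediction objective on top of a
# shared visual encoder. Hypothetical stand-in, not the ALP codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualEncoder(nn.Module):
    """Small CNN mapping RGB observations to a feature vector."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, obs):  # obs: (B, 3, H, W)
        return self.proj(self.conv(obs))

class InverseDynamicsHead(nn.Module):
    """Predicts the discrete action that led from obs_t to obs_{t+1}."""
    def __init__(self, embed_dim: int, num_actions: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, z_t, z_tp1):
        return self.mlp(torch.cat([z_t, z_tp1], dim=-1))

def inverse_dynamics_loss(encoder, head, obs_t, obs_tp1, actions):
    """Cross-entropy between predicted and executed actions."""
    logits = head(encoder(obs_t), encoder(obs_tp1))
    return F.cross_entropy(logits, actions)

if __name__ == "__main__":
    B, num_actions = 8, 4  # e.g. move-forward / turn-left / turn-right / stop
    encoder = VisualEncoder()
    head = InverseDynamicsHead(128, num_actions)
    optim = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=3e-4)

    # In ALP-style training these batches would come from trajectories
    # collected by the exploration policy; random tensors stand in here.
    obs_t = torch.randn(B, 3, 64, 64)
    obs_tp1 = torch.randn(B, 3, 64, 64)
    actions = torch.randint(0, num_actions, (B,))

    loss = inverse_dynamics_loss(encoder, head, obs_t, obs_tp1, actions)
    optim.zero_grad()
    loss.backward()
    optim.step()
```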
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Automatic Identification and Visualization of Group Training Activities Using Wearable Data [7.130450173185638]
Human Activity Recognition (HAR) identifies daily activities from time-series data collected by wearable devices like smartwatches.
This paper presents a comprehensive framework for imputing, analyzing, and identifying activities from wearable data.
Our approach is based on data collected from 135 soldiers wearing Garmin 55 smartwatches over six months.
arXiv Detail & Related papers (2024-10-07T19:35:15Z)
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
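As a rough illustration of that recipe (not the authors' implementation), the sketch below trains one shared backbone on batches drawn round-robin from two hypothetical datasets, one standing in for an "upstream" task and one for the "downstream" task; all dataset sizes, heads, and hyperparameters are made up for illustration.

```python
# Minimal sketch of co-finetuning: one backbone is optimized on batches from
# several datasets in the same training loop, instead of pre-training on the
# upstream data and then fine-tuning on the downstream task.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
heads = nn.ModuleDict({
    "upstream": nn.Linear(256, 100),   # e.g. a large classification dataset
    "downstream": nn.Linear(256, 10),  # e.g. the smaller target task
})
optim = torch.optim.Adam(
    list(backbone.parameters()) + list(heads.parameters()), lr=1e-3)

def fake_loader(num_classes, n=64, batch_size=16):
    """Random tensors standing in for a real dataset."""
    x = torch.randn(n, 3, 32, 32)
    y = torch.randint(0, num_classes, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

loaders = {"upstream": fake_loader(100), "downstream": fake_loader(10)}

# Round-robin over tasks: each step draws a batch from a different dataset,
# so the shared backbone is updated on all tasks simultaneously.
for step, task in zip(range(8), itertools.cycle(loaders)):
    x, y = next(iter(loaders[task]))
    loss = F.cross_entropy(heads[task](backbone(x)), y)
    optim.zero_grad()
    loss.backward()
    optim.step()
```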
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- Robust Representation Learning via Perceptual Similarity Metrics [18.842322467828502]
Contrastive Input Morphing (CIM) is a representation learning framework that learns input-space transformations of the data.
We show that CIM is complementary to other mutual information-based representation learning techniques.
arXiv Detail & Related papers (2021-06-11T21:45:44Z)
- Learning Actor-centered Representations for Action Localization in Streaming Videos using Predictive Learning [18.757368441841123]
Event perception tasks such as recognizing and localizing actions in streaming videos are essential for tackling visual understanding tasks.
We tackle the problem of learning actor-centered representations through the notion of continual hierarchical predictive learning.
Inspired by cognitive theories of event perception, we propose a novel, self-supervised framework.
arXiv Detail & Related papers (2021-04-29T06:06:58Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)