ALP: Action-Aware Embodied Learning for Perception
- URL: http://arxiv.org/abs/2306.10190v2
- Date: Tue, 17 Oct 2023 15:44:32 GMT
- Title: ALP: Action-Aware Embodied Learning for Perception
- Authors: Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter
Abbeel
- Abstract summary: We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
- Score: 60.64801970249279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current methods in training and benchmarking vision models exhibit an
over-reliance on passive, curated datasets. Although models trained on these
datasets have shown strong performance in a wide variety of tasks such as
classification, detection, and segmentation, they are fundamentally unable to
generalize to an ever-evolving world due to constant out-of-distribution shifts
of input data. Therefore, instead of training on fixed datasets, can we
approach learning in a more human-centric and adaptive manner? In this paper,
we introduce Action-Aware Embodied Learning for Perception (ALP), an embodied
learning framework that incorporates action information into representation
learning through a combination of optimizing a reinforcement learning policy
and an inverse dynamics prediction objective. Our method actively explores
complex 3D environments both to learn generalizable, task-agnostic visual
representations and to collect downstream training data. We show that ALP
outperforms existing baselines in several downstream perception tasks. In
addition, we show that by training on actively collected data more relevant to
the environment and task, our method generalizes more robustly to downstream
tasks compared to models pre-trained on fixed datasets such as ImageNet.
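As a rough illustration of the training signal described in the abstract, the sketch below pairs a shared visual encoder with an inverse dynamics head that predicts the action taken between two consecutive observations; in ALP this objective is optimized alongside a reinforcement learning exploration policy that gathers the trajectories. This is not the authors' implementation: all module names, network sizes, and the four-action space are hypothetical placeholders.

```python
# Minimal sketch of an inverse dynamics prediction objective on top of a
# shared visual encoder. Hypothetical stand-in, not the ALP codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualEncoder(nn.Module):
    """Small CNN mapping RGB observations to a feature vector."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, obs):  # obs: (B, 3, H, W)
        return self.proj(self.conv(obs))

class InverseDynamicsHead(nn.Module):
    """Predicts the discrete action that led from obs_t to obs_{t+1}."""
    def __init__(self, embed_dim: int, num_actions: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, z_t, z_tp1):
        return self.mlp(torch.cat([z_t, z_tp1], dim=-1))

def inverse_dynamics_loss(encoder, head, obs_t, obs_tp1, actions):
    """Cross-entropy between predicted and executed actions."""
    logits = head(encoder(obs_t), encoder(obs_tp1))
    return F.cross_entropy(logits, actions)

if __name__ == "__main__":
    B, num_actions = 8, 4  # e.g. move-forward / turn-left / turn-right / stop
    encoder = VisualEncoder()
    head = InverseDynamicsHead(128, num_actions)
    optim = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=3e-4)

    # In ALP-style training these batches would come from trajectories
    # collected by the exploration policy; random tensors stand in here.
    obs_t = torch.randn(B, 3, 64, 64)
    obs_tp1 = torch.randn(B, 3, 64, 64)
    actions = torch.randint(0, num_actions, (B,))

    loss = inverse_dynamics_loss(encoder, head, obs_t, obs_tp1, actions)
    optim.zero_grad()
    loss.backward()
    optim.step()
```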
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Automatic Identification and Visualization of Group Training Activities Using Wearable Data [7.130450173185638]
Human Activity Recognition (HAR) identifies daily activities from time-series data collected by wearable devices like smartwatches.
This paper presents a comprehensive framework for imputing, analyzing, and identifying activities from wearable data.
Our approach is based on data collected from 135 soldiers wearing Garmin 55 smartwatches over six months.
arXiv Detail & Related papers (2024-10-07T19:35:15Z)
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
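As a rough illustration of that recipe (not the authors' implementation), the sketch below trains one shared backbone on batches drawn round-robin from two hypothetical datasets, one standing in for an "upstream" task and one for the "downstream" task; all dataset sizes, heads, and hyperparameters are made up for illustration.

```python
# Minimal sketch of co-finetuning: one backbone is optimized on batches from
# several datasets in the same training loop, instead of pre-training on the
# upstream data and then fine-tuning on the downstream task.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
heads = nn.ModuleDict({
    "upstream": nn.Linear(256, 100),   # e.g. a large classification dataset
    "downstream": nn.Linear(256, 10),  # e.g. the smaller target task
})
optim = torch.optim.Adam(
    list(backbone.parameters()) + list(heads.parameters()), lr=1e-3)

def fake_loader(num_classes, n=64, batch_size=16):
    """Random tensors standing in for a real dataset."""
    x = torch.randn(n, 3, 32, 32)
    y = torch.randint(0, num_classes, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

loaders = {"upstream": fake_loader(100), "downstream": fake_loader(10)}

# Round-robin over tasks: each step draws a batch from a different dataset,
# so the shared backbone is updated on all tasks simultaneously.
for step, task in zip(range(8), itertools.cycle(loaders)):
    x, y = next(iter(loaders[task]))
    loss = F.cross_entropy(heads[task](backbone(x)), y)
    optim.zero_grad()
    loss.backward()
    optim.step()
```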
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- Robust Representation Learning via Perceptual Similarity Metrics [18.842322467828502]
Contrastive Input Morphing (CIM) is a representation learning framework that learns input-space transformations of the data.
We show that CIM is complementary to other mutual information-based representation learning techniques.
arXiv Detail & Related papers (2021-06-11T21:45:44Z)
- Learning Actor-centered Representations for Action Localization in Streaming Videos using Predictive Learning [18.757368441841123]
Event perception tasks such as recognizing and localizing actions in streaming videos are essential for tackling visual understanding tasks.
We tackle the problem of learning actor-centered representations through the notion of continual hierarchical predictive learning.
Inspired by cognitive theories of event perception, we propose a novel, self-supervised framework.
arXiv Detail & Related papers (2021-04-29T06:06:58Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)