Robust Offline Imitation Learning from Diverse Auxiliary Data
- URL: http://arxiv.org/abs/2410.03626v1
- Date: Fri, 4 Oct 2024 17:30:54 GMT
- Title: Robust Offline Imitation Learning from Diverse Auxiliary Data
- Authors: Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit K. Roy-Chowdhury
- Abstract summary: Offline imitation learning enables learning a policy solely from a set of expert demonstrations.
Recent works incorporate large numbers of auxiliary demonstrations alongside the expert data.
We propose Robust Offline Imitation from Diverse Auxiliary Data (ROIDA).
- Score: 33.14745744587572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline imitation learning enables learning a policy solely from a set of expert demonstrations, without any environment interaction. To alleviate the distribution shift that arises from the small amount of expert data, recent works incorporate large numbers of auxiliary demonstrations alongside the expert data. However, the performance of these approaches relies on assumptions about the quality and composition of the auxiliary data, and they are rarely successful when those assumptions do not hold. To address this limitation, we propose Robust Offline Imitation from Diverse Auxiliary Data (ROIDA). ROIDA first identifies high-quality transitions from the entire auxiliary dataset using a learned reward function. These high-reward samples are combined with the expert demonstrations for weighted behavioral cloning. For lower-quality samples, ROIDA applies temporal difference learning to steer the policy towards high-reward states, improving long-term returns. This two-pronged approach enables our framework to effectively leverage both high- and low-quality data without any assumptions about the auxiliary dataset. Extensive experiments validate that ROIDA achieves robust and consistent performance across multiple auxiliary datasets with diverse ratios of expert and non-expert demonstrations. ROIDA effectively leverages unlabeled auxiliary data, outperforming prior methods that rely on specific data assumptions.
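Below is a minimal sketch of the two-pronged training step described in the abstract, assuming a learned reward model, an expert batch, and an unlabeled auxiliary batch. The network shapes, the sigmoid-based weighting, the reward threshold, and the simple MSE behavioral-cloning and TD losses are illustrative placeholders, not the paper's exact objectives.

```python
# Illustrative sketch of ROIDA's two-pronged use of auxiliary data.
# Assumptions: reward_fn has already been trained separately; losses are simplified.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 17, 6  # hypothetical dimensions

policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, 1))
reward_fn = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(list(policy.parameters()) + list(q_net.parameters()), lr=3e-4)

def weighted_bc_loss(obs, act, weights):
    """Behavioral cloning weighted per sample (expert samples get weight 1)."""
    pred = policy(obs)
    return (weights * ((pred - act) ** 2).mean(dim=-1)).mean()

def td_loss(obs, act, next_obs, gamma=0.99):
    """Temporal-difference target built from the learned reward model."""
    with torch.no_grad():
        r = reward_fn(torch.cat([obs, act], dim=-1))
        next_a = policy(next_obs)
        target = r + gamma * q_net(torch.cat([next_obs, next_a], dim=-1))
    return F.mse_loss(q_net(torch.cat([obs, act], dim=-1)), target)

def train_step(expert_batch, aux_batch, reward_threshold=0.5):
    e_obs, e_act, e_next = expert_batch
    a_obs, a_act, a_next = aux_batch
    with torch.no_grad():
        aux_r = torch.sigmoid(reward_fn(torch.cat([a_obs, a_act], dim=-1))).squeeze(-1)
    high = aux_r > reward_threshold          # high-quality auxiliary transitions
    # 1) weighted BC on expert data plus high-reward auxiliary samples
    bc_obs = torch.cat([e_obs, a_obs[high]])
    bc_act = torch.cat([e_act, a_act[high]])
    bc_w = torch.cat([torch.ones(len(e_obs)), aux_r[high]])
    loss = weighted_bc_loss(bc_obs, bc_act, bc_w)
    # 2) TD learning on all auxiliary transitions to improve long-term returns
    loss = loss + td_loss(a_obs, a_act, a_next)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```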
Related papers
- Out-Of-Distribution Detection with Diversification (Provably) [75.44158116183483]
Out-of-distribution (OOD) detection is crucial for ensuring reliable deployment of machine learning models.
Recent advancements focus on utilizing easily accessible auxiliary outliers (e.g., data from the web or other datasets) in training.
We propose Diversity-induced Mixup for OOD detection (diverseMix), a theoretically grounded method that enhances the diversity of the auxiliary outlier set used for training.
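As a rough illustration of the mixup-on-auxiliary-outliers idea, the sketch below mixes random pairs of auxiliary outliers with Beta-distributed weights; the paper's actual mixing policy and training objective may differ.

```python
# Sketch: mix auxiliary outliers to diversify the outlier set used for OOD training.
import torch

def diversify_outliers(aux_outliers: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Mix random pairs of auxiliary outlier images with Beta-distributed weights."""
    lam = torch.distributions.Beta(alpha, alpha).sample((aux_outliers.size(0), 1, 1, 1))
    perm = torch.randperm(aux_outliers.size(0))
    return lam * aux_outliers + (1.0 - lam) * aux_outliers[perm]

# usage: the mixed outliers are then trained against an outlier/"reject" objective
outliers = torch.randn(32, 3, 32, 32)        # e.g. images scraped from the web
mixed = diversify_outliers(outliers, alpha=0.3)
```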
arXiv Detail & Related papers (2024-11-21T11:56:32Z)
- Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [56.24431208419858]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset.
We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
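A minimal illustration of conditioning responses on their quality scores is sketched below; the prompt tag format and the score fields are assumptions, and the paper's actual relabeling scheme for preference alignment differs in detail.

```python
# Sketch: tag each response with its own quality score so the model can learn
# from the full spectrum of response quality, not just "chosen vs. rejected".
def reward_augment(example: dict) -> list[dict]:
    """Turn one preference pair into two reward-conditioned training examples."""
    out = []
    for key in ("chosen", "rejected"):
        score = example[f"{key}_score"]            # e.g. a scalar reward-model score
        out.append({
            "prompt": f"<reward={score:.1f}> {example['prompt']}",
            "response": example[key],
        })
    return out

pair = {"prompt": "Explain offline RL.",
        "chosen": "Offline RL learns a policy from a fixed dataset ...",
        "rejected": "idk",
        "chosen_score": 0.9, "rejected_score": 0.2}
augmented = reward_augment(pair)
```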
arXiv Detail & Related papers (2024-10-10T16:01:51Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
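The score-then-select pattern can be sketched as below, with a placeholder scoring function standing in for the one-shot evaluation that Nuggets actually performs.

```python
# Sketch: score every candidate instruction example, keep only the top 1%.
def score_example(example: dict) -> float:
    """Placeholder score; in Nuggets this would be the gain on an anchor set when
    the example is used as a one-shot demonstration."""
    return float(len(example["response"])) / (1 + len(example["instruction"]))

def select_top_fraction(dataset: list[dict], fraction: float = 0.01) -> list[dict]:
    scored = sorted(dataset, key=score_example, reverse=True)
    k = max(1, int(len(scored) * fraction))
    return scored[:k]
```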
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition [10.03246698225533]
We propose Robust Prompt-based Data Augmentation (RoPDA) for low-resource NER.
Based on pre-trained language models (PLMs) with continuous prompts, RoPDA performs entity augmentation and context augmentation.
Experiments on three benchmarks from different domains demonstrate that RoPDA significantly improves upon strong baselines.
arXiv Detail & Related papers (2023-07-11T14:44:14Z)
- Split-PU: Hardness-aware Training Strategy for Positive-Unlabeled Learning [42.26185670834855]
Positive-Unlabeled (PU) learning aims to learn a model with rare positive samples and abundant unlabeled samples.
This paper focuses on improving the commonly-used nnPU with a novel training pipeline.
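For context, a sketch of the non-negative PU (nnPU) risk that this line of work builds on is given below, using a logistic surrogate loss; the assumed class prior and the loss choice are illustrative.

```python
# Sketch of the nnPU risk: positives contribute a positive-class loss, and the
# negative-class risk is estimated from unlabeled data with a non-negativity
# correction. `prior` is the assumed positive-class prior pi_p.
import torch
import torch.nn.functional as F

def nnpu_loss(logits_pos, logits_unl, prior=0.3):
    loss = lambda z, y: F.binary_cross_entropy_with_logits(z, torch.full_like(z, y))
    risk_pos = prior * loss(logits_pos, 1.0)
    risk_neg = loss(logits_unl, 0.0) - prior * loss(logits_pos, 0.0)
    return risk_pos + torch.clamp(risk_neg, min=0.0)   # non-negative correction
```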
arXiv Detail & Related papers (2022-11-30T05:48:31Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
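A generic sketch of cluster-level pseudo-labelling is shown below: target features are clustered and each cluster receives a single label from the model's aggregated predictions. The clustering method and aggregation rule are assumptions, not necessarily those used in the paper.

```python
# Sketch: cluster target features, then give every sample in a cluster the
# cluster's majority predicted class, which is more stable than per-sample labels.
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(features: np.ndarray, soft_preds: np.ndarray, n_clusters: int = 7):
    """features: [N, D] target embeddings; soft_preds: [N, C] model probabilities."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    labels = np.empty(len(features), dtype=np.int64)
    for c in range(n_clusters):
        mask = clusters == c
        if mask.any():
            labels[mask] = soft_preds[mask].mean(axis=0).argmax()  # cluster-wide vote
    return labels
```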
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
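The confidence-based pseudo-labelling step can be sketched as follows, assuming a per-step reward model and a Bradley-Terry-style preference probability; the confidence threshold and exact filtering rule are illustrative.

```python
# Sketch: keep an unlabeled segment pair only if the reward model's preference
# probability is confident enough, then use the predicted preference as its label.
import torch

def pseudo_label(reward_model, seg_a, seg_b, tau=0.95):
    """seg_a, seg_b: [batch, horizon, feature] segment tensors; reward_model is
    assumed to output a per-step reward of shape [batch, horizon, 1]."""
    with torch.no_grad():
        ret_a = reward_model(seg_a).sum(dim=1).squeeze(-1)   # predicted segment return
        ret_b = reward_model(seg_b).sum(dim=1).squeeze(-1)
        p_a = torch.sigmoid(ret_a - ret_b)                   # P(segment a preferred)
    conf = torch.maximum(p_a, 1 - p_a)
    keep = conf > tau                                        # confident pairs only
    labels = (p_a < 0.5).long()                              # 0 = prefer a, 1 = prefer b
    return keep, labels
```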
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets [15.206465106699293]
Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience.
Our work evaluates the ability of advantage-filtered behavioral cloning to scale to vast datasets consisting almost entirely of sub-optimal noise.
A modification to the advantage filter enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
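A minimal sketch of advantage-filtered behavioral cloning is given below; the networks, the threshold, and the squared-error cloning loss are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch: estimate the advantage of each dataset action under learned Q/V
# functions and clone only the transitions whose advantage clears a threshold.
import torch
import torch.nn as nn

obs_dim, act_dim = 11, 3
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, 1))
v_net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, 1))
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))

def filtered_bc_loss(obs, act, eps=0.0):
    with torch.no_grad():
        adv = q_net(torch.cat([obs, act], dim=-1)) - v_net(obs)   # A(s,a) = Q(s,a) - V(s)
        mask = (adv.squeeze(-1) > eps).float()                    # keep advantageous actions
    per_sample = ((policy(obs) - act) ** 2).mean(dim=-1)
    return (mask * per_sample).sum() / mask.sum().clamp(min=1.0)
```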
arXiv Detail & Related papers (2021-10-10T03:55:17Z)
- Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis [17.811597734603144]
We propose an approach to automatically generating counterfactual data for data augmentation and explanation.
A comprehensive evaluation on several different datasets, using a variety of state-of-the-art benchmarks, demonstrates that our approach achieves significant improvements in model performance.
arXiv Detail & Related papers (2021-06-29T10:27:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.