A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets
- URL: http://arxiv.org/abs/2110.04698v2
- Date: Sat, 9 Dec 2023 10:05:10 GMT
- Title: A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets
- Authors: Jake Grigsby, Yanjun Qi
- Abstract summary: Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience.
Our work evaluates the ability of advantage-filtered behavioral cloning to scale to vast datasets consisting almost entirely of sub-optimal noise.
Re-purposing prioritized experience sampling to locate expert-level demonstrations enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
- Score: 15.206465106699293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent Offline Reinforcement Learning methods have succeeded in learning
high-performance policies from fixed datasets of experience. A particularly
effective approach learns to first identify and then mimic optimal
decision-making strategies. Our work evaluates this method's ability to scale
to vast datasets consisting almost entirely of sub-optimal noise. A thorough
investigation on a custom benchmark helps identify several key challenges
involved in learning from high-noise datasets. We re-purpose prioritized
experience sampling to locate expert-level demonstrations among millions of
low-performance samples. This modification enables offline agents to learn
state-of-the-art policies in benchmark tasks using datasets where expert
actions are outnumbered nearly 65:1.
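To make the mechanism concrete, the following is a minimal Python sketch of advantage-filtered behavioral cloning combined with advantage-based prioritized sampling. The advantage estimator `advantage_fn` and the array layout are hypothetical stand-ins, and the paper's exact filter and priority scheme may differ; this is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

def afbc_batch(states, actions, advantage_fn, batch_size=256, alpha=0.6):
    """Sample a behavioral-cloning batch with advantage-based priorities.

    Transitions with higher estimated advantage are drawn more often
    (prioritized experience sampling), and only positive-advantage
    actions survive the filter that gates the cloning loss.
    """
    adv = advantage_fn(states, actions)           # A(s, a) for every stored transition
    priority = np.maximum(adv, 1e-6) ** alpha     # floor keeps noise below any expert action
    probs = priority / priority.sum()
    idx = np.random.choice(len(states), size=batch_size, p=probs)

    mask = (adv[idx] > 0).astype(np.float32)      # advantage filter: clone only beneficial actions
    # Downstream BC loss would be: mean(mask * -log pi(actions[idx] | states[idx]))
    return states[idx], actions[idx], mask
```

Sampling in proportion to estimated advantage is what lets a roughly 65:1 noise-to-expert ratio still produce batches dominated by expert-level actions.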
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Robust Offline Imitation Learning from Diverse Auxiliary Data [33.14745744587572]
Offline imitation learning enables learning a policy solely from a set of expert demonstrations.
Recent works incorporate large numbers of auxiliary demonstrations alongside the expert data.
We propose Robust Offline Imitation from Diverse Auxiliary Data (ROIDA).
arXiv Detail & Related papers (2024-10-04T17:30:54Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
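The DPO objective itself is compact enough to sketch. Below is a minimal PyTorch version; the inputs are summed log-probabilities of the preferred and rejected responses (or steps) under the trainable policy and a frozen reference model, and the MCTS-based construction of the step-level preference pairs is not shown.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization on (chosen, rejected) pairs."""
    policy_margin = logp_chosen - logp_rejected        # log pi(y_w|x) - log pi(y_l|x)
    ref_margin = ref_logp_chosen - ref_logp_rejected   # same margin under the frozen reference
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```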
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- DIDA: Denoised Imitation Learning based on Domain Adaptation [28.36684781402964]
We focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise.
We propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data.
Experimental results on MuJoCo demonstrate that DIDA successfully handles challenging imitation tasks with various types of demonstration noise, outperforming most baseline methods.
arXiv Detail & Related papers (2024-04-04T11:29:05Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
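Once every candidate has a quality score, the selection step is simple. A minimal sketch, where `score_fn` is a hypothetical stand-in for the one-shot "golden score" computed per example:

```python
def select_top_fraction(candidates, score_fn, fraction=0.01):
    """Keep the highest-scoring fraction of instruction examples.

    score_fn(example) is assumed to return a scalar quality score, e.g.
    the average gain on an anchor set when the example is used as the
    one-shot demonstration.
    """
    scored = sorted(candidates, key=score_fn, reverse=True)
    k = max(1, int(len(scored) * fraction))
    return scored[:k]
```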
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- Benchmarking of Query Strategies: Towards Future Deep Active Learning [0.0]
We benchmark query strategies for deep active learning (DAL).
DAL reduces annotation costs by annotating only high-quality samples selected by query strategies.
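As one concrete illustration of a query strategy (my example, not necessarily one of the benchmarked strategies), entropy-based uncertainty sampling queries the unlabeled points the current model is least certain about:

```python
import numpy as np

def entropy_query(probs, budget):
    """Entropy-based uncertainty sampling, a classic DAL query strategy.

    probs: (n_samples, n_classes) predicted class probabilities over the
    unlabeled pool. Returns indices of the `budget` most uncertain samples.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:budget]
```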
arXiv Detail & Related papers (2023-12-10T04:17:16Z)
- Explored An Effective Methodology for Fine-Grained Snake Recognition [8.908667065576632]
We design a strong multimodal backbone to utilize various meta-information to assist in fine-grained identification.
To take full advantage of unlabeled datasets, we jointly train with self-supervised and supervised learning.
Our method achieves macro F1 scores of 92.7% and 89.4% on the private and public datasets, respectively, ranking 1st among participants on the private leaderboard.
arXiv Detail & Related papers (2022-07-24T02:19:15Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
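A minimal sketch of the confidence-based pseudo-labeling step, assuming a Bradley-Terry preference predictor built from predicted segment returns (the names and threshold are illustrative, not the paper's exact values):

```python
import numpy as np

def pseudo_label_pairs(pred_return_a, pred_return_b, threshold=0.95):
    """Confidence-filtered pseudo-labels for unlabeled segment pairs.

    pred_return_a / pred_return_b are predicted-reward sums for the two
    segments in each pair; a Bradley-Terry model gives P(a preferred).
    Only pairs where the predictor is confident receive a pseudo-label.
    """
    p_a = 1.0 / (1.0 + np.exp(pred_return_b - pred_return_a))  # P(segment a preferred)
    confidence = np.maximum(p_a, 1.0 - p_a)
    keep = confidence > threshold            # confident pairs only
    labels = (p_a > 0.5).astype(np.int64)    # 1 means segment a is (pseudo-)preferred
    return keep, labels
```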
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
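The linearization behind this approach is easy to illustrate: label tags are injected into the token stream as ordinary tokens, so a language model trained on these strings can generate synthetic labeled sentences that are later de-linearized back into (token, tag) pairs. A minimal sketch:

```python
def linearize(tokens, tags):
    """Flatten a tagged sentence into one string for LM training."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)   # the tag token precedes the word it labels
        out.append(token)
    return " ".join(out)

# linearize(["John", "lives", "in", "Paris"], ["B-PER", "O", "O", "B-LOC"])
# -> "B-PER John lives in B-LOC Paris"
```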
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective for keyword detection tasks in the regime where only a few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.