A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise
Datasets
- URL: http://arxiv.org/abs/2110.04698v2
- Date: Sat, 9 Dec 2023 10:05:10 GMT
- Title: A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise
Datasets
- Authors: Jake Grigsby, Yanjun Qi
- Abstract summary: Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience.
Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise.
Re-purposing prioritized experience sampling enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
- Score: 15.206465106699293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent Offline Reinforcement Learning methods have succeeded in learning
high-performance policies from fixed datasets of experience. A particularly
effective approach learns to first identify and then mimic optimal
decision-making strategies. Our work evaluates this method's ability to scale
to vast datasets consisting almost entirely of sub-optimal noise. A thorough
investigation on a custom benchmark helps identify several key challenges
involved in learning from high-noise datasets. We re-purpose prioritized
experience sampling to locate expert-level demonstrations among millions of
low-performance samples. This modification enables offline agents to learn
state-of-the-art policies in benchmark tasks using datasets where expert
actions are outnumbered nearly 65:1.
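The abstract names two ideas: filtering behavioral cloning by estimated advantage, and prioritized sampling to surface rare expert transitions. A minimal NumPy sketch of both, under assumptions of my own (the function names, exponential weighting, power-law priorities, and the toy 65:1 dataset are illustrative, not the authors' code):

```python
import numpy as np

def advantage_filtered_bc_weights(advantages, temperature=1.0, clip=20.0):
    # Exponential advantage weighting: high-advantage (expert-like) actions
    # get a large behavioral-cloning weight; noisy, low-advantage actions
    # are effectively down-weighted. Clipped for numerical stability.
    return np.minimum(np.exp(advantages / temperature), clip)

def prioritized_sample(advantages, batch_size, alpha=0.6, rng=None):
    # Sample transition indices with probability proportional to a power
    # of the (shifted, non-negative) advantage estimate, so rare expert
    # transitions are drawn far more often than their minority share.
    rng = rng or np.random.default_rng()
    priorities = (advantages - advantages.min() + 1e-6) ** alpha
    probs = priorities / priorities.sum()
    return rng.choice(len(advantages), size=batch_size, p=probs)

# Toy dataset mirroring the paper's setting: roughly 65 noisy transitions
# per expert transition (650 noise, 10 expert).
adv = np.concatenate([np.full(650, -1.0), np.full(10, 2.0)])
idx = prioritized_sample(adv, batch_size=256, rng=np.random.default_rng(0))
expert_fraction = float(np.mean(idx >= 650))  # far above the 10/660 base rate
```

In this toy setup nearly all sampled indices come from the expert slice, illustrating how prioritized sampling can counteract a heavy noise imbalance.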
Related papers
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
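The DPO update mentioned above can be sketched with the standard published DPO objective on a single preference pair; this is a generic illustration of the loss, not this paper's implementation, and the log-probability values below are made up:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # DPO pairwise loss: -log sigmoid(beta * margin), where the margin is
    # the policy's log-ratio on the chosen response minus its log-ratio on
    # the rejected one, each measured against a frozen reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Loss is lower when the policy already favors the preferred response
# more strongly than the reference model does.
low = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # policy prefers chosen
high = dpo_loss(-3.0, -1.0, -2.0, -2.0)  # policy prefers rejected
```

Applied at the step level, as the summary describes, each "response" pair would be a pair of intermediate reasoning steps rather than full completions.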
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- DIDA: Denoised Imitation Learning based on Domain Adaptation [28.36684781402964]
We focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise.
We propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data.
Experiment results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods.
arXiv Detail & Related papers (2024-04-04T11:29:05Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- Benchmarking of Query Strategies: Towards Future Deep Active Learning [0.0]
We benchmark query strategies for deep active learning (DAL).
DAL reduces annotation costs by annotating only high-quality samples selected by query strategies.
arXiv Detail & Related papers (2023-12-10T04:17:16Z)
- Explored An Effective Methodology for Fine-Grained Snake Recognition [8.908667065576632]
We design a strong multimodal backbone to utilize various meta-information to assist in fine-grained identification.
In order to take full advantage of unlabeled datasets, we jointly train with self-supervised and supervised learning.
Our method achieves macro F1 scores of 92.7% and 89.4% on the private and public datasets, respectively, ranking 1st among participants on the private leaderboard.
arXiv Detail & Related papers (2022-07-24T02:19:15Z)
- ALLSH: Active Learning Guided by Local Sensitivity and Hardness [98.61023158378407]
We propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function.
Our method achieves consistent gains over the commonly used active learning strategies in various classification tasks.
arXiv Detail & Related papers (2022-05-10T15:39:11Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
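The pseudo-labeling step described in the SURF summary can be sketched as a confidence filter over a preference predictor; the function name, label convention, and 0.9 threshold here are illustrative assumptions rather than details from the paper:

```python
import numpy as np

def surf_pseudo_labels(pref_probs, threshold=0.9):
    # pref_probs[i]: predictor's probability that segment A of pair i is
    # preferred over segment B. Keep only pairs where the predictor is
    # confident, and assign them hard pseudo preference labels
    # (0: A preferred, 1: B preferred).
    confidence = np.maximum(pref_probs, 1.0 - pref_probs)
    keep = confidence >= threshold
    labels = (pref_probs < 0.5).astype(int)
    return labels[keep], keep

probs = np.array([0.97, 0.55, 0.08])
labels, keep = surf_pseudo_labels(probs)
# keep -> [True, False, True]; labels -> [0, 1]
```

The confident pairs then augment the human-labeled preference set, which is how a small feedback budget can be stretched across many unlabeled trajectory pairs.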
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective in keyword detection tasks in the regime when only few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.