DADAgger: Disagreement-Augmented Dataset Aggregation
- URL: http://arxiv.org/abs/2301.01348v1
- Date: Tue, 3 Jan 2023 20:44:14 GMT
- Title: DADAgger: Disagreement-Augmented Dataset Aggregation
- Authors: Akash Haridas, Karim Hamadeh, Samarendra Chandan Bindu Dash
- Abstract summary: DAgger is an imitation algorithm that aggregates its original datasets by querying the expert on all samples encountered during training.
We propose a modification to DAgger, known as DADAgger, which only queries the expert for state-action pairs that are out of distribution.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DAgger is an imitation algorithm that aggregates its original datasets by
querying the expert on all samples encountered during training. In order to
reduce the number of samples queried, we propose a modification to DAgger,
known as DADAgger, which only queries the expert for state-action pairs that
are out of distribution (OOD). OOD states are identified by measuring the
variance of the action predictions of an ensemble of models on each state,
which we simulate using dropout. Tests on the Car Racing and Half Cheetah
environments show that DADAgger achieves performance comparable to DAgger with
fewer expert queries, and outperforms a random sampling baseline. We also show
that our algorithm may be used to build efficient, well-balanced training
datasets by running with no initial data and only querying the expert to
resolve uncertainty.
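The core decision rule described above (query the expert only when a dropout-simulated ensemble disagrees on the action) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear toy policy, the weight-mask form of dropout, and the variance threshold value are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_action_predictions(policy_weights, state, n_passes=10, p_drop=0.5):
    """Approximate an ensemble by running one (toy, linear) policy several
    times with independent random dropout masks on its weights."""
    preds = []
    for _ in range(n_passes):
        mask = rng.random(policy_weights.shape) > p_drop
        # inverted-dropout scaling keeps the expected output unchanged
        preds.append((policy_weights * mask) @ state / (1.0 - p_drop))
    return np.stack(preds)  # shape: (n_passes, action_dim)

def should_query_expert(policy_weights, state, threshold):
    """DADAgger-style rule: treat the state as out of distribution, and
    query the expert, only when the ensemble's action variance is high."""
    preds = dropout_action_predictions(policy_weights, state)
    disagreement = preds.var(axis=0).mean()
    return disagreement > threshold

W = rng.normal(size=(2, 4))      # toy policy: 4-dim state -> 2-dim action
state = rng.normal(size=4)
print(should_query_expert(W, state, threshold=0.5))
```

In a training loop, states for which `should_query_expert` returns `False` would simply be skipped, so the aggregated dataset grows only where the policy is uncertain.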
Related papers
- Improving MoE Compute Efficiency by Composing Weight and Data Sparsity [50.654297246411545]
Mixture-of-Experts layers achieve compute efficiency through weight sparsity.
Data sparsity, where each expert processes only a subset of tokens, offers a complementary axis.
arXiv Detail & Related papers (2026-01-21T18:53:58Z)
- Generalization is not a universal guarantee: Estimating similarity to training data with an ensemble out-of-distribution metric [0.09363323206192666]
Failure of machine learning models to generalize to new data is a core problem limiting the reliability of AI systems.
We propose a standardized approach for assessing data similarity by constructing a supervised autoencoder for generalizability estimation (SAGE)
We show that out-of-the-box model performance increases after SAGE score filtering, even when applied to data from the model's own training and test datasets.
arXiv Detail & Related papers (2025-02-22T19:21:50Z)
- Optimal Algorithms for Augmented Testing of Discrete Distributions [25.818433126197036]
We show that a predictor can indeed reduce the number of samples required for all three property testing tasks.
A key advantage of our algorithms is their adaptability to the precision of the prediction.
We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal.
arXiv Detail & Related papers (2024-12-01T21:31:22Z)
- Efficiency of Unsupervised Anomaly Detection Methods on Software Logs [0.0]
This paper studies unsupervised and time efficient methods for anomaly detection.
The models are evaluated on four public datasets.
For speed, the OOV detector with word representation is optimal. For accuracy, the OOV detector combined with trigram representation yields the highest AUC-ROC (0.846).
arXiv Detail & Related papers (2023-12-04T14:44:31Z)
- A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories [122.11358440078581]
Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
arXiv Detail & Related papers (2023-11-02T15:41:09Z)
- FairWASP: Fast and Optimal Fair Wasserstein Pre-processing [9.627848184502783]
We present FairWASP, a novel pre-processing approach to reduce disparities in classification datasets without modifying the original data.
We show theoretically that integer weights are optimal, which means our method can be equivalently understood as duplicating or eliminating samples.
Our work is based on reformulating the pre-processing task as a large-scale mixed-integer program (MIP), for which we propose a highly efficient algorithm based on the cutting plane method.
arXiv Detail & Related papers (2023-10-31T19:36:00Z)
- A Generic Machine Learning Framework for Fully-Unsupervised Anomaly Detection with Contaminated Data [0.0]
We introduce a framework for a fully unsupervised refinement of contaminated training data for AD tasks.
The framework is generic and can be applied to any residual-based machine learning model.
We show its clear superiority over the naive approach of training with contaminated data without refinement.
arXiv Detail & Related papers (2023-08-25T12:47:59Z)
- Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation [2.7759072740347017]
We introduce a self-supervised sequential disentanglement framework based on contrastive estimation with no external signals.
In practice, we propose a unified, efficient, and easy-to-code sampling strategy for semantically similar and dissimilar views of the data.
Our method presents state-of-the-art results in comparison to existing techniques.
arXiv Detail & Related papers (2023-05-25T10:50:30Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is the first TTA study in speech area to our best knowledge.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
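The thresholded-confidence idea can be sketched as follows. This is a simplified, hypothetical reading of ATC for illustration only: the threshold-fitting criterion (matching the fraction above threshold to source accuracy), the toy data, and the function names are assumptions, not the paper's exact procedure.

```python
import numpy as np

def learn_atc_threshold(source_confidences, source_correct):
    """Fit a confidence threshold t on labeled source data so that the
    fraction of points with confidence >= t matches source accuracy
    (a simplified stand-in for the ATC fitting step)."""
    target_acc = source_correct.mean()
    best_t, best_gap = 0.0, float("inf")
    # scan candidate thresholds taken from the observed confidences
    for t in np.unique(source_confidences):
        gap = abs((source_confidences >= t).mean() - target_acc)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t

def predict_target_accuracy(target_confidences, t):
    """Predicted accuracy = fraction of unlabeled target examples whose
    confidence clears the learned threshold."""
    return (target_confidences >= t).mean()

rng = np.random.default_rng(1)
src_conf = rng.uniform(0.4, 1.0, size=1000)
src_correct = rng.random(1000) < src_conf   # toy model: confidence tracks correctness
t = learn_atc_threshold(src_conf, src_correct)
print(predict_target_accuracy(src_conf, t))
```

Applied to unlabeled target confidences, `predict_target_accuracy` gives an accuracy estimate without any target labels.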
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.