Data Retrieval with Importance Weights for Few-Shot Imitation Learning
- URL: http://arxiv.org/abs/2509.01657v1
- Date: Mon, 01 Sep 2025 17:58:41 GMT
- Title: Data Retrieval with Importance Weights for Few-Shot Imitation Learning
- Authors: Amber Xie, Rahul Chand, Dorsa Sadigh, Joey Hejna,
- Abstract summary: We introduce Importance Weighted Retrieval (IWR), which estimates importance weights, or the ratio between the target and prior data distributions for retrieval.<n>IWR consistently improves performance of existing retrieval-based methods, despite only requiring minor modifications.
- Score: 31.8638426686593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large-scale robot datasets have propelled recent progress in imitation learning, learning from smaller task specific datasets remains critical for deployment in new environments and unseen tasks. One such approach to few-shot imitation learning is retrieval-based imitation learning, which extracts relevant samples from large, widely available prior datasets to augment a limited demonstration dataset. To determine the relevant data from prior datasets, retrieval-based approaches most commonly calculate a prior data point's minimum distance to a point in the target dataset in latent space. While retrieval-based methods have shown success using this metric for data selection, we demonstrate its equivalence to the limit of a Gaussian kernel density (KDE) estimate of the target data distribution. This reveals two shortcomings of the retrieval rule used in prior work. First, it relies on high-variance nearest neighbor estimates that are susceptible to noise. Second, it does not account for the distribution of prior data when retrieving data. To address these issues, we introduce Importance Weighted Retrieval (IWR), which estimates importance weights, or the ratio between the target and prior data distributions for retrieval, using Gaussian KDEs. By considering the probability ratio, IWR seeks to mitigate the bias of previous selection rules, and by using reasonable modeling parameters, IWR effectively smooths estimates using all data points. Across both simulation environments and real-world evaluations on the Bridge dataset we find that our method, IWR, consistently improves performance of existing retrieval-based methods, despite only requiring minor modifications.
Related papers
- Formal Bayesian Transfer Learning via the Total Risk Prior [1.8570591025615457]
We show how a particular instantiation of our prior leads to a Bayesian Lasso in a transformed coordinate system.<n>We also demonstrate that recently proposed minimax-frequentist transfer learning techniques may be viewed as an approximate Maximum a Posteriori approach to our model.
arXiv Detail & Related papers (2025-07-31T17:55:16Z) - PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity [6.6157730528755065]
We study the Incremental Data Selection (IDS) problem, where examples arrive as a continuous stream, and need to be selected without access to the full data source.<n>We propose PEAKS, an efficient data selection method tailored for IDS.<n>Our comprehensive evaluations demonstrate that PEAKS consistently outperforms existing selection strategies.
arXiv Detail & Related papers (2025-04-07T16:42:09Z) - DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets.<n>Our framework, textttDUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining.<n>Specifically, given the evaluated data utilities of some data subsets, textttDUPRE fits a emphGaussian process (GP) regression model to predict the utility of every other data subset.
arXiv Detail & Related papers (2025-02-22T08:53:39Z) - Towards Data-Efficient Pretraining for Atomic Property Prediction [51.660835328611626]
We show that pretraining on a task-relevant dataset can match or surpass large-scale pretraining.<n>We introduce the Chemical Similarity Index (CSI), a novel metric inspired by computer vision's Fr'echet Inception Distance.
arXiv Detail & Related papers (2025-02-16T11:46:23Z) - What is the Right Notion of Distance between Predict-then-Optimize Tasks? [35.842182348661076]
OTD$3$ is a novel dataset distance that incorporates downstream decisions in addition to features and labels.<n>We show that our proposed distance accurately predicts model transferability across three different Predict-then- optimization tasks.
arXiv Detail & Related papers (2024-09-11T04:13:17Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z) - A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance.
We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset.
We introduce a novel piecewise power law (PPL) that handles the two data differently.
arXiv Detail & Related papers (2023-03-02T21:48:22Z) - Data Selection for Language Models via Importance Resampling [90.9263039747723]
We formalize the problem of selecting a subset of a large raw unlabeled dataset to match a desired target distribution.
We extend the classic importance resampling approach used in low-dimensions for LM data selection.
We instantiate the DSIR framework with hashed n-gram features for efficiency, enabling the selection of 100M documents in 4.5 hours.
arXiv Detail & Related papers (2023-02-06T23:57:56Z) - Missing Data Imputation using Optimal Transport [43.14084843713895]
We leverage optimal transport distances to quantify a criterion and turn it into a loss function to impute missing data values.
We propose practical methods to minimize these losses using end-to-end learning.
These experiments show that OT-based methods match or out-perform state-of-the-art imputation methods, even for high percentages of missing values.
arXiv Detail & Related papers (2020-02-10T15:23:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.