Data Lifecycle Management in Evolving Input Distributions for
Learning-based Aerospace Applications
- URL: http://arxiv.org/abs/2209.06855v1
- Date: Wed, 14 Sep 2022 18:15:56 GMT
- Title: Data Lifecycle Management in Evolving Input Distributions for
Learning-based Aerospace Applications
- Authors: Somrita Banerjee, Apoorva Sharma, Edward Schmerling, Max Spolaor,
Michael Nemerouf, Marco Pavone
- Abstract summary: This paper presents a framework to incrementally retrain a model by selecting a subset of test inputs to label.
Algorithms within this framework are evaluated based on (1) model performance throughout mission lifetime and (2) cumulative costs associated with labeling and model retraining.
- Score: 23.84037777018747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As input distributions evolve over a mission lifetime, maintaining
performance of learning-based models becomes challenging. This paper presents a
framework to incrementally retrain a model by selecting a subset of test inputs
to label, which allows the model to adapt to changing input distributions.
Algorithms within this framework are evaluated based on (1) model performance
throughout mission lifetime and (2) cumulative costs associated with labeling
and model retraining. We provide an open-source benchmark of a satellite pose
estimation model trained on images of a satellite in space and deployed in
novel scenarios (e.g., different backgrounds or misbehaving pixels), where
algorithms are evaluated on their ability to maintain high performance by
retraining on a subset of inputs. We also propose a novel algorithm to select a
diverse subset of inputs for labeling, by characterizing the information gain
from an input using Bayesian uncertainty quantification and choosing a subset
that maximizes collective information gain using concepts from batch active
learning. We show that our algorithm outperforms others on the benchmark, e.g.,
achieves comparable performance to an algorithm that labels 100% of inputs,
while only labeling 50% of inputs, resulting in low costs and high performance
over the mission lifetime.
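The abstract's idea of choosing a subset that maximizes collective information gain can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a Bayesian linear-Gaussian model over fixed feature embeddings, and the names `greedy_information_gain`, `features`, `noise_var`, and `prior_var` are hypothetical. Each step picks the input with the largest marginal information gain, then updates the posterior so that near-duplicates of already selected inputs become less attractive.

```python
import numpy as np


def greedy_information_gain(features, budget, noise_var=1.0, prior_var=1.0):
    """Greedily select rows of `features` to label.

    Illustrative assumption: labels follow a Bayesian linear model
    y = w^T phi(x) + noise, with w ~ N(0, prior_var * I). The joint
    information gain of a set is then 0.5 * logdet(I + Phi Sigma0 Phi^T / noise_var),
    which the greedy loop below accumulates one input at a time.
    """
    n, d = features.shape
    cov = prior_var * np.eye(d)            # current posterior covariance of w
    remaining = np.ones(n, dtype=bool)     # inputs not yet selected
    selected, total_gain = [], 0.0

    for _ in range(min(budget, n)):
        # Predictive variance phi^T cov phi for every candidate input.
        proj = features @ cov
        pred_var = np.einsum("ij,ij->i", proj, features)

        # Marginal information gain from labeling each candidate.
        gains = 0.5 * np.log1p(pred_var / noise_var)
        gains[~remaining] = -np.inf

        best = int(np.argmax(gains))
        selected.append(best)
        total_gain += float(gains[best])
        remaining[best] = False

        # Sherman-Morrison rank-one update of the posterior covariance.
        v = proj[best]                     # equals cov @ phi because cov is symmetric
        cov = cov - np.outer(v, v) / (noise_var + pred_var[best])

    return selected, total_gain


# Two identical inputs and one distinct input, with a labeling budget of 2.
example_features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
example_selected, example_gain = greedy_information_gain(example_features, budget=2)
```

In this toy example the second pick skips the duplicate of the first input and takes the distinct one, which is the diversity effect a batch-aware criterion is meant to provide. Ranking inputs by individual uncertainty alone would not distinguish the duplicate.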
Related papers
- In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models [37.45103473809928]
We propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and evaluation samples with a trained model.
By applying our algorithm to instruction fine-tuning data of LLMs, we can achieve similar performance with just 50% of the training data.
arXiv Detail & Related papers (2024-08-07T05:48:05Z)
- Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present learn from the learnt (LFTL), a novel paradigm for SFADA to leverage the learnt knowledge from the source pretrained model and actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z)
- Bandit-Driven Batch Selection for Robust Learning under Label Noise [20.202806541218944]
We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging bandit algorithms.
Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets.
arXiv Detail & Related papers (2023-10-31T19:19:01Z)
- AdaSelection: Accelerating Deep Learning Training through Data Subsampling [27.46630703428186]
We introduce AdaSelection, an adaptive sub-sampling method to identify the most informative sub-samples within each minibatch.
Compared with industry-standard baselines, AdaSelection consistently displays superior performance.
arXiv Detail & Related papers (2023-06-19T07:01:28Z)
- Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment.
We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distribution.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- Preserving Fairness in AI under Domain Shift [15.820660013260584]
Existing algorithms for ensuring fairness in AI use a single-shot training strategy.
We develop an algorithm to adapt a fair model to remain fair under domain shift.
arXiv Detail & Related papers (2023-01-29T06:13:40Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) methods are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, using only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Active and Incremental Learning with Weak Supervision [7.2288756536476635]
In this work, we describe combinations of an incremental learning scheme and methods of active learning.
An object detection task is evaluated in a continuous exploration context on the PASCAL VOC dataset.
We also validate a weakly supervised system based on active and incremental learning in a real-world biodiversity application.
arXiv Detail & Related papers (2020-01-20T13:21:14Z)