An Adaptive Method for Weak Supervision with Drifting Data
- URL: http://arxiv.org/abs/2306.01658v1
- Date: Fri, 2 Jun 2023 16:27:34 GMT
- Title: An Adaptive Method for Weak Supervision with Drifting Data
- Authors: Alessio Mazzetto, Reza Esfandiarpoor, Eli Upfal, Stephen H. Bach
- Abstract summary: We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting.
We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time.
Our algorithm does not require any assumptions on the magnitude of the drift, and it adapts based on the input.
- Score: 11.035811912078216
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce an adaptive method with formal quality guarantees for weak
supervision in a non-stationary setting. Our goal is to infer the unknown
labels of a sequence of data by using weak supervision sources that provide
independent noisy signals of the correct classification for each data point.
This setting includes crowdsourcing and programmatic weak supervision. We focus
on the non-stationary case, where the accuracy of the weak supervision sources
can drift over time, e.g., because of changes in the underlying data
distribution. Due to the drift, older data could provide misleading information
to infer the label of the current data point. Previous work relied on a priori
assumptions on the magnitude of the drift to decide how much data to use from
the past. In contrast, our algorithm requires no assumptions on the drift and
adapts based on the input. In particular, at each step, it produces an estimate
of the current accuracies of the weak supervision sources over a window of past
observations chosen to minimize a
trade-off between the error due to the variance of the estimation and the error
due to the drift. Experiments on synthetic and real-world labelers show that
our approach indeed adapts to the drift. Unlike fixed-window-size strategies,
it dynamically chooses a window size that allows it to consistently maintain
good performance.
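The abstract does not spell out the estimator, so the sketch below is only a hypothetical illustration of the variance-drift trade-off it describes, not the authors' algorithm: per-source accuracies are estimated against majority-vote proxy labels, the lookback window is doubled while successive estimates agree up to a sampling-noise tolerance on the order of sqrt(log(1/delta)/r), and growth stops once the change exceeds what variance alone can explain. The function name, the proxy labels, and the doubling schedule are all assumptions.

```python
import numpy as np

def adaptive_window_accuracies(votes, labels_proxy, delta=0.05):
    """Hypothetical sketch of adaptive window selection under drift.

    votes:        (T, m) array; votes[t, j] is source j's label for item t,
                  most recent observation last.
    labels_proxy: (T,) array of proxy labels (e.g., majority votes) used to
                  score the sources; the paper's estimator may differ.
    """
    votes = np.asarray(votes)
    labels_proxy = np.asarray(labels_proxy)
    T, m = votes.shape
    r = 1
    est = (votes[-1] == labels_proxy[-1]).astype(float)  # window of size 1
    while 2 * r <= T:
        r_new = 2 * r
        est_new = (votes[-r_new:] == labels_proxy[-r_new:, None]).mean(axis=0)
        # Sampling noise of a mean over r points scales as sqrt(log(1/delta)/r);
        # allow the two estimates to differ by the sum of their tolerances.
        tol = np.sqrt(np.log(1.0 / delta) / r) + np.sqrt(np.log(1.0 / delta) / r_new)
        if np.max(np.abs(est_new - est)) > tol:
            # The larger window disagrees by more than variance can explain:
            # drift dominates, so stop growing and keep the smaller window.
            break
        est, r = est_new, r_new
    return est, r
```

For binary votes in {0, 1}, a simple choice of proxy is the per-item majority vote, e.g. `labels_proxy = (votes.mean(axis=1) >= 0.5).astype(int)`.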
Related papers
- Distribution-Free Predictive Inference under Unknown Temporal Drift [1.024113475677323]
We propose a strategy for choosing an adaptive window and use the data therein to construct prediction sets.
We provide sharp coverage guarantees for our method, showing its adaptivity to the underlying temporal drift.
arXiv Detail & Related papers (2024-06-10T17:55:43Z)
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples [1.1852406625172216]
We propose a new framework to estimate model accuracy on unlabeled target data without access to source data.
Our approach measures the disagreement rate between the source hypothesis and the target pseudo-labeling function.
Our source-free framework effectively addresses challenging distribution-shift scenarios and outperforms existing methods that require source data and labels for training.
arXiv Detail & Related papers (2023-07-19T15:33:11Z)
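Reading the summary above literally, treating the target pseudo-labels as ground truth makes the accuracy estimate one minus the disagreement rate. A toy sketch under that assumption (names are mine; the paper's domain-adaptive adversarial perturbation step is omitted):

```python
import numpy as np

def accuracy_from_disagreement(source_preds, target_pseudo_labels):
    """Estimate accuracy as 1 - disagreement between the source hypothesis
    and the target pseudo-labeling function (perturbation step omitted)."""
    disagree = np.mean(np.asarray(source_preds) != np.asarray(target_pseudo_labels))
    return float(1.0 - disagree)
```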
- Uncertainty-guided Source-free Domain Adaptation [77.3844160723014]
Source-free domain adaptation (SFDA) aims to adapt a classifier to an unlabelled target data set by only using a pre-trained source model.
We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation.
arXiv Detail & Related papers (2022-08-16T08:03:30Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds the threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
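A minimal sketch of the ATC rule as summarized above (function and argument names are my own): choose the confidence threshold on labeled source validation data so that the fraction of source points below it matches the observed source error rate, then report the fraction of unlabeled target points above it.

```python
import numpy as np

def atc_estimate(src_conf, src_correct, tgt_conf):
    """Sketch of Average Thresholded Confidence (ATC).

    src_conf:    model confidences on labeled source validation data.
    src_correct: booleans, True where the source prediction was correct.
    tgt_conf:    model confidences on unlabeled target data.
    """
    # Threshold set so that the source fraction below it equals the
    # observed source error rate.
    t = np.quantile(np.asarray(src_conf), 1.0 - np.mean(src_correct))
    # Predicted target accuracy: fraction of target points above the threshold.
    return float(np.mean(np.asarray(tgt_conf) >= t))
```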
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Classification and Uncertainty Quantification of Corrupted Data using Semi-Supervised Autoencoders [11.300365160909879]
We present a probabilistic approach to classify strongly corrupted data and quantify uncertainty.
A semi-supervised autoencoder trained on uncorrupted data is the underlying architecture.
We show that the model uncertainty strongly depends on whether the classification is correct or wrong.
arXiv Detail & Related papers (2021-05-27T18:47:55Z)
- Open-Set Hypothesis Transfer with Semantic Consistency [99.83813484934177]
We introduce a method that focuses on the semantic consistency under transformation of target data.
Our model first discovers confident predictions and performs classification with pseudo-labels.
As a result, unlabeled data can be classified into discriminative classes that coincide with either source classes or unknown classes.
arXiv Detail & Related papers (2020-10-01T10:44:31Z)
- Unsupervised Domain Adaptation in the Absence of Source Data [0.7366405857677227]
We propose an unsupervised method for adapting a source classifier to a target domain that varies from the source domain along natural axes.
We validate our method in scenarios where the distribution shift involves brightness, contrast, and rotation and show that it outperforms fine-tuning baselines in scenarios with limited labeled data.
arXiv Detail & Related papers (2020-07-20T16:22:14Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.