Biquality Learning: a Framework to Design Algorithms Dealing with
Closed-Set Distribution Shifts
- URL: http://arxiv.org/abs/2308.15132v1
- Date: Tue, 29 Aug 2023 08:57:47 GMT
- Title: Biquality Learning: a Framework to Design Algorithms Dealing with
Closed-Set Distribution Shifts
- Authors: Pierre Nodet and Vincent Lemaire and Alexis Bondu and Antoine
Cornuéjols
- Abstract summary: We think the biquality data setup is a suitable framework for designing such algorithms.
The trusted and untrusted datasets available at training time make designing algorithms dealing with any distribution shifts possible.
We experiment with two novel methods to synthetically introduce concept drift and class-conditional shifts in real-world datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training machine learning models from data with weak supervision
and dataset shifts remains challenging. Designing algorithms for when these two
situations arise together has not been explored much, and existing algorithms
cannot always handle the most complex distribution shifts. We argue that the
biquality data setup is a suitable framework for designing such algorithms.
Biquality Learning assumes that two datasets are available at training time: a
trusted dataset sampled from the distribution of interest and an untrusted
dataset affected by dataset shifts and weaknesses of supervision (together,
distribution shifts). Having both datasets available at training time makes it
possible to design algorithms dealing with any distribution shift. We propose
two methods for biquality learning, one inspired by the label noise literature
and another by the covariate shift literature. We also introduce two novel
methods to synthetically inject concept drift and class-conditional shifts into
real-world datasets, and experiment on many of them. We open a discussion and
conclude that developing biquality learning algorithms robust to distributional
changes remains an interesting problem for future research.
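To make the biquality setup concrete, here is a minimal sketch in the spirit of the covariate-shift-inspired family of methods the abstract mentions: a domain discriminator separates trusted from untrusted samples, its probabilities yield importance weights, and a final classifier is trained on all data with untrusted samples reweighted. The toy data, weighting scheme, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch: density-ratio reweighting of untrusted samples
# (covariate-shift-style biquality learning; not the paper's exact method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy biquality data: a small trusted set and a larger, shifted untrusted set.
X_trusted = rng.normal(0.0, 1.0, size=(200, 2))
y_trusted = (X_trusted[:, 0] > 0).astype(int)
X_untrusted = rng.normal(0.5, 1.2, size=(1000, 2))      # covariate shift
y_untrusted = (X_untrusted[:, 0] > 0).astype(int)
flip = rng.random(len(y_untrusted)) < 0.2               # 20% label noise
y_untrusted[flip] = 1 - y_untrusted[flip]

# 1) Train a domain discriminator: trusted (1) vs untrusted (0).
X_dom = np.vstack([X_trusted, X_untrusted])
d_dom = np.concatenate([np.ones(len(X_trusted)), np.zeros(len(X_untrusted))])
disc = LogisticRegression().fit(X_dom, d_dom)

# 2) Importance weights w(x) proportional to P(trusted|x) / P(untrusted|x).
p = disc.predict_proba(X_untrusted)[:, 1]
w = p / np.clip(1.0 - p, 1e-6, None)
w *= len(w) / w.sum()                                   # normalize to mean 1

# 3) Train the final classifier on all data, downweighting shifted samples.
X_all = np.vstack([X_trusted, X_untrusted])
y_all = np.concatenate([y_trusted, y_untrusted])
weights = np.concatenate([np.ones(len(y_trusted)), w])
clf = LogisticRegression().fit(X_all, y_all, sample_weight=weights)
```

The design choice here is standard in the covariate-shift literature: the density ratio is estimated discriminatively rather than by modeling the two input densities directly, which keeps the sketch to two logistic regressions.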
Related papers
- Continual Learning for Multimodal Data Fusion of a Soft Gripper [1.0589208420411014]
A model trained on one data modality often fails when tested with a different modality.
We introduce a continual learning algorithm capable of incrementally learning different data modalities.
We evaluate the algorithm's effectiveness on a challenging custom multimodal dataset.
arXiv Detail & Related papers (2024-09-20T09:53:27Z)
- Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
arXiv Detail & Related papers (2023-10-06T20:11:27Z)
- biquality-learn: a Python library for Biquality Learning [0.0]
Biquality Learning is proposed as a framework for designing algorithms capable of handling weaknesses of supervision and dataset shifts.
biquality-learn is a Python library for Biquality Learning with an intuitive and consistent API to learn machine learning models from biquality data.
arXiv Detail & Related papers (2023-08-18T16:01:18Z)
- Data Quality in Imitation Learning [15.939363481618738]
In offline learning for robotics, we simply lack internet scale data, and so high quality datasets are a necessity.
This is especially true in imitation learning (IL), a sample efficient paradigm for robot learning using expert demonstrations.
In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift.
arXiv Detail & Related papers (2023-06-04T18:48:32Z)
- OoD-Bench: Benchmarking and Understanding Out-of-Distribution Generalization Datasets and Algorithms [28.37021464780398]
We show that existing OoD algorithms that outperform empirical risk minimization on one distribution shift usually have limitations on the other distribution shift.
The new benchmark may serve as a strong foothold that can be resorted to by future OoD generalization research.
arXiv Detail & Related papers (2021-06-07T15:34:36Z)
- Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
Main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing [55.012801269326594]
In Byzantine robust distributed learning, a central server wants to train a machine learning model over data distributed across multiple workers.
A fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages.
We propose a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost.
arXiv Detail & Related papers (2020-06-16T17:58:53Z)
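The bucketing idea in the last entry above can be sketched briefly: shuffle worker gradients, average them within small buckets, then apply a robust aggregator (here the coordinate-wise median) to the bucket means. The function name, bucket size, and toy data are illustrative assumptions, not the paper's exact scheme.

```python
# Illustrative sketch of gradient bucketing before robust aggregation.
import numpy as np

def bucketed_aggregate(grads, bucket_size=2, seed=0):
    """Shuffle worker gradients, average within buckets of `bucket_size`,
    then take the coordinate-wise median of the bucket means."""
    rng = np.random.default_rng(seed)
    grads = np.asarray(grads, dtype=float)
    idx = rng.permutation(len(grads))
    buckets = [grads[idx[i:i + bucket_size]]
               for i in range(0, len(grads), bucket_size)]
    means = np.stack([b.mean(axis=0) for b in buckets])
    return np.median(means, axis=0)     # robust aggregation step

# 10 honest workers near the true gradient (all-ones), 2 Byzantine outliers.
honest = np.ones((10, 3)) + 0.01 * np.random.default_rng(1).normal(size=(10, 3))
byzantine = np.full((2, 3), 100.0)
agg = bucketed_aggregate(np.vstack([honest, byzantine]), bucket_size=2)
```

With 12 workers and buckets of 2, at most two of the six bucket means are contaminated, so the coordinate-wise median still lands near the honest gradient.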
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.