biquality-learn: a Python library for Biquality Learning
- URL: http://arxiv.org/abs/2308.09643v1
- Date: Fri, 18 Aug 2023 16:01:18 GMT
- Title: biquality-learn: a Python library for Biquality Learning
- Authors: Pierre Nodet and Vincent Lemaire and Alexis Bondu and Antoine
Cornuéjols
- Abstract summary: Biquality Learning is a framework for designing algorithms capable of handling weaknesses of supervision and dataset shifts.
biquality-learn is a Python library for Biquality Learning with an intuitive and consistent API to learn machine learning models from biquality data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The democratization of Data Mining has been widely successful thanks in part
to powerful and easy-to-use Machine Learning libraries. These libraries have
been particularly tailored to tackle Supervised Learning. However, strong
supervision signals are scarce in practice, and practitioners must resort to
weak supervision. In addition to weaknesses of supervision, dataset shifts are
another kind of phenomenon that occurs when deploying machine learning models
in the real world. That is why Biquality Learning has been proposed as a
machine learning framework to design algorithms capable of handling multiple
weaknesses of supervision and dataset shifts without assumptions on their
nature and level by relying on the availability of a small trusted dataset
composed of cleanly labeled and representative samples. Thus we propose
biquality-learn: a Python library for Biquality Learning with an intuitive and
consistent API to learn machine learning models from biquality data, with
well-proven algorithms, accessible and easy to use for everyone, and enabling
researchers to experiment in a reproducible way on biquality data.
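The abstract describes a scikit-learn-style API for learning from biquality data, i.e. a small trusted set of cleanly labeled samples alongside a larger untrusted set. The NumPy-only sketch below illustrates that data setup with a per-sample quality indicator (1 = trusted, 0 = untrusted). The `sample_quality` name and the nearest-centroid baseline are illustrative assumptions for this sketch, not the library's actual estimators:

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrusted set: many samples, ~30% of labels flipped (weak supervision).
X_untrusted = rng.normal(loc=[[0, 0]] * 200 + [[3, 3]] * 200, scale=1.0)
y_untrusted = np.array([0] * 200 + [1] * 200)
flip = rng.random(400) < 0.3
y_untrusted[flip] = 1 - y_untrusted[flip]

# Trusted set: few samples, cleanly labeled and representative.
X_trusted = rng.normal(loc=[[0, 0]] * 10 + [[3, 3]] * 10, scale=1.0)
y_trusted = np.array([0] * 10 + [1] * 10)

# Biquality data setup: one feature matrix plus a per-sample quality
# indicator (1 = trusted, 0 = untrusted), as an array aligned with X and y.
X = np.vstack([X_trusted, X_untrusted])
y = np.concatenate([y_trusted, y_untrusted])
sample_quality = np.concatenate([np.ones(20), np.zeros(400)])

# Naive baseline: fit a nearest-centroid classifier on trusted samples only,
# ignoring the untrusted data entirely (the lower bound biquality algorithms
# aim to beat by also exploiting the untrusted samples).
mask = sample_quality == 1
centroids = np.stack([X[mask][y[mask] == c].mean(axis=0) for c in (0, 1)])

def predict(X_new):
    # Assign each sample to the class of its nearest trusted centroid.
    d = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)
```

In the library itself, this setup is handled by estimators that receive both datasets and the quality indicator at fit time, keeping the familiar fit/predict workflow of scikit-learn.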
Related papers
- Biquality Learning: a Framework to Design Algorithms Dealing with
Closed-Set Distribution Shifts [0.0]
We think the biquality data setup is a suitable framework for designing such algorithms.
The trusted and untrusted datasets available at training time make designing algorithms dealing with any distribution shifts possible.
We experiment with two novel methods to synthetically introduce concept drift and class-conditional shifts in real-world datasets.
arXiv Detail & Related papers (2023-08-29T08:57:47Z) - Improving Behavioural Cloning with Positive Unlabeled Learning [15.484227081812852]
We propose a novel iterative learning algorithm for identifying expert trajectories in mixed-quality robotics datasets.
Applying behavioral cloning to the resulting filtered dataset outperforms several competitive offline reinforcement learning and imitation learning baselines.
arXiv Detail & Related papers (2023-01-27T14:17:45Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - A Survey of Learning on Small Data: Generalization, Optimization, and
Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Federated Self-Supervised Learning in Heterogeneous Settings: Limits of
a Baseline Approach on HAR [0.5039813366558306]
We show that standard lightweight autoencoder and standard Federated Averaging fail to learn a robust representation for Human Activity Recognition.
These findings advocate for a more intensive research effort in Federated Self Supervised Learning.
arXiv Detail & Related papers (2022-07-17T14:15:45Z) - Development of a robust cascaded architecture for intelligent robot
grasping using limited labelled data [0.0]
In the case of robots, we cannot afford to spend that much time teaching them to grasp objects effectively.
We propose an efficient learning architecture based on VQ-VAE so that robots can be taught with sufficient data corresponding to correct grasping.
A semi-supervised learning-based model with much greater generalization capability, even with a limited labelled dataset, has been investigated.
arXiv Detail & Related papers (2021-11-06T11:01:15Z) - Investigating a Baseline Of Self Supervised Learning Towards Reducing
Labeling Costs For Image Classification [0.0]
The study uses the kaggle.com cats-vs-dogs dataset, MNIST, and Fashion-MNIST to investigate the self-supervised learning task.
Results show that the pretext process in self-supervised learning improves accuracy by around 15% on the downstream classification task.
arXiv Detail & Related papers (2021-08-17T06:43:05Z) - What Matters in Learning from Offline Human Demonstrations for Robot
Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z) - Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z) - Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce
Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z) - Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.