Active WeaSuL: Improving Weak Supervision with Active Learning
- URL: http://arxiv.org/abs/2104.14847v1
- Date: Fri, 30 Apr 2021 08:58:26 GMT
- Title: Active WeaSuL: Improving Weak Supervision with Active Learning
- Authors: Samantha Biegel, Rafah El-Khatib, Luiz Otavio Vilas Boas Oliveira, Max
Baak, Nanne Aben
- Abstract summary: We propose Active WeaSuL: an approach that incorporates active learning into weak supervision.
We make two contributions: 1) a modification of the weak supervision loss function, such that the expert-labelled data inform and improve the combination of weak labels; and 2) the maxKL divergence sampling strategy, which determines for which data points expert labelling is most beneficial.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The availability of labelled data is one of the main limitations in machine
learning. We can alleviate this using weak supervision: a framework that uses
expert-defined rules $\boldsymbol{\lambda}$ to estimate probabilistic labels
$p(y|\boldsymbol{\lambda})$ for the entire data set. These rules, however, are
dependent on what experts know about the problem, and hence may be inaccurate
or may fail to capture important parts of the problem-space. To mitigate this,
we propose Active WeaSuL: an approach that incorporates active learning into
weak supervision. In Active WeaSuL, experts not only define rules, but also
iteratively provide the true label for a small set of points where the
weak supervision model is most likely to be mistaken, which are then used to
better estimate the probabilistic labels. In this way, the weak labels provide
a warm start, which active learning then improves upon. We make two
contributions: 1) a modification of the weak supervision loss function, such
that the expert-labelled data inform and improve the combination of weak
labels; and 2) the maxKL divergence sampling strategy, which determines for
which data points expert labelling is most beneficial. Our experiments show
that when the budget for labelling data is limited (e.g. $\leq 60$ data
points), Active WeaSuL outperforms weak supervision, active learning, and
competing strategies, with only a handful of labelled data points. This makes
Active WeaSuL ideal for situations where obtaining labelled data is difficult.
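The maxKL sampling strategy (contribution 2) can be illustrated with a small sketch. This is a hypothetical simplification for binary labels, not the authors' implementation: data points are bucketed by their weak-label vector $\boldsymbol{\lambda}$, and an expert label is requested from the bucket where the weak-supervision estimate diverges most (in KL divergence) from the estimate informed by the expert labels collected so far. The function names and the bucketing scheme are illustrative assumptions.

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-9):
    """KL divergence between two Bernoulli distributions with means p and q."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def maxkl_pick(lambda_matrix, p_weak, p_expert, rng=None):
    """Pick a point to send to the expert for labelling.

    lambda_matrix: (n, m) array, output of the m weak rules per data point
    p_weak:        (n,) probabilistic labels from the weak supervision model
    p_expert:      (n,) label estimates informed by expert-labelled points
    """
    rng = np.random.default_rng() if rng is None else rng
    # Group points that share the same weak-label vector.
    buckets = {}
    for i, row in enumerate(map(tuple, lambda_matrix)):
        buckets.setdefault(row, []).append(i)
    # Find the bucket where the two estimates disagree most, then
    # sample one of its points for expert labelling.
    worst = max(
        buckets.values(),
        key=lambda idx: kl_bernoulli(np.mean(p_expert[idx]), np.mean(p_weak[idx])),
    )
    return int(rng.choice(worst))
```

In this sketch, buckets where the expert-informed estimate already agrees with the weak-supervision model have near-zero KL and are never selected, which matches the paper's intuition of querying where the weak supervision model is most likely to be mistaken.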
Related papers
- All Points Matter: Entropy-Regularized Distribution Alignment for
Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Losses over Labels: Weakly Supervised Learning via Direct Loss Construction [71.11337906077483]
Programmable weak supervision is a growing paradigm within machine learning.
We propose Losses over Labels (LoL), which creates losses directly from the heuristics without going through the intermediate step of a label.
We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks.
arXiv Detail & Related papers (2022-12-13T22:29:14Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler [89.27610526884496]
Weak Labeler Active Cover (WL-AC) is able to robustly leverage the lower quality weak labelers to reduce the query complexity while retaining the desired level of accuracy.
We show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
arXiv Detail & Related papers (2022-11-04T02:52:54Z)
- Training Subset Selection for Weak Supervision [17.03788288165262]
We show a tradeoff between the amount of weakly-labeled data and the precision of the weak labels.
We combine pretrained data representations with the cut statistic to select high-quality subsets of the weakly-labeled training data.
Using less weakly-labeled data improves the accuracy of weak supervision pipelines by up to 19% (absolute) on benchmark tasks.
arXiv Detail & Related papers (2022-06-06T21:31:32Z)
- Data Consistency for Weakly Supervised Learning [15.365232702938677]
Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
arXiv Detail & Related papers (2022-02-08T16:48:19Z)
- How to Leverage Unlabeled Data in Offline Reinforcement Learning [125.72601809192365]
Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition.
One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data.
We find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing.
arXiv Detail & Related papers (2022-02-03T18:04:54Z)
- Self-Supervised Learning from Semantically Imprecise Data [7.24935792316121]
Learning from imprecise labels such as "animal" or "bird" is an important capability when expertly labeled training data is scarce.
CHILLAX is a recently proposed method to tackle this task.
We extend CHILLAX with a self-supervised scheme using constrained extrapolation to generate pseudo-labels.
arXiv Detail & Related papers (2021-04-22T07:26:14Z)
- Active Learning for Noisy Data Streams Using Weak and Strong Labelers [3.9370369973510746]
We consider a novel weak and strong labeler problem inspired by humans' natural ability for labeling.
We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
arXiv Detail & Related papers (2020-10-27T09:18:35Z)
- Strength from Weakness: Fast Learning Using Weak Supervision [81.41106207042948]
Having access to weak labels can significantly accelerate the learning rate for the strong task to the fast rate of $\mathcal{O}(1/n)$.
Actual acceleration depends continuously on the number of weak labels available, and on the relation between the two tasks.
arXiv Detail & Related papers (2020-02-19T22:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.