Active WeaSuL: Improving Weak Supervision with Active Learning
- URL: http://arxiv.org/abs/2104.14847v1
- Date: Fri, 30 Apr 2021 08:58:26 GMT
- Title: Active WeaSuL: Improving Weak Supervision with Active Learning
- Authors: Samantha Biegel, Rafah El-Khatib, Luiz Otavio Vilas Boas Oliveira, Max
Baak, Nanne Aben
- Abstract summary: We propose Active WeaSuL: an approach that incorporates active learning into weak supervision.
We make two contributions: 1) a modification of the weak supervision loss function, such that the expert-labelled data inform and improve the combination of weak labels; and 2) the maxKL divergence sampling strategy, which determines for which data points expert labelling is most beneficial.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The availability of labelled data is one of the main limitations in machine
learning. We can alleviate this using weak supervision: a framework that uses
expert-defined rules $\boldsymbol{\lambda}$ to estimate probabilistic labels
$p(y|\boldsymbol{\lambda})$ for the entire data set. These rules, however, are
dependent on what experts know about the problem, and hence may be inaccurate
or may fail to capture important parts of the problem-space. To mitigate this,
we propose Active WeaSuL: an approach that incorporates active learning into
weak supervision. In Active WeaSuL, experts not only define rules, but also
iteratively provide the true label for a small set of points where the
weak supervision model is most likely to be mistaken, which are then used to
better estimate the probabilistic labels. In this way, the weak labels provide
a warm start, which active learning then improves upon. We make two
contributions: 1) a modification of the weak supervision loss function, such
that the expert-labelled data inform and improve the combination of weak
labels; and 2) the maxKL divergence sampling strategy, which determines for
which data points expert labelling is most beneficial. Our experiments show
that when the budget for labelling data is limited (e.g. $\leq 60$ data
points), Active WeaSuL outperforms weak supervision, active learning, and
competing strategies, with only a handful of labelled data points. This makes
Active WeaSuL ideal for situations where obtaining labelled data is difficult.
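The maxKL sampling strategy (contribution 2) can be illustrated with a small sketch. This is a hypothetical simplification for binary labels, not the authors' implementation: data points are bucketed by their weak-label vector $\boldsymbol{\lambda}$, and an expert label is requested from the bucket where the weak-supervision estimate diverges most (in KL divergence) from the estimate informed by the expert labels collected so far. The function names and the bucketing scheme are illustrative assumptions.

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-9):
    """KL divergence between two Bernoulli distributions with means p and q."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def maxkl_pick(lambda_matrix, p_weak, p_expert, rng=None):
    """Pick a point to send to the expert for labelling.

    lambda_matrix: (n, m) array, output of the m weak rules per data point
    p_weak:        (n,) probabilistic labels from the weak supervision model
    p_expert:      (n,) label estimates informed by expert-labelled points
    """
    rng = np.random.default_rng() if rng is None else rng
    # Group points that share the same weak-label vector.
    buckets = {}
    for i, row in enumerate(map(tuple, lambda_matrix)):
        buckets.setdefault(row, []).append(i)
    # Find the bucket where the two estimates disagree most, then
    # sample one of its points for expert labelling.
    worst = max(
        buckets.values(),
        key=lambda idx: kl_bernoulli(np.mean(p_expert[idx]), np.mean(p_weak[idx])),
    )
    return int(rng.choice(worst))
```

In this sketch, buckets where the expert-informed estimate already agrees with the weak-supervision model have near-zero KL and are never selected, which matches the paper's intuition of querying where the weak supervision model is most likely to be mistaken.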
Related papers
- All Points Matter: Entropy-Regularized Distribution Alignment for
Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Losses over Labels: Weakly Supervised Learning via Direct Loss Construction [71.11337906077483]
Programmable weak supervision is a growing paradigm within machine learning.
We propose Losses over Labels (LoL), which creates losses directly from the heuristics without going through the intermediate step of a label.
We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks.
arXiv Detail & Related papers (2022-12-13T22:29:14Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler [89.27610526884496]
Weak Labeler Active Cover (WL-AC) is able to robustly leverage the lower quality weak labelers to reduce the query complexity while retaining the desired level of accuracy.
We show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
arXiv Detail & Related papers (2022-11-04T02:52:54Z)
- Training Subset Selection for Weak Supervision [17.03788288165262]
We show a tradeoff between the amount of weakly-labeled data and the precision of the weak labels.
We combine pretrained data representations with the cut statistic to select high-quality subsets of the weakly-labeled training data.
Using less weakly-labeled data improves the accuracy of weak supervision pipelines by up to 19% (absolute) on benchmark tasks.
arXiv Detail & Related papers (2022-06-06T21:31:32Z)
- Data Consistency for Weakly Supervised Learning [15.365232702938677]
Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
arXiv Detail & Related papers (2022-02-08T16:48:19Z)
- How to Leverage Unlabeled Data in Offline Reinforcement Learning [125.72601809192365]
Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition.
One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data.
We find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing.
arXiv Detail & Related papers (2022-02-03T18:04:54Z)
- Self-Supervised Learning from Semantically Imprecise Data [7.24935792316121]
Learning from imprecise labels such as "animal" or "bird" is an important capability when expertly labeled training data is scarce.
CHILLAX is a recently proposed method to tackle this task.
We extend CHILLAX with a self-supervised scheme using constrained extrapolation to generate pseudo-labels.
arXiv Detail & Related papers (2021-04-22T07:26:14Z)
- Active Learning for Noisy Data Streams Using Weak and Strong Labelers [3.9370369973510746]
We consider a novel weak and strong labeler problem inspired by humans' natural ability for labeling.
We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
arXiv Detail & Related papers (2020-10-27T09:18:35Z)
- Strength from Weakness: Fast Learning Using Weak Supervision [81.41106207042948]
Having access to weak labels can significantly accelerate the learning rate for the strong task to the fast rate of $\mathcal{O}(1/n)$.
Actual acceleration depends continuously on the number of weak labels available, and on the relation between the two tasks.
arXiv Detail & Related papers (2020-02-19T22:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.