Optimal and Efficient Binary Questioning for Human-in-the-Loop
Annotation
- URL: http://arxiv.org/abs/2307.01578v1
- Date: Tue, 4 Jul 2023 09:11:33 GMT
- Title: Optimal and Efficient Binary Questioning for Human-in-the-Loop
Annotation
- Authors: Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi,
Gabriele Facciolo
- Abstract summary: This paper studies the neglected complementary problem of getting annotated data given a predictor.
For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods.
- Score: 11.4375764457726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Even though data annotation is extremely important for interpretability,
research and development of artificial intelligence solutions, most research
efforts such as active learning or few-shot learning focus on the sample
efficiency problem. This paper studies the neglected complementary problem of
getting annotated data given a predictor. For the simple binary classification
setting, we present the spectrum ranging from optimal general solutions to
practical efficient methods. The problem is framed as the full annotation of a
binary classification dataset with the minimal number of yes/no questions when
a predictor is available. For the case of general binary questions the solution
is found in coding theory, where the optimal questioning strategy is given by
the Huffman encoding of the possible labelings. However, this approach is
computationally intractable even for small dataset sizes. We propose an
alternative practical solution based on several heuristics and lookahead
minimization of proxy cost functions. The proposed solution is analysed,
compared with optimal solutions and evaluated on several synthetic and
real-world datasets. On these datasets, the method allows a significant
improvement ($23-86\%$) in annotation efficiency.
Related papers
- Forecasting Outside the Box: Application-Driven Optimal Pointwise Forecasts for Stochastic Optimization [0.0]
We present an integrated learning and optimization procedure that yields the best approximation of an unknown situation.
Numerical results conducted with inventory problems from the literature as well as a bike-sharing problem with real data demonstrate that the proposed approach performs well.
arXiv Detail & Related papers (2024-11-05T21:54:50Z) - Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning [49.417414031031264]
This paper studies learning fair encoders in a self-supervised learning setting.
All data are unlabeled and only a small portion of them are annotated with sensitive attributes.
arXiv Detail & Related papers (2024-06-09T08:11:12Z) - The Battleship Approach to the Low Resource Entity Matching Problem [0.0]
We propose a new active learning approach for entity matching problems.
We focus on a selection mechanism that exploits unique properties of entity matching.
An experimental analysis shows that the proposed algorithm outperforms state-of-the-art active learning solutions to low resource entity matching.
arXiv Detail & Related papers (2023-11-27T10:18:17Z) - Global and Preference-based Optimization with Mixed Variables using Piecewise Affine Surrogates [0.6083861980670925]
This paper proposes a novel surrogate-based global optimization algorithm to solve linearly constrained mixed-variable problems.
We assume the objective function is black-box and expensive-to-evaluate, while the linear constraints are quantifiable unrelaxable a priori known.
We introduce two types of exploration functions to efficiently search the feasible domain via mixed-integer linear programming solvers.
arXiv Detail & Related papers (2023-02-09T15:04:35Z) - Communication-Efficient Robust Federated Learning with Noisy Labels [144.31995882209932]
Federated learning (FL) is a promising privacy-preserving machine learning paradigm over distributed located data.
We propose a learning-based reweighting approach to mitigate the effect of noisy labels in FL.
Our approach has shown superior performance on several real-world datasets compared to various baselines.
arXiv Detail & Related papers (2022-06-11T16:21:17Z) - Low-rank Dictionary Learning for Unsupervised Feature Selection [11.634317251468968]
We introduce a novel unsupervised feature selection approach by applying dictionary learning ideas in a low-rank representation.
A unified objective function for unsupervised feature selection is proposed in a sparse way by an $ell_2,1$-norm regularization.
Our experimental findings reveal that the proposed method outperforms the state-of-the-art algorithm.
arXiv Detail & Related papers (2021-06-21T13:39:10Z) - Low Budget Active Learning via Wasserstein Distance: An Integer
Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.
We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool.
Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
arXiv Detail & Related papers (2021-06-05T21:25:03Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $tildeO(L5/6 T2/3)$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.