Related papers: Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation

Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation

URL: http://arxiv.org/abs/2307.01578v1
Date: Tue, 4 Jul 2023 09:11:33 GMT
Title: Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation
Authors: Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi, Gabriele Facciolo
Abstract summary: This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods.
Score: 11.4375764457726
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Even though data annotation is extremely important for interpretability, research and development of artificial intelligence solutions, most research efforts such as active learning or few-shot learning focus on the sample efficiency problem. This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset sizes. We propose an alternative practical solution based on several heuristics and lookahead minimization of proxy cost functions. The proposed solution is analysed, compared with optimal solutions and evaluated on several synthetic and real-world datasets. On these datasets, the method allows a significant improvement ($23-86\%$) in annotation efficiency.

Related papers

Forecasting Outside the Box: Application-Driven Optimal Pointwise Forecasts for Stochastic Optimization [0.0]
We present an integrated learning and optimization procedure that yields the best approximation of an unknown situation. Numerical results conducted with inventory problems from the literature as well as a bike-sharing problem with real data demonstrate that the proposed approach performs well.
arXiv Detail & Related papers (2024-11-05T21:54:50Z)
Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning [49.417414031031264]
This paper studies learning fair encoders in a self-supervised learning setting. All data are unlabeled and only a small portion of them are annotated with sensitive attributes.
arXiv Detail & Related papers (2024-06-09T08:11:12Z)
The Battleship Approach to the Low Resource Entity Matching Problem [0.0]
We propose a new active learning approach for entity matching problems. We focus on a selection mechanism that exploits unique properties of entity matching. An experimental analysis shows that the proposed algorithm outperforms state-of-the-art active learning solutions to low resource entity matching.
arXiv Detail & Related papers (2023-11-27T10:18:17Z)
Global and Preference-based Optimization with Mixed Variables using Piecewise Affine Surrogates [0.6083861980670925]
This paper proposes a novel surrogate-based global optimization algorithm to solve linearly constrained mixed-variable problems. We assume the objective function is black-box and expensive-to-evaluate, while the linear constraints are quantifiable unrelaxable a priori known. We introduce two types of exploration functions to efficiently search the feasible domain via mixed-integer linear programming solvers.
arXiv Detail & Related papers (2023-02-09T15:04:35Z)
Communication-Efficient Robust Federated Learning with Noisy Labels [144.31995882209932]
Federated learning (FL) is a promising privacy-preserving machine learning paradigm over distributed located data. We propose a learning-based reweighting approach to mitigate the effect of noisy labels in FL. Our approach has shown superior performance on several real-world datasets compared to various baselines.
arXiv Detail & Related papers (2022-06-11T16:21:17Z)
Low-rank Dictionary Learning for Unsupervised Feature Selection [11.634317251468968]
We introduce a novel unsupervised feature selection approach by applying dictionary learning ideas in a low-rank representation. A unified objective function for unsupervised feature selection is proposed in a sparse way by an $ell_2,1$-norm regularization. Our experimental findings reveal that the proposed method outperforms the state-of-the-art algorithm.
arXiv Detail & Related papers (2021-06-21T13:39:10Z)
Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label. We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool. Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
arXiv Detail & Related papers (2021-06-05T21:25:03Z)
Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models. The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning. We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $tildeO(L5/6 T2/3)$ regret. We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning. We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.