Early Forecasting of Text Classification Accuracy and F-Measure with Active Learning
- URL: http://arxiv.org/abs/2001.10337v2
- Date: Sat, 11 Apr 2020 08:59:27 GMT
- Title: Early Forecasting of Text Classification Accuracy and F-Measure with Active Learning
- Authors: Thomas Orth and Michael Bloodgood
- Abstract summary: We investigate the difference in forecasting difficulty when using accuracy and F-measure as the text classification system performance metrics.
We find that forecasting is easiest for decision tree learning, moderate for Support Vector Machines, and most difficult for neural networks.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When creating text classification systems, one of the major bottlenecks is
the annotation of training data. Active learning has been proposed to address
this bottleneck using stopping methods to minimize the cost of data annotation.
An important capability for improving the utility of stopping methods is to
effectively forecast the performance of the text classification models.
Forecasting can be done through the use of logarithmic models regressed on some
portion of the data as learning is progressing. A critical unexplored question
is what portion of the data is needed for accurate forecasting. There is a
tension: it is desirable to use less data so that the forecast can be made
earlier, which is more useful, yet desirable to use more data so that the
forecast can be more accurate. We find that when using active
learning it is even more important to generate forecasts earlier so as to make
them more useful and not waste annotation effort. We investigate the difference
in forecasting difficulty when using accuracy and F-measure as the text
classification system performance metrics and we find that F-measure is more
difficult to forecast. We conduct experiments on seven text classification
datasets in different semantic domains with different characteristics and with
three different base machine learning algorithms. We find that forecasting is
easiest for decision tree learning, moderate for Support Vector Machines, and
most difficult for neural networks.
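The logarithmic-model forecasting described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: it fits a two-parameter logarithmic curve to the early portion of a learning curve and extrapolates to a larger annotation budget. The learning-curve values below are made up.

```python
# Sketch: fit y = a + b*ln(x) to the early part of a learning curve,
# then forecast performance at a larger training-set size. The paper's
# exact model family and fitting procedure may differ.
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, a, b):
    return a + b * np.log(x)

# Hypothetical learning-curve data: (training-set size, accuracy)
sizes = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
accs = np.array([0.62, 0.68, 0.71, 0.73, 0.745])

# Regress on only the first portion of the curve (the "early" forecast);
# using fewer points gives an earlier but potentially less accurate forecast.
params, _ = curve_fit(log_model, sizes[:3], accs[:3])

# Extrapolate to a larger annotation budget
forecast = log_model(2000.0, *params)
```

This makes the paper's tension concrete: regressing on `sizes[:3]` yields a forecast three batches earlier than regressing on all five points, at the cost of fitting on less evidence.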
Related papers
- Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data.
Current solutions to this problem leverage techniques explored in machine learning privacy, such as Membership Inference Attacks (MIAs).
We propose an adaptive pre-training data detection method that alleviates this reliance and effectively amplifies identification.
arXiv Detail & Related papers (2024-07-30T23:43:59Z)
- Robust Machine Learning by Transforming and Augmenting Imperfect Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning.
We first discuss how to prevent ML from codifying prior human discrimination measured in the training data.
We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
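The class-wise distance idea above can be roughly illustrated with SciPy's 1-D `wasserstein_distance`. This is only a toy sketch on made-up scalar features, not LAVA's actual class-wise Wasserstein distance over feature space; it conveys the idea of comparing training and validation sets distributionally, per class.

```python
# Sketch: average per-class 1-D Wasserstein distance between training
# and validation feature distributions. Hypothetical data; the paper's
# distance is defined over richer feature representations.
import numpy as np
from scipy.stats import wasserstein_distance

def classwise_distance(train_feats, train_labels, val_feats, val_labels):
    classes = np.union1d(train_labels, val_labels)
    dists = []
    for c in classes:
        t = train_feats[train_labels == c]
        v = val_feats[val_labels == c]
        if len(t) and len(v):
            dists.append(wasserstein_distance(t, v))
    return float(np.mean(dists))

rng = np.random.default_rng(0)
train_x = rng.normal(0.0, 1.0, 200)   # hypothetical scalar features
train_y = rng.integers(0, 2, 200)     # two classes
val_x = rng.normal(0.0, 1.0, 100)
val_y = rng.integers(0, 2, 100)
d = classwise_distance(train_x, train_y, val_x, val_y)
```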
arXiv Detail & Related papers (2023-04-28T19:05:16Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Confidence Estimation for Object Detection in Document Images [1.9938405188113029]
We propose four estimators to estimate the confidence of object detection predictions.
The first two are based on Monte Carlo dropout, the third one on descriptive statistics and the last one on the detector posterior probabilities.
In the active learning framework, the first three estimators show a significant improvement in performance for the detection of document physical pages and text lines.
arXiv Detail & Related papers (2022-08-29T06:47:18Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
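For context on what this generalizes, basic split conformal prediction can be sketched in a few lines. The paper extends this fixed recipe to multiple learnable parameters; the sketch below is only the standard, non-learnable version with hypothetical calibration scores.

```python
# Sketch: split conformal prediction. Compute the (1 - alpha) quantile of
# nonconformity scores on a calibration set, with the usual finite-sample
# correction, and use it as the half-width of prediction intervals.
import numpy as np

def conformal_quantile(cal_scores, alpha=0.1):
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0))

rng = np.random.default_rng(1)
# Hypothetical absolute residuals of a regressor on a calibration set
cal_scores = np.abs(rng.normal(0.0, 1.0, 500))
radius = conformal_quantile(cal_scores, alpha=0.1)
# A prediction set for a new point is then [prediction - radius, prediction + radius],
# which covers the true value with probability ~0.9 under exchangeability.
```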
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
- Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information [102.18616819054368]
We propose a post-estimation smoothing operator as a fast and effective method for incorporating structural index data into prediction.
Because the smoothing step is separate from the original predictor, it applies to a broad class of machine learning tasks.
Our experiments on large scale spatial and temporal datasets highlight the speed and accuracy of post-estimation smoothing in practice.
arXiv Detail & Related papers (2020-03-12T18:04:20Z)
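The post-estimation smoothing idea can be illustrated with a toy sketch. This is not the paper's operator, just a moving average over raw predictions along a temporal index, with made-up values; it shows why the step is separate from, and agnostic to, the original predictor.

```python
# Sketch: smooth a model's raw predictions along a temporal index.
# Because smoothing wraps the predictor's outputs, it applies to any model.
import numpy as np

def smooth_predictions(preds, window=3):
    # Average each prediction with its neighbors in index order
    # (zero-padded at the boundaries by np.convolve's "same" mode).
    kernel = np.ones(window) / window
    return np.convolve(preds, kernel, mode="same")

raw = np.array([0.2, 0.9, 0.3, 0.8, 0.4])  # hypothetical noisy predictions
smoothed = smooth_predictions(raw)
```

The smoothed sequence has the same length as the input but lower variance, which is the intended effect when nearby indices share structure.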
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.