Small-Text: Active Learning for Text Classification in Python
- URL: http://arxiv.org/abs/2107.10314v7
- Date: Sat, 7 Oct 2023 10:34:57 GMT
- Title: Small-Text: Active Learning for Text Classification in Python
- Authors: Christopher Schröder, Lydia Müller, Andreas Niekler, Martin Potthast
- Abstract summary: small-text is an easy-to-use active learning library for Python.
It offers pool-based active learning for single- and multi-label text classification.
- Score: 23.87081733039124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce small-text, an easy-to-use active learning library, which offers
pool-based active learning for single- and multi-label text classification in
Python. It features numerous pre-implemented state-of-the-art query strategies,
including some that leverage the GPU. Standardized interfaces allow the
combination of a variety of classifiers, query strategies, and stopping
criteria, facilitating a quick mix and match, and enabling a rapid and
convenient development of both active learning experiments and applications.
With the objective of making various classifiers and query strategies
accessible for active learning, small-text integrates several well-known
machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face
transformers. The latter integrations are optionally installable extensions, so
GPUs can be used but are not required. Using this new library, we investigate
the performance of the recently published SetFit training paradigm, which we
compare to vanilla transformer fine-tuning, finding that it matches the latter
in classification accuracy while outperforming it in area under the curve. The
library is available under the MIT License at
https://github.com/webis-de/small-text, in version 1.3.0 at the time of
writing.
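The pool-based loop described in the abstract (classifier, query strategy, stopping criterion) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not small-text's API: the nearest-centroid classifier, the least-confidence query strategy, the fixed-budget stopping criterion, the 2-D toy data standing in for embedded documents, and all function names are hypothetical choices made for this example. It also assumes that "area under the curve" means the area under the learning curve (test accuracy per labeling iteration), approximated here with the trapezoidal rule.

```python
import math
import random


def train_centroids(points, labels):
    """Fit a toy nearest-centroid classifier: one mean vector per class."""
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in grouped.items()
    }


def predict_proba(centroids, point):
    """Pseudo-probabilities: normalized exp(-distance) to each class centroid."""
    scores = {c: math.exp(-math.dist(point, mu)) for c, mu in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def least_confidence_query(centroids, pool, unlabeled, n):
    """Query strategy: the n unlabeled indices with the lowest top-class probability."""
    def confidence(i):
        return max(predict_proba(centroids, pool[i]).values())
    return sorted(unlabeled, key=confidence)[:n]


def accuracy(centroids, points, labels):
    """Fraction of points whose most probable class equals the true label."""
    hits = 0
    for point, label in zip(points, labels):
        proba = predict_proba(centroids, point)
        hits += max(proba, key=proba.get) == label
    return hits / len(points)


def area_under_curve(values):
    """Trapezoidal area under a learning curve, normalized by the number of steps."""
    if len(values) < 2:
        return values[0]
    area = sum((a + b) / 2 for a, b in zip(values, values[1:]))
    return area / (len(values) - 1)


def run_active_learning(seed=0, iterations=5, batch_size=4):
    """Run the loop for a fixed number of queries and return (curve, auc)."""
    rng = random.Random(seed)

    def blob(cx, cy, n):
        return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]

    # Toy pool and test set: two Gaussian blobs, one per class.
    pool, oracle = blob(-2, 0, 100) + blob(2, 0, 100), [0] * 100 + [1] * 100
    test_x, test_y = blob(-2, 0, 50) + blob(2, 0, 50), [0] * 50 + [1] * 50

    labeled = [0, 100]  # initial labeled set: one example per class
    unlabeled = [i for i in range(len(pool)) if i not in labeled]
    curve = []
    for step in range(iterations + 1):
        model = train_centroids([pool[i] for i in labeled],
                                [oracle[i] for i in labeled])
        curve.append(accuracy(model, test_x, test_y))
        if step == iterations:
            break  # stopping criterion (assumed): fixed labeling budget used up
        queried = least_confidence_query(model, pool, unlabeled, batch_size)
        labeled.extend(queried)  # the oracle list plays the human annotator
        queried_set = set(queried)
        unlabeled = [i for i in unlabeled if i not in queried_set]
    return curve, area_under_curve(curve)


if __name__ == "__main__":
    curve, auc = run_active_learning()
    print("accuracy per iteration:", curve)
    print("area under the learning curve:", auc)
```

Because each component is a separate function, swapping in another query strategy or classifier only changes one call inside the loop, which mirrors the mix-and-match design the abstract describes.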
Related papers
- Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning [50.26965628047682]
Adapting pre-trained models to open classes is a challenging problem in machine learning.
In this paper, we consider combining the advantages of both and come up with a test-time prompt tuning approach.
Our proposed method outperforms all comparison methods on average considering both base and new classes.
arXiv Detail & Related papers (2024-08-29T12:34:01Z)
- AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning [53.32576252950481]
Continual learning aims to enable a model to incrementally learn knowledge from sequentially arrived data.
In this paper, we propose a non-incremental learner, named AttriCLIP, to incrementally extract knowledge of new classes or tasks.
arXiv Detail & Related papers (2023-05-19T07:39:17Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
- hyperbox-brain: A Toolbox for Hyperbox-based Machine Learning Algorithms [9.061408029414455]
hyperbox-brain is an open-source Python library implementing the leading hyperbox-based machine learning algorithms.
hyperbox-brain exposes a unified API which closely follows and is compatible with the renowned scikit-learn and numpy toolboxes.
arXiv Detail & Related papers (2022-10-06T06:40:07Z)
- IMBENS: Ensemble Class-imbalanced Learning in Python [26.007498723608155]
imbens is an open-source Python toolbox for implementing and deploying ensemble learning algorithms on class-imbalanced data.
imbens is released under the MIT open-source license and can be installed from the Python Package Index (PyPI).
arXiv Detail & Related papers (2021-11-24T20:14:20Z)
- MRCpy: A Library for Minimax Risk Classifiers [10.380882297891272]
The Python library MRCpy implements minimax risk classifiers (MRCs) based on the robust risk minimization (RRM) approach.
MRCpy follows the standards of popular Python libraries, such as scikit-learn, facilitating readability and easy usage together with a seamless integration with other libraries.
arXiv Detail & Related papers (2021-08-04T10:31:20Z)
- Solo-learn: A Library of Self-supervised Methods for Visual Representation Learning [83.02597612195966]
solo-learn is a library of self-supervised methods for visual representation learning.
Implemented in Python, using PyTorch and PyTorch Lightning, the library fits both research and industry needs.
arXiv Detail & Related papers (2021-08-03T22:19:55Z)
- Captum: A unified and generic model interpretability library for PyTorch [49.72749684393332]
We introduce a novel, unified, open-source model interpretability library for PyTorch.
The library contains generic implementations of a number of gradient and perturbation-based attribution algorithms.
It can be used for both classification and non-classification models.
arXiv Detail & Related papers (2020-09-16T18:57:57Z)
- ktrain: A Low-Code Library for Augmented Machine Learning [0.0]
ktrain is a low-code Python library that makes machine learning more accessible and easier to apply.
It is designed to make sophisticated, state-of-the-art machine learning models simple to build, train, inspect, and apply by both beginners and experienced practitioners.
arXiv Detail & Related papers (2020-04-19T14:18:20Z)
- fastai: A Layered API for Deep Learning [1.7223564681760164]
fastai is a deep learning library which provides practitioners with high-level components.
It provides researchers with low-level components that can be mixed and matched to build new approaches.
arXiv Detail & Related papers (2020-02-11T21:16:48Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.