QuaPy: A Python-Based Framework for Quantification
- URL: http://arxiv.org/abs/2106.11057v1
- Date: Fri, 18 Jun 2021 13:57:11 GMT
- Title: QuaPy: A Python-Based Framework for Quantification
- Authors: Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani
- Abstract summary: QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation).
It is written in Python and can be installed via pip.
- Score: 76.22817970624875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: QuaPy is an open-source framework for performing quantification (a.k.a.
supervised prevalence estimation), written in Python. Quantification is the
task of training quantifiers via supervised learning, where a quantifier is a
predictor that estimates the relative frequencies (a.k.a. prevalence values) of
the classes of interest in a sample of unlabelled data. While quantification
can be trivially performed by applying a standard classifier to each unlabelled
data item and counting how many data items have been assigned to each class, it
has been shown that this "classify and count" method is outperformed by methods
specifically designed for quantification. QuaPy provides implementations of a
number of baseline methods and advanced quantification methods, of routines for
quantification-oriented model selection, of several broadly accepted evaluation
measures, and of robust evaluation protocols routinely used in the field. QuaPy
also makes available datasets commonly used for testing quantifiers, and offers
visualization tools for facilitating the analysis and interpretation of the
results. The software is open-source and publicly available under a BSD-3
licence via https://github.com/HLT-ISTI/QuaPy, and can be installed via pip
(https://pypi.org/project/QuaPy/).
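The "classify and count" baseline described above, and the adjusted variant that quantification methods improve upon it with, can be sketched in a few lines. This is a minimal illustration of the underlying idea for the binary case, not QuaPy's actual API; the function names and the way the true/false positive rates are obtained are assumptions for the example.

```python
def classify_and_count(predictions):
    """CC: estimated prevalence = fraction of items the classifier labels positive."""
    return sum(predictions) / len(predictions)

def adjusted_classify_and_count(predictions, tpr, fpr):
    """ACC: correct the CC estimate using the classifier's true positive rate
    and false positive rate (estimated, e.g., via cross-validation on the
    training data):

        p_ACC = (p_CC - fpr) / (tpr - fpr), clipped to [0, 1]
    """
    p_cc = classify_and_count(predictions)
    p_acc = (p_cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p_acc))

# Example: a classifier with tpr=0.8 and fpr=0.2 labels 44% of an
# unlabelled sample as positive.
preds = [1] * 44 + [0] * 56
p_cc = classify_and_count(preds)                               # 0.44
p_acc = adjusted_classify_and_count(preds, tpr=0.8, fpr=0.2)   # ~0.40
```

The correction matters because a classifier's errors bias the raw count: here CC overestimates the true prevalence, and ACC compensates using the classifier's known error rates, which is the kind of quantification-specific reasoning the abstract refers to.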
Related papers
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE)
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
- PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series.
It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distribution.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- DeeProb-kit: a Python Library for Deep Probabilistic Modelling [0.0]
DeeProb-kit is a unified library written in Python consisting of a collection of deep probabilistic models (DPMs)
It includes efficiently implemented learning techniques, inference routines, statistical algorithms, and provides high-quality fully-documented APIs.
arXiv Detail & Related papers (2022-12-08T17:02:16Z)
- Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models [65.51757376525798]
Latte is a Python library for evaluation of latent-based generative models.
Latte is compatible with both PyTorch and Keras, and provides both functional and modular APIs.
arXiv Detail & Related papers (2021-12-20T16:00:28Z)
- Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.
The scikit-dimension package provides a uniform implementation of most known intrinsic dimension (ID) estimators, based on the scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z)
- The Word is Mightier than the Label: Learning without Pointillistic Labels using Data Programming [11.536162323162099]
Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples.
Hand-labelling vast amounts of data may be tedious, expensive, and error-prone.
arXiv Detail & Related papers (2021-08-24T19:11:28Z)
- An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree on the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z)
- Quantifying With Only Positive Training Data [0.5735035463793008]
Quantification is the research field that studies methods for counting the number of data points that belong to each class in an unlabeled sample.
This article closes the gap between Positive and Unlabeled Learning (PUL) and One-class Quantification (OCQ).
We compare our method, Passive Aggressive Threshold (PAT), against PUL methods and show that PAT generally is the fastest and most accurate algorithm.
arXiv Detail & Related papers (2020-04-22T01:18:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.