stream-learn -- open-source Python library for difficult data stream batch analysis
- URL: http://arxiv.org/abs/2001.11077v1
- Date: Wed, 29 Jan 2020 20:15:09 GMT
- Title: stream-learn -- open-source Python library for difficult data stream batch analysis
- Authors: Paweł Ksieniewicz, Paweł Zyblewski
- Abstract summary: stream-learn is compatible with scikit-learn and developed for analysing drifting and imbalanced data streams.
Its main component is a stream generator, which can produce synthetic data streams.
In addition, estimators adapted for data stream classification have been implemented.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: stream-learn is a Python package compatible with scikit-learn and developed
for analysing drifting and imbalanced data streams. Its main component is a
stream generator, which can produce a synthetic data stream that may
incorporate each of the three main concept drift types (i.e. sudden, gradual
and incremental drift) in their recurring or non-recurring versions. The
package allows conducting experiments following established evaluation
methodologies (i.e. Test-Then-Train and Prequential). In addition, estimators
adapted for data stream classification have been implemented, including both
simple classifiers and state-of-the-art chunk-based and online classifier
ensembles. To improve computational efficiency, the package utilises its own
implementations of prediction metrics for imbalanced binary classification
tasks.
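
The abstract outlines a complete workflow: generate a drifting, imbalanced synthetic stream, then evaluate incremental classifiers under the Test-Then-Train protocol using the package's own imbalance-aware metrics. A minimal sketch of such an experiment follows; it uses names from the documented strlearn API (StreamGenerator, TestThenTrain, strlearn.metrics), but the specific parameter values (chunk counts, drift settings, class weights) are illustrative assumptions, not recommendations from the paper.

    # Minimal sketch of a stream-learn experiment, assuming the documented
    # strlearn API; parameter values are illustrative only.
    from sklearn.naive_bayes import GaussianNB
    from strlearn.streams import StreamGenerator
    from strlearn.evaluators import TestThenTrain
    from strlearn.metrics import balanced_accuracy_score, recall

    # Synthetic stream: 100 chunks of 250 samples, one gradual drift
    # (concept_sigmoid_spacing controls drift smoothness) and a static
    # 10%/90% class imbalance.
    stream = StreamGenerator(
        n_chunks=100,
        chunk_size=250,
        n_drifts=1,
        concept_sigmoid_spacing=5,
        weights=[0.1, 0.9],
        random_state=42,
    )

    # Test-Then-Train: each incoming chunk is first used for testing,
    # then for (partial) training of the classifier.
    evaluator = TestThenTrain(metrics=(balanced_accuracy_score, recall))
    evaluator.process(stream, GaussianNB())

    # scores has shape (n_classifiers, n_chunks - 1, n_metrics).
    print(evaluator.scores.shape)
    print(evaluator.scores.mean(axis=1))  # mean per classifier and metric

Per the abstract, the same sketch extends to the Prequential methodology by swapping in the package's Prequential evaluator, and to comparative studies by passing a list of classifiers (e.g., the implemented chunk-based or online ensembles) to process().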
Related papers
- RPS: A Generic Reservoir Patterns Sampler [1.09784964592609]
We introduce an approach that harnesses a weighted reservoir to facilitate direct pattern sampling from streaming batch data.
We present a generic algorithm capable of addressing temporal biases and handling various pattern types, including sequential, weighted, and unweighted itemsets.
arXiv Detail & Related papers (2024-10-31T16:25:21Z)
- A Comprehensive Empirical Evaluation on Online Continual Learning [20.39495058720296]
We evaluate methods from the literature that tackle online continual learning.
We focus on the class-incremental setting in the context of image classification.
We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks.
arXiv Detail & Related papers (2023-08-20T17:52:02Z)
- Distributive Pre-Training of Generative Modeling Using Matrix-Product States [0.0]
We consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression.
The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation.
We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks.
arXiv Detail & Related papers (2023-06-26T15:46:08Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distribution.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We present preliminary evidence that in-context learners share algorithmic features with these predictors.
arXiv Detail & Related papers (2022-11-28T18:59:51Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on all three datasets on image classification in low data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Classification and Feature Transformation with Fuzzy Cognitive Maps [0.3299672391663526]
Fuzzy Cognitive Maps (FCMs) are considered a soft computing technique combining elements of fuzzy logic and recurrent neural networks.
In this work we propose an FCM based classifier with a fully connected map structure.
Weights were learned with a gradient algorithm and logloss or cross-entropy were used as the cost function.
arXiv Detail & Related papers (2021-03-08T22:26:24Z)
- Dictionary Learning with Low-rank Coding Coefficients for Tensor Completion [33.068635237011236]
Our model learns a data-adaptive dictionary from the given observations.
In the completion process, we minimize the low-rankness of each tensor slice containing the coding coefficients.
arXiv Detail & Related papers (2020-09-26T02:43:43Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.