Related papers: Machine Learning Pipeline for Pulsar Star Dataset

Machine Learning Pipeline for Pulsar Star Dataset

URL: http://arxiv.org/abs/2005.01208v1
Date: Sun, 3 May 2020 23:35:44 GMT
Title: Machine Learning Pipeline for Pulsar Star Dataset
Authors: Alexander Ylnner Choquenaira Florez, Braulio Valentin Sanchez Vinces, Diana Carolina Roca Arroyo, Josimar Edinson Chire Saire, Patr{\i}cia Batista Franco
Abstract summary: This work brings together some of the most common machine learning (ML) algorithms. The objective is to make a comparison at the level of obtained results from a set of unbalanced data.
Score: 58.720142291102135
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work brings together some of the most common machine learning (ML) algorithms, and the objective is to make a comparison at the level of obtained results from a set of unbalanced data. This dataset is composed of almost 17 thousand observations made to astronomical objects to identify pulsars (HTRU2). The methodological proposal based on evaluating the accuracy of these different models on the same database treated with two different strategies for unbalanced data. The results show that in spite of the noise and unbalance of classes present in this type of data, it is possible to apply them on standard ML algorithms and obtain promising accuracy ratios.

Related papers

Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data [1.3108652488669732]
We propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) approaches and a bidirectional transformer, for data classification tasks on highly imbalanced molecular data sets. The proposed technique not only integrates adjustments in the classification threshold for the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer procedure based on an attention mechanism for self-supervised learning. The proposed method is validated using six molecular data sets and compared to other related techniques.
arXiv Detail & Related papers (2024-06-10T17:20:13Z)
Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory [0.0]
In supervised learning, it is quite frequent to be confronted with real imbalanced datasets. We propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates. We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations.
arXiv Detail & Related papers (2023-08-05T23:08:08Z)
Machine Learning Based Missing Values Imputation in Categorical Datasets [2.5611256859404983]
This research looked into the use of machine learning algorithms to fill in the gaps in categorical datasets. The emphasis was on ensemble models constructed using the Error Correction Output Codes framework. Deep learning for missing data imputation has obstacles despite these encouraging results, including the requirement for large amounts of labeled data.
arXiv Detail & Related papers (2023-06-10T03:29:48Z)
Explainable Machine Learning for Categorical and Mixed Data with Lossless Visualization [3.4809730725241597]
This study proposes a classification of mixed data types and analyzes their important role in Machine Learning. It presents a toolkit for enforcing interpretability of all internal operations of ML algorithms on mixed data with a visual data exploration on mixed data. A new Sequential Rule Generation (SRG) algorithm for explainable rule generation with categorical data is proposed and successfully evaluated in multiple computational experiments.
arXiv Detail & Related papers (2023-05-29T00:41:32Z)
Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis. We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method. It considers two noise-insensitive information, textiti.e., class-wise divergence and sample-wise consistency. Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z)
Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading. We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators. We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
dataset bias is one of the prevailing causes of unfairness in machine learning. We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class. We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
Handling Imbalanced Data: A Case Study for Binary Class Problems [0.0]
The major issues in terms of solving for classification problems are the issues of Imbalanced data. This paper focuses on both synthetic oversampling techniques and manually computes synthetic data points to enhance easy comprehension of the algorithms. We analyze the application of these synthetic oversampling techniques on binary classification problems with different Imbalanced ratios and sample sizes.
arXiv Detail & Related papers (2020-10-09T02:04:14Z)
Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers. Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.