Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods
- URL: http://arxiv.org/abs/2004.01764v1
- Date: Fri, 3 Apr 2020 20:38:22 GMT
- Title: Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods
- Authors: Kathleen Kerwin and Nathaniel D. Bastian
- Abstract summary: This study uses stacked generalization, a two-step process that combines machine learning methods, called meta or super learners, to improve algorithm performance.
Building a test harness that accounts for all permutations of algorithm/sample-set pairs ensures that the complex, intrinsic data structures are thoroughly tested.
- Score: 2.741266294612776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study uses stacked generalization, a two-step process for
combining machine learning methods, called meta or super learners. Step one
minimizes the error rate of each individual algorithm to reduce its bias in
the learning set; step two feeds the results into the meta learner, whose
stacked, blended output shows improved performance, with even the weakest
algorithms learning better. The method is essentially an enhanced
cross-validation strategy. Although the process demands substantial
computational resources, the resulting performance metrics on resampled fraud
data show that the increased system cost can be justified. A fundamental
property of fraud data is that it is inherently not systematic, and the
optimal resampling methodology has not yet been identified. Building a test
harness that accounts for all permutations of algorithm/sample-set pairs
ensures that the complex, intrinsic data structures are thoroughly tested. A
comparative analysis of fraud data using stacked generalizations provides the
insight needed to find the optimal mathematical formula for imbalanced fraud
data sets.
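To make the two-step recipe concrete, here is a minimal sketch using scikit-learn's StackingClassifier with SMOTE resampling from imbalanced-learn; the base learners, the resampler, and the synthetic data are illustrative assumptions, not the authors' exact configuration:

```python
# A minimal sketch of stacked generalization on resampled, imbalanced data.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Simulated imbalanced "fraud" data: roughly 1% positive class.
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Resample the training set only, so the test set stays imbalanced.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Step one: base (level-0) learners; step two: a meta learner stacks their
# cross-validated predictions (cv=5 is the enhanced cross-validation step).
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_res, y_res)
print("F1 on the untouched test split:", f1_score(y_te, stack.predict(X_te)))
```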
Related papers
- A General Online Algorithm for Optimizing Complex Performance Metrics [5.726378955570775]
We introduce and analyze a general online algorithm that can be used in a straightforward way with a variety of complex performance metrics in binary, multi-class, and multi-label classification problems.
The algorithm's update and prediction rules are appealingly simple and computationally efficient, without the need to store any past data.
arXiv Detail & Related papers (2024-06-20T21:24:47Z)
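As a rough illustration of how an online scheme can target one complex metric, the sketch below tunes a decision threshold for F1 from running confusion counts; it is a generic example in the spirit of online F-measure optimization, not this paper's algorithm, and the score stream is simulated:

```python
# Online threshold tuning for F1 using only running confusion counts.
import numpy as np

rng = np.random.default_rng(0)
tp = fp = fn = 0
threshold = 0.5  # initial decision threshold

for _ in range(10_000):
    y = rng.random() < 0.1                            # rare positive class
    p = np.clip(0.7 * y + 0.3 * rng.random(), 0, 1)   # model's score for y=1
    y_hat = p >= threshold                            # predict at the current threshold
    tp += y and y_hat
    fp += (not y) and y_hat
    fn += y and (not y_hat)
    f1 = 2 * tp / max(2 * tp + fp + fn, 1)            # running F1 from the counts
    threshold = f1 / 2                                # F/2 update rule for F1

print(f"online F1 estimate: {f1:.3f}, final threshold: {threshold:.3f}")
```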
- Reverse That Number! Decoding Order Matters in Arithmetic Learning [49.5504492920404]
Our work introduces a novel strategy that reevaluates digit order by prioritizing output from the least significant digit.
Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement in accuracy while requiring only a third of the tokens typically used during training.
arXiv Detail & Related papers (2024-03-09T09:04:53Z)
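A minimal sketch of what least-significant-digit-first formatting might look like for training data; the encoding scheme below is an illustrative assumption, not the paper's exact recipe:

```python
# Least-significant-digit-first encoding for arithmetic training examples.
def encode_addition(a: int, b: int) -> str:
    """Emit an addition example with every number written LSD-first,
    so a model can produce the answer digit by digit with carries."""
    rev = lambda n: str(n)[::-1]
    return f"{rev(a)}+{rev(b)}={rev(a + b)}"

# 17 + 25 = 42 becomes "71+52=24": the carry from 7+5 is resolved
# before the next output digit, matching how column addition works.
print(encode_addition(17, 25))
```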
- Benchmark of Data Preprocessing Methods for Imbalanced Classification [0.0]
Severe class imbalance is one of the main conditions that make machine learning in cybersecurity difficult.
This paper presents a benchmark of 16 preprocessing methods on six cybersecurity datasets, together with 17 public imbalanced datasets from other domains.
arXiv Detail & Related papers (2023-03-06T13:12:43Z)
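A minimal sketch of the kind of benchmark loop such a study runs; the sampler list, model, and synthetic data below are illustrative stand-ins, not the paper's 16 methods or its datasets:

```python
# Comparing a few resampling strategies under the same model and metric.
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

samplers = {"none": None, "ros": RandomOverSampler(random_state=0),
            "smote": SMOTE(random_state=0), "rus": RandomUnderSampler(random_state=0)}
for name, sampler in samplers.items():
    Xr, yr = (X_tr, y_tr) if sampler is None else sampler.fit_resample(X_tr, y_tr)
    model = LogisticRegression(max_iter=1000).fit(Xr, yr)
    print(name, balanced_accuracy_score(y_te, model.predict(X_te)))
```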
- Algorithms that Approximate Data Removal: New Results and Limitations [2.6905021039717987]
We study the problem of deleting user data from machine learning models trained using empirical risk minimization.
We develop an online unlearning algorithm that is both computationally and memory efficient.
arXiv Detail & Related papers (2022-09-25T17:20:33Z)
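For intuition about efficient data deletion, the sketch below performs exact one-point removal for ridge regression via the Sherman-Morrison identity; it illustrates the general unlearning idea, not this paper's approximate-removal algorithm:

```python
# Exact one-point unlearning for ridge regression via Sherman-Morrison.
import numpy as np

rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(100, 5)), rng.normal(size=100), 1.0

A_inv = np.linalg.inv(X.T @ X + lam * np.eye(5))  # (X^T X + lam*I)^{-1}
b = X.T @ y
w = A_inv @ b                                     # ridge solution

# Delete sample i without retraining: rank-one downdate of A_inv.
i = 7
x_i = X[i]
Ax = A_inv @ x_i
A_inv += np.outer(Ax, Ax) / (1.0 - x_i @ Ax)      # Sherman-Morrison downdate
b -= x_i * y[i]
w_unlearned = A_inv @ b

# Sanity check against retraining from scratch with sample i removed.
X2, y2 = np.delete(X, i, axis=0), np.delete(y, i)
w_retrained = np.linalg.solve(X2.T @ X2 + lam * np.eye(5), X2.T @ y2)
print(np.allclose(w_unlearned, w_retrained))      # True
```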
- Interpolation-based Contrastive Learning for Few-Label Semi-Supervised Learning [43.51182049644767]
Semi-supervised learning (SSL) has long been proven to be an effective technique for building powerful models with limited labels.
Regularization-based methods, which force perturbed samples to have predictions similar to those of the original samples, have attracted much attention.
We propose a novel contrastive loss that guides the embedding of the learned network to change linearly between samples.
arXiv Detail & Related papers (2022-02-24T06:00:05Z)
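A minimal sketch of interpolation consistency in embedding space, assuming a toy encoder and a squared-error penalty; this shows only the change-linearly-between-samples idea the paper builds on, not its contrastive loss:

```python
# Interpolation consistency: the embedding of a mixed input should match
# the same mixture of the endpoint embeddings.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(4, 2))
embed = lambda x: np.tanh(x @ W1) @ W2   # toy encoder f(x)

x1, x2 = rng.normal(size=8), rng.normal(size=8)
lam = rng.beta(0.75, 0.75)               # mixup coefficient

x_mix = lam * x1 + (1 - lam) * x2        # interpolated input
z_mix = embed(x_mix)
z_target = lam * embed(x1) + (1 - lam) * embed(x2)  # linear target

# Penalize deviation from linear behaviour between samples.
loss = np.sum((z_mix - z_target) ** 2)
print(f"interpolation-consistency loss: {loss:.4f}")
```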
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness-generating mechanism.
We conduct extensive experiments on synthetic data and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
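A minimal sketch of the refine-a-baseline pattern that MIRACLE builds on, with scikit-learn's IterativeImputer as a stand-in; this is not MIRACLE's causally-aware refinement, and the synthetic data is an assumption:

```python
# Baseline mean imputation versus iterative refinement of the imputation.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]                     # correlated columns aid imputation
mask = rng.random(X.shape) < 0.2             # 20% missing completely at random
X_missing = np.where(mask, np.nan, X)

baseline = SimpleImputer(strategy="mean").fit_transform(X_missing)
refined = IterativeImputer(max_iter=10, random_state=0).fit_transform(X_missing)

mse = lambda Xh: float(np.mean((Xh[mask] - X[mask]) ** 2))
print(f"baseline MSE: {mse(baseline):.3f}, refined MSE: {mse(refined):.3f}")
```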
- Dictionary and prior learning with unrolled algorithms for unsupervised inverse problems [12.54744464424354]
We study dictionary and prior learning from degraded measurements as a bi-level problem.
We take advantage of unrolled algorithms to solve approximate formulations of synthesis and analysis.
arXiv Detail & Related papers (2021-06-11T12:21:26Z)
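A minimal sketch of unrolled ISTA iterations for sparse coding, the kind of unrolling such work builds on; the dictionary, step size, and depth are illustrative assumptions, not the paper's bi-level formulation:

```python
# Twenty "unrolled layers" of ISTA recovering a sparse code.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)                     # unit-norm dictionary atoms
x = D @ (rng.random(50) * (rng.random(50) < 0.1))  # signal from a sparse code
step = 1.0 / np.linalg.norm(D, 2) ** 2             # 1/L, L = Lipschitz constant
lam = 0.1

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

z = np.zeros(50)
for _ in range(20):                                # fixed number of unrolled steps
    z = soft(z - step * D.T @ (D @ z - x), step * lam)

print("nonzeros in the recovered code:", int(np.count_nonzero(z)))
```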
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank-revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
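A minimal sketch contrasting exact leverage scores (from a thin QR) with a cheap randomized approximation; the plain Gaussian sketch below is an illustrative stand-in for the paper's rank-revealing pipeline:

```python
# Exact versus sketched leverage scores for a tall matrix.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))

# Exact: leverage scores are squared row norms of an orthonormal basis of range(A).
Q, _ = np.linalg.qr(A)
exact = np.sum(Q**2, axis=1)

# Approximate: sketch A, then reuse R from the sketched QR to orthogonalize cheaply.
S = rng.normal(size=(100, 1000)) / np.sqrt(100)   # Gaussian sketching matrix
_, R = np.linalg.qr(S @ A)
approx = np.sum(np.linalg.solve(R.T, A.T).T ** 2, axis=1)  # rows of A @ inv(R)

print("max absolute error:", float(np.max(np.abs(exact - approx))))
```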
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method that combines reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and we theoretically analyse the convergence and computational complexity of the algorithm.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
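Reading the title literally, the objective presumably has the following shape (a hedged reconstruction from the title, not quoted from the paper): minimize $\|X - X W W^\top\|_F^2 + \lambda \|W\|_{2,p}$, where $\|W\|_{2,p} = \big(\sum_i \|w_i\|_2^p\big)^{1/p}$ and $w_i$ is the $i$-th row of the projection matrix $W$. The first term is the PCA-style reconstruction error, and the row-wise norm drives whole rows of $W$ to zero, so the features corresponding to zeroed rows are discarded.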
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm for semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.