Evaluating categorical encoding methods on a real credit card fraud
detection database
- URL: http://arxiv.org/abs/2112.12024v1
- Date: Wed, 22 Dec 2021 16:48:46 GMT
- Title: Evaluating categorical encoding methods on a real credit card fraud
detection database
- Authors: François de la Bourdonnaye and Fabrice Daniel
- Abstract summary: We describe several well-known categorical encoding methods that are based on target statistics and weight of evidence.
We train state-of-the-art gradient boosting models on the encoded databases and evaluate their performance.
The contribution of this work is twofold: (1) we compare many state-of-the-art "lite" categorical encoding methods on a large scale database and (2) we use a real credit card fraud detection database.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Correctly dealing with categorical data in a supervised learning context is
still a major issue. Furthermore, although some machine learning methods have
built-in ways of handling categorical features, it is unclear whether these
bring any improvement and how they compare with the usual categorical encoding
methods. In this paper, we describe several well-known categorical encoding
methods that are based on target statistics and weight of evidence. We apply
them to a large, real credit card fraud detection database. Then, we train
state-of-the-art gradient boosting models on the encoded databases and
evaluate their performance. We show that categorical encoding methods
generally bring substantial improvements compared with using no
encoding. The contribution of this work is twofold: (1) we compare many
state-of-the-art "lite" categorical encoding methods on a large-scale database
and (2) we use a real credit card fraud detection database.
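The abstract names two families of encoders, target statistics and weight of evidence, without giving their formulas. The sketch below uses common textbook definitions and is not the paper's implementation: a target-statistics encoder that shrinks each category's fraud rate toward the global fraud rate with a prior weight `m`, and a weight-of-evidence encoder with additive smoothing `alpha`. The function names, the parameters `m` and `alpha`, and the toy merchant data are all hypothetical.

```python
import math
from collections import defaultdict


def fit_target_encoder(categories, labels, m=10.0):
    """Smoothed target-statistics encoding (assumed form): blend each category's
    fraud rate with the global fraud rate, weighting the global rate by m."""
    prior = sum(labels) / len(labels)
    pos = defaultdict(int)  # fraud count per category
    cnt = defaultdict(int)  # transaction count per category
    for c, y in zip(categories, labels):
        cnt[c] += 1
        pos[c] += y
    mapping = {c: (pos[c] + m * prior) / (cnt[c] + m) for c in cnt}
    return mapping, prior


def fit_woe_encoder(categories, labels, alpha=0.5):
    """Weight of evidence (assumed sign convention: positive = fraud-leaning):
    log of the share of frauds in a category over its share of genuine
    transactions, with additive smoothing alpha to avoid log(0)."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    for c, y in zip(categories, labels):
        if y:
            pos[c] += 1
        else:
            neg[c] += 1
    cats = set(pos) | set(neg)
    k = len(cats)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    mapping = {}
    for c in cats:
        p = (pos[c] + alpha) / (total_pos + alpha * k)
        q = (neg[c] + alpha) / (total_neg + alpha * k)
        mapping[c] = math.log(p / q)
    return mapping


def transform(categories, mapping, default):
    """Replace each category by its encoded value; unseen categories get `default`."""
    return [mapping.get(c, default) for c in categories]


# Hypothetical toy data: merchant category and fraud label (1 = fraud) per transaction.
merchants = ["A", "A", "A", "B", "B", "C"]
fraud = [1, 0, 0, 0, 0, 1]
te_map, prior = fit_target_encoder(merchants, fraud, m=2.0)
woe_map = fit_woe_encoder(merchants, fraud, alpha=0.5)
print(transform(["A", "C", "Z"], te_map, default=prior))  # "Z" was never seen
print(transform(["A", "C", "Z"], woe_map, default=0.0))
```

In a real pipeline the mappings would be fitted on training data only (ideally out-of-fold), since computing target statistics on the same rows used to train the gradient boosting model leaks the label. The encoded columns then replace the raw categorical column as model input; the smoothing terms keep rare categories close to the global rate (or to a neutral WoE of 0) instead of producing extreme values.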