Impact of the composition of feature extraction and class sampling in
medicare fraud detection
- URL: http://arxiv.org/abs/2206.01413v1
- Date: Fri, 3 Jun 2022 06:57:08 GMT
- Title: Impact of the composition of feature extraction and class sampling in
medicare fraud detection
- Authors: Akrity Kumari, Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali
Agarwal
- Abstract summary: This study uses the "Medicare Part D" insurance claims data released by the Centers for Medicare & Medicaid Services to develop a fraud detection system.
To detect fraud efficiently, this study applies an autoencoder as a feature extraction technique, the synthetic minority oversampling technique (SMOTE) as a data sampling technique, and various gradient boosted decision tree-based classifiers as classification algorithms.
- Score: 3.6016022712620095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Because healthcare is critical, health insurance has become an
important scheme for minimizing medical expenses. As insurance coverage has
grown, the healthcare industry has seen a significant increase in fraudulent
activities, and fraud has become a significant contributor to rising
medical care expenses, although its impact can be mitigated using fraud
detection techniques. To detect fraud, machine learning techniques are used.
This study uses the "Medicare Part D" insurance claims data released by the
Centers for Medicare & Medicaid Services (CMS) of the United States federal
government to develop a fraud detection system. Employing machine learning
algorithms on a class-imbalanced and high-dimensional Medicare dataset is a
challenging task. To combat such challenges, the present work combines
feature extraction with data sampling and then applies various
classification algorithms to get better performance. Feature extraction is a
dimensionality reduction approach that converts attributes into linear or
non-linear combinations of the actual attributes, generating a smaller and more
diversified set of attributes and thus reducing the dimensions. Data sampling
is commonly used to address class imbalance, either by increasing the
frequency of the minority class or by reducing the frequency of the majority
class, to obtain approximately equal numbers of occurrences for both classes. The
proposed approach is evaluated through standard performance metrics. Thus, to
detect fraud efficiently, this study applies an autoencoder as a feature
extraction technique, the synthetic minority oversampling technique (SMOTE) as
a data sampling technique, and various gradient boosted decision tree-based
classifiers as classification algorithms. The experimental results show that
the combination of autoencoders followed by SMOTE with the LightGBM classifier
achieved the best results.
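
The data sampling step described in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote_oversample`, its parameters, and the toy data are all hypothetical, and the sketch works on raw two-dimensional points, whereas the study applies SMOTE to features extracted by an autoencoder before training classifiers such as LightGBM.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only; it needs at least two
    minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (an assumption, not from the paper): 8 majority rows, 3 minority rows.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic rows to match the majority class size.
new_rows = smote_oversample(minority, len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
```

Because every synthetic row is an interpolation between two existing minority rows, it stays inside the region the minority class already occupies. In practice an established library implementation would normally be preferred over a hand-written one.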
Related papers
- Quality assurance of organs-at-risk delineation in radiotherapy [7.698565355235687]
The delineation of the tumor target and organs-at-risk is critical in radiotherapy treatment planning.
The quality assurance of the automatic segmentation is still an unmet need in clinical practice.
Our proposed model, which introduces residual network and attention mechanism in the one-class classification framework, was able to detect the various types of OAR contour errors with high accuracy.
arXiv Detail & Related papers (2024-05-20T02:32:46Z)
- An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification [2.2940141855172036]
In molecular biology, there has been an explosion of data generated from multi-omics sequencing.
Traditional statistical methods face challenging tasks when dealing with such high dimensional data.
This study focuses on tackling these challenges with a neural network that incorporates an autoencoder to extract a latent representation of the features.
arXiv Detail & Related papers (2024-05-16T01:45:55Z)
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with few labelled data.
The proposed approach comprises a fusion of a segmentation network that acts as an attention module and a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
- Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes [0.304585143845864]
Fraud detection with smart versions of machine learning (ML) tools is essential to assure safety.
We investigate four state-of-the-art ML techniques, namely, logistic regression, decision trees, random forest, and extreme gradient boost.
For phishing website URLs and credit card fraud transaction datasets, the results indicate that extreme gradient boost trained on the original data shows trustworthy performance.
arXiv Detail & Related papers (2022-09-04T15:30:23Z)
- PCA: Semi-supervised Segmentation with Patch Confidence Adversarial Training [52.895952593202054]
We propose a new semi-supervised adversarial method called Patch Confidence Adversarial Training (PCA) for medical image segmentation.
PCA learns the pixel structure and context information in each patch to get enough gradient feedback, which helps the discriminator converge to an optimal state.
Our method outperforms the state-of-the-art semi-supervised methods, which demonstrates its effectiveness for medical image segmentation.
arXiv Detail & Related papers (2022-07-24T07:45:47Z)
- Federated Deep AUC Maximization for Heterogeneous Data with a Constant Communication Complexity [77.78624443410216]
We propose improved FDAM algorithms for detecting heterogeneous chest data.
A result of this paper is that the communication of the proposed algorithm is strongly independent of the number of machines and also independent of the accuracy level.
Experiments have demonstrated the effectiveness of our FDAM algorithm on benchmark datasets and on medical chest X-ray images from different organizations.
arXiv Detail & Related papers (2021-02-09T04:05:19Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- On the Importance of Diversity in Re-Sampling for Imbalanced Data and Rare Events in Mortality Risk Models [0.0]
The Surgical Outcome Risk Tool (SORT) is one of the tools developed to predict mortality risk throughout the entire period for major elective in-patient surgeries in the UK.
In this study, we enhance the original SORT prediction model by addressing the class imbalance within the dataset.
Our proposed method investigates the application of diversity-based selection on top of common re-sampling techniques.
arXiv Detail & Related papers (2020-12-15T09:45:35Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making aimed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
- Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to cast this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates a rectified meta-learning module into the common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.