A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models
- URL: http://arxiv.org/abs/2502.03386v1
- Date: Wed, 05 Feb 2025 17:20:47 GMT
- Title: A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models
- Authors: Junliang Du, Shiyu Dou, Bohuan Yang, Jiacheng Hu, Tai An
- Abstract summary: The paper studies a Markov network model for unbalanced data, aiming to solve the problems of classification bias and insufficient minority class recognition ability.
The experimental results show that the Markov network performs well in indicators such as weighted accuracy, F1 score, and AUC-ROC.
Future research can focus on efficient model training, structural optimization, and deep learning integration in large-scale unbalanced data environments.
- Score: 1.6951945839990796
- Abstract: This paper studies a Markov network model for unbalanced data, aiming to solve the problems of classification bias and insufficient minority-class recognition that traditional machine learning models exhibit in environments with uneven class distributions. By constructing a joint probability distribution and conditional dependencies, the model achieves global modeling of sample categories and reasoning-based optimization. The study introduces marginal probability estimation and weighted-loss optimization strategies, combined with regularization constraints and structured reasoning methods, effectively improving the generalization ability and robustness of the model. In the experimental stage, a real credit card fraud detection dataset was selected and the model was compared with models such as logistic regression, support vector machine, random forest, and XGBoost. The experimental results show that the Markov network performs well on indicators such as weighted accuracy, F1 score, and AUC-ROC, significantly outperforming traditional classification models and demonstrating strong decision-making ability and applicability in unbalanced data scenarios. Future research can focus on efficient model training, structural optimization, and deep learning integration in large-scale unbalanced data environments, and on promoting wide adoption in practical applications such as financial risk control, medical diagnosis, and intelligent monitoring.
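The abstract names two concrete ingredients: global modeling of the class label through joint and conditional probabilities with marginal probability estimation, and a class-weighted loss that upweights minority-class errors. The snippet below is a minimal sketch of those two ideas on toy values; the potentials, the binarized feature, and the inverse-frequency weighting are illustrative assumptions, not the authors' implementation or dataset.

```python
import numpy as np

# (1) Pairwise Markov network over a binary class label Y and a binarized
#     feature X. The joint distribution is proportional to the product of
#     unary and pairwise potentials; marginals and conditionals follow by
#     summing out variables and renormalizing. All potential values are
#     made-up toy numbers.
phi_Y = np.array([0.95, 0.05])        # unary potential over Y (imbalanced classes)
phi_X = np.array([0.60, 0.40])        # unary potential over X
psi = np.array([[2.0, 0.5],           # psi[y, x]: compatibility of (Y=y, X=x)
                [0.5, 3.0]])

joint = phi_Y[:, None] * phi_X[None, :] * psi
joint /= joint.sum()                  # normalize by the partition function

p_Y = joint.sum(axis=1)               # marginal probability estimate p(Y)
p_Y_given_X1 = joint[:, 1] / joint[:, 1].sum()   # conditional inference p(Y | X=1)
print("p(Y) =", p_Y, " p(Y | X=1) =", p_Y_given_X1)

# (2) Class-weighted cross-entropy: per-sample weights inversely proportional
#     to class frequency, so minority-class mistakes are penalized more.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])   # heavily imbalanced labels
p_pred = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.3, 0.6])  # predicted p(Y=1)

class_freq = np.bincount(y_true) / len(y_true)
weights = (1.0 / class_freq)[y_true]                 # per-sample inverse-frequency weights
log_lik = y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred)
weighted_loss = -(weights * log_lik).sum() / weights.sum()
print("weighted cross-entropy =", round(float(weighted_loss), 4))
```

In this toy example the conditional p(Y=1 | X=1) is much larger than the marginal p(Y=1), illustrating how evidence propagated through the pairwise potential can shift probability mass toward the minority class relative to the raw class prior.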
Related papers
- Assessing Robustness of Machine Learning Models using Covariate Perturbations [0.6749750044497732]
This paper proposes a comprehensive framework for assessing the robustness of machine learning models.
We explore various perturbation strategies to assess robustness and examine their impact on model predictions.
We demonstrate the effectiveness of our approach in comparing robustness across models, identifying the instabilities in the model, and enhancing model robustness.
arXiv Detail & Related papers (2024-08-02T14:41:36Z)
- Towards Robust Federated Learning via Logits Calibration on Non-IID Data [49.286558007937856]
Federated learning (FL) is a privacy-preserving distributed management framework based on collaborative model training of distributed devices in edge networks.
Recent studies have shown that FL is vulnerable to adversarial examples, leading to a significant drop in its performance.
In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks.
arXiv Detail & Related papers (2024-03-05T09:18:29Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) to tackle this issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Graph Embedded Intuitionistic Fuzzy Random Vector Functional Link Neural Network for Class Imbalance Learning [4.069144210024564]
We propose a graph-embedded intuitionistic fuzzy RVFL model for class imbalance learning (GE-IFRVFL-CIL), incorporating a weighting mechanism to handle imbalanced datasets.
The proposed GE-IFRVFL-CIL model offers a promising solution to address the class imbalance issue, mitigates the detrimental effect of noise and outliers, and preserves the inherent geometrical structures of the dataset.
arXiv Detail & Related papers (2023-07-15T20:45:45Z)
- Stabilizing and Improving Federated Learning with Non-IID Data and Client Dropout [15.569507252445144]
Data heterogeneity induced by label distribution skew has been shown to be a significant obstacle that limits model performance in federated learning.
We propose a simple yet effective framework by introducing a prior-calibrated softmax function for computing the cross-entropy loss.
The improved model performance over existing baselines in the presence of non-IID data and client dropout is demonstrated.
arXiv Detail & Related papers (2023-03-11T05:17:59Z)
- Stochastic Methods for AUC Optimization subject to AUC-based Fairness Constraints [51.12047280149546]
A direct approach for obtaining a fair predictive model is to train the model by optimizing its prediction performance subject to fairness constraints.
We formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
We demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
arXiv Detail & Related papers (2022-12-23T22:29:08Z)
- Bias-inducing geometries: an exactly solvable data model with fairness implications [12.532003449620607]
We introduce an exactly solvable high-dimensional model of data imbalance.
We analytically unpack the typical properties of learning models trained in this synthetic framework.
We obtain exact predictions for the observables that are commonly employed for fairness assessment.
arXiv Detail & Related papers (2022-05-31T16:27:57Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- A comprehensive study on the prediction reliability of graph neural networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results.
Our results highlight that the correct choice of regularization and inference methods is evidently important for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)