Related papers: Comparing Methods for Bias Mitigation in Graph Neural Networks

Comparing Methods for Bias Mitigation in Graph Neural Networks

URL: http://arxiv.org/abs/2503.22569v1
Date: Fri, 28 Mar 2025 16:18:48 GMT
Title: Comparing Methods for Bias Mitigation in Graph Neural Networks
Authors: Barbara Hoffmann, Ruben Mayer,
Abstract summary: This paper examines the critical role of Graph Neural Networks (GNNs) in data preparation for generative artificial intelligence (GenAI) systems.<n>We present a comparative analysis of three distinct methods for bias mitigation: data sparsification, feature modification, and synthetic data augmentation.
Score: 5.256237513030105
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper examines the critical role of Graph Neural Networks (GNNs) in data preparation for generative artificial intelligence (GenAI) systems, with a particular focus on addressing and mitigating biases. We present a comparative analysis of three distinct methods for bias mitigation: data sparsification, feature modification, and synthetic data augmentation. Through experimental analysis using the german credit dataset, we evaluate these approaches using multiple fairness metrics, including statistical parity, equality of opportunity, and false positive rates. Our research demonstrates that while all methods improve fairness metrics compared to the original dataset, stratified sampling and synthetic data augmentation using GraphSAGE prove particularly effective in balancing demographic representation while maintaining model performance. The results provide practical insights for developing more equitable AI systems while maintaining model performance.

Related papers

Comparison of generalised additive models and neural networks in applications: A systematic review [1.1775939485654978]
Generalised Additive Models (GAMs) and neural networks are state-of-the-art statistical models that interpretability retainability.<n>We conduct a systematic review of papers that performed empirical comparisons of GAMs and neural networks.<n>Across datasets, no consistent evidence of superiority was found for either GAMs or neural networks.<n>This review highlights that GAMs and neural networks should be viewed as complementary competitors.
arXiv Detail & Related papers (2025-10-28T16:28:42Z)
Improving Data and Parameter Efficiency of Neural Language Models Using Representation Analysis [0.0]
This thesis addresses challenges related to data and parameter efficiency in neural language models.<n>The first part examines the properties and dynamics of language representations within neural models, emphasizing their significance in enhancing robustness and generalization.<n>The second part focuses on methods to significantly enhance data and parameter efficiency by integrating active learning strategies with parameter-efficient fine-tuning.<n>The third part explores weak supervision techniques enhanced by in-context learning to effectively utilize unlabeled data.
arXiv Detail & Related papers (2025-07-16T07:58:20Z)
Model-agnostic Mitigation Strategies of Data Imbalance for Regression [0.0]
Data imbalance persists as a pervasive challenge in regression tasks, introducing bias in model performance and undermining predictive reliability.<n>We present advanced mitigation techniques, which build upon and improve existing sampling methods.<n>We demonstrate that constructing an ensemble of models -- one trained with imbalance mitigation and another without -- can significantly reduce these negative effects.
arXiv Detail & Related papers (2025-06-02T09:46:08Z)
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.<n>We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.<n>We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z)
Prototype-Guided Diffusion for Digital Pathology: Achieving Foundation Model Performance with Minimal Clinical Data [6.318463500874778]
We propose a prototype-guided diffusion model to generate high-fidelity synthetic pathology data at scale. Our approach ensures biologically and diagnostically meaningful variations in the generated data. We demonstrate that self-supervised features trained on our synthetic dataset achieve competitive performance despite using 60x-760x less data than models trained on large real-world datasets.
arXiv Detail & Related papers (2025-04-15T21:17:39Z)
Graph Neural Network-Driven Hierarchical Mining for Complex Imbalanced Data [0.8246494848934447]
This study presents a hierarchical mining framework for high-dimensional imbalanced data.<n>By constructing a structured graph representation of the dataset and integrating graph neural network embeddings, the proposed method effectively captures global interdependencies among samples.<n> Empirical evaluations across multiple experimental scenarios validate the efficacy of the proposed approach.
arXiv Detail & Related papers (2025-02-06T06:26:41Z)
Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs) Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws. Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation [0.6707149143800017]
This study presents and explores a generative augmentation framework of social network advertising data. Our framework explores three generative models for data augmentation - Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs)
arXiv Detail & Related papers (2024-04-22T01:16:11Z)
The Common Stability Mechanism behind most Self-Supervised Learning Approaches [64.40701218561921]
We provide a framework to explain the stability mechanism of different self-supervised learning techniques. We discuss the working mechanism of contrastive techniques like SimCLR, non-contrastive techniques like BYOL, SWAV, SimSiam, Barlow Twins, and DINO. We formulate different hypotheses and test them using the Imagenet100 dataset.
arXiv Detail & Related papers (2024-02-22T20:36:24Z)
TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors. TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score. We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes. We propose a metric that quantifies the ability of a graph to mix the current gradients. Motivated by our analysis, we propose an approach that periodically and efficiently optimize the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE) At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales. We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
Fair Node Representation Learning via Adaptive Data Augmentation [9.492903649862761]
This work theoretically explains the sources of bias in node representations obtained via Graph Neural Networks (GNNs) Building upon the analysis, fairness-aware data augmentation frameworks are developed to reduce the intrinsic bias. Our analysis and proposed schemes can be readily employed to enhance the fairness of various GNN-based learning mechanisms.
arXiv Detail & Related papers (2022-01-21T05:49:15Z)
Transitioning from Real to Synthetic data: Quantifying the bias in model [1.6134566438137665]
This study aims to establish a trade-off between bias and fairness in the models trained using synthetic data. We demonstrate there exist a varying levels of bias impact on models trained using synthetic data.
arXiv Detail & Related papers (2021-05-10T06:57:14Z)
Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution. We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator. Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.