Label-shift robust federated feature screening for high-dimensional classification
- URL: http://arxiv.org/abs/2506.00379v1
- Date: Sat, 31 May 2025 04:14:49 GMT
- Title: Label-shift robust federated feature screening for high-dimensional classification
- Authors: Qi Qin, Erbo Li, Xingxiang Li, Yifan Sun, Wu Wang, Chen Xu
- Abstract summary: This paper introduces a general framework that unifies existing screening methods and proposes a novel utility, label-shift robust federated feature screening (LR-FFS). Building upon this framework, LR-FFS leverages conditional distribution functions and expectations to address label shift without adding computational burdens. Experimental results and theoretical analyses demonstrate LR-FFS's superior performance across diverse client environments.
- Score: 14.252760098879186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed and federated learning are important tools for high-dimensional classification of large datasets. To reduce computational costs and overcome the curse of dimensionality, feature screening plays a pivotal role in eliminating irrelevant features during data preprocessing. However, data heterogeneity, particularly label shifting across different clients, presents significant challenges for feature screening. This paper introduces a general framework that unifies existing screening methods and proposes a novel utility, label-shift robust federated feature screening (LR-FFS), along with its federated estimation procedure. The framework facilitates a uniform analysis of methods and systematically characterizes their behaviors under label shift conditions. Building upon this framework, LR-FFS leverages conditional distribution functions and expectations to address label shift without adding computational burdens and remains robust against model misspecification and outliers. Additionally, the federated procedure ensures computational efficiency and privacy protection while maintaining screening effectiveness comparable to centralized processing. We also provide a false discovery rate (FDR) control method for federated feature screening. Experimental results and theoretical analyses demonstrate LR-FFS's superior performance across diverse client environments, including those with varying class distributions, sample sizes, and missing categorical data.
Related papers
- Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling [9.774529150331297]
Stratify is a novel FL framework designed to systematically manage class and feature distributions throughout training.
Inspired by classical stratified sampling, our approach employs a Stratified Label Schedule (SLS) to ensure balanced exposure across labels.
To uphold privacy, we implement a secure client selection protocol leveraging homomorphic encryption.
arXiv Detail & Related papers (2025-04-18T04:44:41Z)
- Harnessing Mixed Features for Imbalance Data Oversampling: Application to Bank Customers Scoring [5.091061468748012]
We introduce MGS-GRF, an oversampling strategy designed for mixed features.
We show that MGS-GRF exhibits two important properties: (i) coherence, i.e., the ability to generate only combinations of categorical features that are already present in the original dataset, and (ii) association, i.e., the ability to preserve the dependence between continuous and categorical features.
arXiv Detail & Related papers (2025-03-26T08:53:40Z)
- Interpretable Feature Interaction via Statistical Self-supervised Learning on Tabular Data [22.20955211690874]
Spofe is a novel self-supervised machine learning pipeline that captures principled representation to achieve clear interpretability with statistical rigor.
Underpinning our approach is a robust theoretical framework that delivers precise error bounds and rigorous false discovery rate (FDR) control.
Experiments on diverse real-world datasets demonstrate the effectiveness of Spofe.
arXiv Detail & Related papers (2025-03-23T12:27:42Z)
- Noise-Adaptive Conformal Classification with Marginal Coverage [53.74125453366155]
We introduce an adaptive conformal inference method capable of efficiently handling deviations from exchangeability caused by random label noise.
We validate our method through extensive numerical experiments demonstrating its effectiveness on synthetic and real data sets.
arXiv Detail & Related papers (2025-01-29T23:55:23Z)
- A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.
We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods.
By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
- Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes.
Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes.
We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
- FedAnchor: Enhancing Federated Semi-Supervised Learning with Label Contrastive Loss for Unlabeled Clients [19.3885479917635]
Federated learning (FL) is a distributed learning paradigm that facilitates collaborative training of a shared global model across devices.
We propose FedAnchor, an innovative FSSL method that introduces a unique double-head structure, called anchor head, paired with the classification head trained exclusively on labeled anchor data on the server.
Our approach mitigates the confirmation bias and overfitting issues associated with pseudo-labeling techniques based on high-confidence model prediction samples.
arXiv Detail & Related papers (2024-02-15T18:48:21Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Conditional Feature Importance for Mixed Data [1.6114012813668934]
We develop a conditional predictive impact (CPI) framework with knockoff sampling.
We show that our proposed workflow controls type I error, achieves high power and is in line with results given by other conditional FI measures.
Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
arXiv Detail & Related papers (2022-10-06T16:52:38Z)
- Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on our key observation, we design an efficient client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS).
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z)
- Query-Adaptive Predictive Inference with Partial Labels [0.0]
We propose a new methodology to construct predictive sets using only partially labeled data on top of black-box predictive models.
Our experiments highlight the validity of our predictive set construction as well as the attractiveness of a more flexible user-dependent loss framework.
arXiv Detail & Related papers (2022-06-15T01:48:42Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)