Statistical Inference for Feature Selection after Optimal Transport-based Domain Adaptation
- URL: http://arxiv.org/abs/2410.15022v1
- Date: Sat, 19 Oct 2024 07:35:23 GMT
- Title: Statistical Inference for Feature Selection after Optimal Transport-based Domain Adaptation
- Authors: Nguyen Thang Loi, Duong Tan Loc, Vo Nguyen Le Duy
- Abstract summary: Feature Selection (FS) under domain adaptation (DA) is a critical task in machine learning.
We introduce a novel statistical method to statistically test FS reliability under DA, named SFS-DA.
- Score: 7.10052009802944
- Abstract: Feature Selection (FS) under domain adaptation (DA) is a critical task in machine learning, especially when dealing with limited target data. However, existing methods lack the capability to guarantee the reliability of FS under DA. In this paper, we introduce a novel statistical method to statistically test FS reliability under DA, named SFS-DA (statistical FS-DA). The key strength of SFS-DA lies in its ability to control the false positive rate (FPR) below a pre-specified level $\alpha$ (e.g., 0.05) while maximizing the true positive rate. Compared to the literature on statistical FS, SFS-DA presents a unique challenge in addressing the effect of DA to ensure the validity of the inference on FS results. We overcome this challenge by leveraging the Selective Inference (SI) framework. Specifically, by carefully examining the FS process under DA whose operations can be characterized by linear and quadratic inequalities, we prove that achieving FPR control in SFS-DA is indeed possible. Furthermore, we enhance the true detection rate by introducing a more strategic approach. Experiments conducted on both synthetic and real-world datasets robustly support our theoretical results, showcasing the superior performance of the proposed SFS-DA method.
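The idea of controlling the FPR by conditioning on the selection event can be illustrated with a toy simulation. The sketch below is not the SFS-DA procedure and involves no optimal transport or domain adaptation; it assumes independent standard normal test statistics under a global null, selects the feature with the largest absolute statistic, and compares a naive p-value with a p-value computed from the normal distribution truncated to the selection event. The function names `two_sided_tail` and `simulate_fpr` and all parameter values are hypothetical choices made for this illustration.

```python
import math
import random


def two_sided_tail(x):
    """Return P(|Z| >= x) for Z ~ N(0, 1) and x >= 0."""
    return math.erfc(x / math.sqrt(2.0))


def simulate_fpr(n_features=10, n_trials=20000, alpha=0.05, seed=0):
    """Estimate the FPR of naive and selective tests under a global null.

    Toy setting (an assumption of this sketch): every feature statistic is
    N(0, 1), so any rejection of the selected feature is a false positive.
    """
    rng = random.Random(seed)
    naive_rejections = 0
    selective_rejections = 0
    for _ in range(n_trials):
        # Absolute statistics, largest first; the top one is "selected".
        stats = sorted(
            (abs(rng.gauss(0.0, 1.0)) for _ in range(n_features)),
            reverse=True,
        )
        top, runner_up = stats[0], stats[1]
        # Naive p-value: ignores that the feature was chosen for being largest.
        naive_p = two_sided_tail(top)
        # Selective p-value: conditions on the event |Z| >= runner_up, i.e. a
        # standard normal truncated to the region where this feature wins.
        selective_p = two_sided_tail(top) / two_sided_tail(runner_up)
        naive_rejections += naive_p < alpha
        selective_rejections += selective_p < alpha
    return naive_rejections / n_trials, selective_rejections / n_trials


naive_fpr, selective_fpr = simulate_fpr()
print(f"naive FPR:     {naive_fpr:.3f}")
print(f"selective FPR: {selective_fpr:.3f}")
```

In this toy setting the naive test rejects far more often than the nominal level, while the conditional test stays close to it. The abstract's claim is that the same kind of conditioning remains tractable when the selection event also depends on the OT-based DA step.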
Related papers
- Statistical Inference for Sequential Feature Selection after Domain Adaptation [7.10052009802944]
We propose a novel method for testing the features selected by SeqFS-DA.
The main advantage of the proposed method is its capability to control the false positive rate (FPR) below a significance level $\alpha$ (e.g., 0.05).
We provide extensions of the proposed method to SeqFS with model selection criteria including AIC, BIC, and adjusted R-squared.
arXiv Detail & Related papers (2025-01-17T03:14:43Z)
- Unveiling the Superior Paradigm: A Comparative Study of Source-Free Domain Adaptation and Unsupervised Domain Adaptation [52.36436121884317]
We show that Source-Free Domain Adaptation (SFDA) generally outperforms Unsupervised Domain Adaptation (UDA) in real-world scenarios.
SFDA offers advantages in time efficiency, storage requirements, targeted learning objectives, reduced risk of negative transfer, and increased robustness against overfitting.
We propose a novel weight estimation method that effectively integrates available source data into multi-SFDA approaches.
arXiv Detail & Related papers (2024-11-24T13:49:29Z)
- Test-Time Domain Generalization for Face Anti-Spoofing [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks.
We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability.
Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
arXiv Detail & Related papers (2024-03-28T11:50:23Z)
- Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the orchestration of a parameter server (PS).
In this paper, we propose a novel privacy-preserving primal-dual algorithm with model sparsification for non-convex and non-smooth FL problems.
Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z)
- Feature Reduction Method Comparison Towards Explainability and Efficiency in Cybersecurity Intrusion Detection Systems [11.123884574885018]
Intrusion detection systems (IDS) detect and prevent attacks based on collected computer and network data.
In recent research, IDS models have been constructed using machine learning (ML) and deep learning (DL) methods such as Random Forest (RF) and deep neural networks (DNN).
We look at three different FS techniques: RF information gain (RF-IG), correlation feature selection using the Bat Algorithm (CFS-BA), and CFS using the Aquila Optimizer (CFS-AO).
arXiv Detail & Related papers (2023-03-22T20:09:31Z)
- Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
arXiv Detail & Related papers (2023-03-06T17:19:23Z)
- Federated Semi-Supervised Domain Adaptation via Knowledge Transfer [6.7543356061346485]
This paper proposes an innovative approach to achieve semi-supervised domain adaptation (SSDA) over multiple distributed and confidential datasets.
Federated Semi-Supervised Domain Adaptation (FSSDA) integrates SSDA with federated learning based on strategically designed knowledge distillation techniques.
Extensive experiments are conducted to demonstrate the effectiveness and efficiency of FSSDA design.
arXiv Detail & Related papers (2022-07-21T19:36:10Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL).
No existing methods enjoy all the aforementioned properties.
We propose an intuitively-simple yet theoretically-sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)