Statistical Inference for Sequential Feature Selection after Domain Adaptation
- URL: http://arxiv.org/abs/2501.09933v1
- Date: Fri, 17 Jan 2025 03:14:43 GMT
- Title: Statistical Inference for Sequential Feature Selection after Domain Adaptation
- Authors: Duong Tan Loc, Nguyen Thang Loi, Vo Nguyen Le Duy
- Abstract summary: We propose a novel method for testing the features selected by SeqFS-DA.
The main advantage of the proposed method is its capability to control the false positive rate (FPR) below a significance level $\alpha$ (e.g., 0.05).
We provide extensions of the proposed method to SeqFS with model selection criteria including AIC, BIC, and adjusted R-squared.
- Score: 7.10052009802944
- Abstract: In high-dimensional regression, feature selection methods, such as sequential feature selection (SeqFS), are commonly used to identify relevant features. When data is limited, domain adaptation (DA) becomes crucial for transferring knowledge from a related source domain to a target domain, improving generalization performance. Although SeqFS after DA is an important task in machine learning, none of the existing methods can guarantee the reliability of its results. In this paper, we propose a novel method for testing the features selected by SeqFS-DA. The main advantage of the proposed method is its capability to control the false positive rate (FPR) below a significance level $\alpha$ (e.g., 0.05). Additionally, a strategic approach is introduced to enhance the statistical power of the test. Furthermore, we provide extensions of the proposed method to SeqFS with model selection criteria including AIC, BIC, and adjusted R-squared. Extensive experiments are conducted on both synthetic and real-world datasets to validate the theoretical results and demonstrate the proposed method's superior performance.
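To make the tested pipeline concrete, below is a minimal, hypothetical sketch: greedy forward (sequential) feature selection run on a target dataset augmented with mean-shifted source data (a deliberately simplified stand-in for the optimal-transport-based domain adaptation used in the paper), followed by naive post-selection t-tests. The naive p-values ignore the selection event and therefore do not control the FPR, which is precisely the gap the proposed selective test closes. The mean-shift adaptation and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic source and target data; only feature 0 is truly active.
n_s, n_t, p = 200, 30, 5
X_s = rng.normal(size=(n_s, p))
y_s = 1.5 * X_s[:, 0] + rng.normal(size=n_s) + 2.0   # source responses carry a mean shift
X_t = rng.normal(size=(n_t, p))
y_t = 1.5 * X_t[:, 0] + rng.normal(size=n_t)

# Toy "domain adaptation": align the source responses with the target mean
# (an illustrative stand-in for the optimal-transport-based DA in the paper).
y_s_adapted = y_s - y_s.mean() + y_t.mean()
X = np.vstack([X_s, X_t])
y = np.concatenate([y_s_adapted, y_t])

def forward_selection(X, y, k):
    """Greedy SeqFS: repeatedly add the feature that most reduces the residual sum of squares."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def rss(j):
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            return np.sum((y - X[:, cols] @ beta) ** 2)
        best = min(remaining, key=rss)
        selected.append(best)
        remaining.remove(best)
    return selected

selected = forward_selection(X, y, k=2)

# Naive post-selection t-tests on the selected model. These ignore the selection
# event, so they do NOT control the FPR; the paper replaces them with valid
# selective p-values conditioned on the SeqFS-DA selection.
Xs = X[:, selected]
beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
resid = y - Xs @ beta
df = len(y) - len(selected)
cov = (resid @ resid / df) * np.linalg.inv(Xs.T @ Xs)
for j, b, v in zip(selected, beta, np.diag(cov)):
    p_naive = 2 * stats.t.sf(abs(b / np.sqrt(v)), df)
    print(f"feature {j}: naive p-value = {p_naive:.3f}")
```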
Related papers
- "FRAME: Forward Recursive Adaptive Model Extraction -- A Technique for Advance Feature Selection" [0.0]
This study introduces a novel hybrid approach, the Forward Recursive Adaptive Model Extraction Technique (FRAME).
FRAME combines Forward Selection and Recursive Feature Elimination to enhance feature selection across diverse datasets.
The results demonstrate that FRAME consistently delivers superior predictive performance based on downstream machine learning evaluation metrics.
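As a rough illustration of the forward-selection-plus-RFE idea (FRAME's adaptive extraction logic is not reproduced here), the two stages can be chained with scikit-learn as follows; the estimator and feature counts are arbitrary choices:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=30, n_informative=5, random_state=0)

# Stage 1: forward selection down to a coarse candidate set.
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=10, direction="forward"
).fit(X, y)
X_candidates = forward.transform(X)

# Stage 2: recursive feature elimination to prune the candidate set.
rfe = RFE(LinearRegression(), n_features_to_select=5).fit(X_candidates, y)

candidate_idx = forward.get_support(indices=True)
final_idx = candidate_idx[rfe.get_support()]
print("selected features:", final_idx)
```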
arXiv Detail & Related papers (2025-01-21T08:34:10Z)
- "Statistical Inference for Temporal Difference Learning with Linear Function Approximation" [62.69448336714418]
We study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation.
First, we derive a novel high-dimensional probability convergence guarantee that depends explicitly on the variance and holds under weak conditions.
We further establish refined high-dimensional Berry-Esseen bounds over the class of convex sets that guarantee faster rates than those in the literature.
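For readers unfamiliar with the setting, the following is a small self-contained sketch of the estimator that paper analyzes, TD(0) with linear function approximation and Polyak-Ruppert iterate averaging, run on a synthetic Markov reward process; the step-size schedule and problem sizes are arbitrary, and the sketch does not reproduce the paper's guarantees:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small synthetic Markov reward process: S states, random transitions, state-dependent rewards.
S, d, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=S)   # row-stochastic transition matrix
r = rng.normal(size=S)                  # expected reward in each state
Phi = rng.normal(size=(S, d))           # linear features phi(s)

theta = np.zeros(d)       # TD iterate
theta_bar = np.zeros(d)   # Polyak-Ruppert average of the iterates
s, T = 0, 50_000
for t in range(1, T + 1):
    s_next = rng.choice(S, p=P[s])
    # TD(0) update with a decaying step size.
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += (1.0 / t**0.6) * td_error * Phi[s]
    # Running average: theta_bar_t = ((t - 1) * theta_bar_{t-1} + theta_t) / t
    theta_bar += (theta - theta_bar) / t
    s = s_next

print("averaged TD estimate:", theta_bar)
```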
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- "Statistical Inference for Feature Selection after Optimal Transport-based Domain Adaptation" [7.10052009802944]
Feature Selection (FS) under domain adaptation (DA) is a critical task in machine learning.
We introduce a novel statistical method, named SFS-DA, to test the reliability of the features selected under DA.
arXiv Detail & Related papers (2024-10-19T07:35:23Z)
- "Test-Time Domain Generalization for Face Anti-Spoofing" [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks.
We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability.
Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
arXiv Detail & Related papers (2024-03-28T11:50:23Z)
- "Consistency Regularization for Generalizable Source-free Domain Adaptation" [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods ONLY assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- "FAStEN: An Efficient Adaptive Method for Feature Selection and Estimation in High-Dimensional Functional Regressions" [7.674715791336311]
We propose a new, flexible and ultra-efficient approach to perform feature selection in a sparse function-on-function regression problem.
We show how to extend it to the scalar-on-function framework.
We present an application to brain fMRI data from the AOMIC PIOP1 study.
arXiv Detail & Related papers (2023-03-26T19:41:17Z)
- "Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition" [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- "Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition" [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- "Sparsity-based Feature Selection for Anomalous Subgroup Discovery" [5.960402015658508]
Anomalous pattern detection aims to identify instances where deviation from normalcy is evident, and is widely applicable across domains.
However, a principled and scalable feature selection method for efficient discovery is generally lacking.
In this paper, we propose a sparsity-based automated feature selection framework, which encodes systemic outcome deviations via the sparsity of feature-driven odds ratios.
arXiv Detail & Related papers (2022-01-06T10:56:43Z)
- "More Powerful Conditional Selective Inference for Generalized Lasso by Parametric Programming" [20.309302270008146]
Conditional selective inference (SI) has been studied intensively as a new statistical inference framework for data-driven hypotheses.
We propose a more powerful and general conditional SI method for a class of problems that can be converted into quadratic parametric programming.
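For a flavor of conditional SI, which is also the style of testing underlying the main paper above, here is a minimal sketch of a selective p-value for a Gaussian statistic truncated to an assumed selection region [lo, hi]; in practice that region is derived from the selection algorithm (e.g., by parametric programming), and the interval below is purely illustrative:

```python
from scipy import stats

def selective_p_value(z, sigma, lo, hi):
    """Two-sided p-value for z ~ N(0, sigma^2) conditioned on the selection event z in [lo, hi].

    The truncation encodes the conditioning on selection, which is what makes the
    test valid after data-driven selection.
    """
    dist = stats.truncnorm(lo / sigma, hi / sigma, loc=0.0, scale=sigma)
    cdf = dist.cdf(z)
    return 2.0 * min(cdf, 1.0 - cdf)

# Example: an effect of size 2.1 that was reported only because it exceeded a threshold of 1.8.
print(selective_p_value(2.1, sigma=1.0, lo=1.8, hi=float("inf")))  # conditional (valid)
print(2.0 * stats.norm.sf(2.1))                                    # naive (ignores selection)
```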
arXiv Detail & Related papers (2021-05-11T10:12:00Z)
- "Causal Feature Selection for Algorithmic Fairness" [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.