Transfer Learning in Information Criteria-based Feature Selection
- URL: http://arxiv.org/abs/2107.02847v1
- Date: Tue, 6 Jul 2021 19:12:15 GMT
- Title: Transfer Learning in Information Criteria-based Feature Selection
- Authors: Shaohan Chen, Nikolaos V. Sahinidis and Chuanhou Gao
- Abstract summary: We show that a procedure that combines transfer learning with Mallows' Cp (TLCp) outperforms the conventional Mallows' Cp criterion in terms of accuracy and stability.
We also show that our transfer learning framework can be extended to other feature selection criteria, such as the Bayesian information criterion.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the effectiveness of transfer learning based on
Mallows' Cp. We propose a procedure that combines transfer learning with
Mallows' Cp (TLCp) and prove that it outperforms the conventional Mallows' Cp
criterion in terms of accuracy and stability. Our theoretical results indicate
that, for any sample size in the target domain, the proposed TLCp estimator
performs better than the Cp estimator by the mean squared error (MSE) metric in
the case of orthogonal predictors, provided that i) the dissimilarity between
the tasks from source domain and target domain is small, and ii) the procedure
parameters (complexity penalties) are tuned according to certain explicit
rules. Moreover, we show that our transfer learning framework can be extended
to other feature selection criteria, such as the Bayesian information
criterion. By analyzing the solution of the orthogonalized Cp, we identify an
estimator that asymptotically approximates the solution of the Cp criterion in
the case of non-orthogonal predictors. Similar results are obtained for the
non-orthogonal TLCp. Finally, simulation studies and applications with real
data demonstrate the usefulness of the TLCp scheme.
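To make the baseline concrete, Mallows' Cp scores a candidate subset of p predictors as Cp = SSE_p / σ̂² − n + 2p, where σ̂² is the residual variance of the full least-squares model, and the subset minimizing Cp is selected. The sketch below is an illustrative best-subset search over this criterion (the function names are our own; this is not code from the paper, and it does not implement the TLCp transfer variant):

```python
import numpy as np
from itertools import combinations

def _ols(X, y):
    """Least-squares fit; returns coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def mallows_cp(X, y):
    """Select the predictor subset minimizing Mallows' Cp.

    Cp = SSE_p / sigma2 - n + 2*p, where sigma2 is the residual
    variance estimated from the full model.
    """
    n, k = X.shape
    _, sse_full = _ols(X, y)
    sigma2 = sse_full / (n - k)  # full-model residual variance estimate

    best_cp, best_subset = np.inf, None
    for p in range(1, k + 1):
        for subset in combinations(range(k), p):
            _, sse = _ols(X[:, list(subset)], y)
            cp = sse / sigma2 - n + 2 * p
            if cp < best_cp:
                best_cp, best_subset = cp, subset
    return best_subset, best_cp
```

The exhaustive search costs 2^k least-squares fits, so it is only practical for small k; in the orthogonal-predictor setting analyzed in the paper, the criterion decomposes coordinate-wise and no enumeration is needed.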
Related papers
- Central Limit Theorems for Transition Probabilities of Controlled Markov Chains [14.351243505824886]
We develop a central limit theorem (CLT) for the non-parametric estimator of the transition matrices in controlled Markov chains.
We derive CLTs for the value, Q-, and advantage functions of any stationary policy, including the optimal policy recovered from the estimated model.
These results provide new statistical tools for offline policy evaluation and optimal policy recovery, and enable hypothesis tests for transition probabilities.
arXiv Detail & Related papers (2025-08-02T23:33:57Z)
- Q-Learning with Clustered-SMART (cSMART) Data: Examining Moderators in the Construction of Clustered Adaptive Interventions [3.9650359172757743]
A clustered adaptive intervention (cAI) is a sequence of decision rules that guides practitioners on how best to tailor cluster-level intervention to improve outcomes.
We introduce a clustered Q-learning framework with the M-out-of-N Cluster Bootstrap to evaluate whether a set of candidate tailoring variables may be useful in defining an optimal cAI.
arXiv Detail & Related papers (2025-05-01T19:24:39Z)
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
- On the Convergence of DP-SGD with Adaptive Clipping [56.24689348875711]
Stochastic gradient descent (SGD) with gradient clipping is a powerful technique for enabling differentially private optimization.
This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD).
We show that QC-SGD suffers from a bias problem similar to that of constant-threshold clipped SGD, but that this bias can be mitigated through a carefully designed quantile and step-size schedule.
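For context, the constant-threshold clipping that this line of work builds on, and the quantile-based threshold that replaces it, can be sketched as follows (a generic illustration with our own function names, not code from the paper):

```python
import numpy as np

def clip_gradient(grad, threshold):
    """Rescale `grad` so its L2 norm is at most `threshold` (standard clipping)."""
    norm = np.linalg.norm(grad)
    return grad if norm <= threshold else grad * (threshold / norm)

def quantile_threshold(recent_norms, q=0.5):
    """Quantile clipping (sketch): set the threshold adaptively to the
    q-th quantile of recently observed gradient norms, rather than a constant."""
    return float(np.quantile(recent_norms, q))
```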
arXiv Detail & Related papers (2024-12-27T20:29:47Z)
- Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO).
arXiv Detail & Related papers (2024-10-17T11:47:56Z)
- Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors [17.640500920466984]
This paper presents a novel framework for estimating the joint PMF and automatically inferring its rank from observed data.
We derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI).
Experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
arXiv Detail & Related papers (2024-10-08T20:07:49Z)
- Bounded P-values in Parametric Programming-based Selective Inference [23.35466397627952]
We introduce a procedure to reduce the computational cost while guaranteeing the desired precision, by proposing a method to compute the lower and upper bounds of p-values.
We demonstrate the effectiveness of the proposed method in hypothesis testing problems for feature selection in linear models and attention region identification in deep neural networks.
arXiv Detail & Related papers (2023-07-21T04:55:03Z)
- Adaptive sparseness for correntropy-based robust regression via automatic relevance determination [17.933460891374498]
We integrate the maximum correntropy criterion (MCC) based robust regression algorithm with the automatic relevance determination (ARD) technique in a Bayesian framework.
We use an inherent noise assumption from the MCC to derive an explicit likelihood function, and realize the maximum a posteriori (MAP) estimation with the ARD prior.
MCC-ARD achieves superior prediction performance and feature selection capability compared to L1-regularized MCC, as demonstrated by a noisy, high-dimensional simulation study.
arXiv Detail & Related papers (2023-01-31T20:23:32Z)
- Adaptive Dimension Reduction and Variational Inference for Transductive Few-Shot Classification [2.922007656878633]
We propose a new clustering method based on Variational Bayesian inference, further improved by Adaptive Dimension Reduction.
Our proposed method significantly improves accuracy in the realistic unbalanced transductive setting on various Few-Shot benchmarks.
arXiv Detail & Related papers (2022-09-18T10:29:02Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- Unsupervised learning of disentangled representations in deep restricted kernel machines with orthogonality constraints [15.296955630621566]
Constr-DRKM is a deep kernel method for the unsupervised learning of disentangled data representations.
We quantitatively evaluate the proposed method's effectiveness in disentangled feature learning.
arXiv Detail & Related papers (2020-11-25T11:40:10Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Re-Assessing the "Classify and Count" Quantification Method [88.60021378715636]
"Classify and Count" (CC) is often a biased estimator.
Previous works have failed to use properly optimised versions of CC.
We argue that properly optimised versions of CC, while still inferior to some cutting-edge methods, deliver near-state-of-the-art accuracy.
arXiv Detail & Related papers (2020-11-04T21:47:39Z)
- Selective Classification via One-Sided Prediction [54.05407231648068]
A one-sided prediction (OSP) based relaxation yields a selective classification (SC) scheme that attains near-optimal coverage in the practically relevant high target accuracy regime.
We theoretically derive generalization bounds for SC and OSP, and empirically show that our scheme strongly outperforms state-of-the-art methods in coverage at small error levels.
arXiv Detail & Related papers (2020-10-15T16:14:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.