Feature Selection by a Mechanism Design
- URL: http://arxiv.org/abs/2110.02419v1
- Date: Tue, 5 Oct 2021 23:53:14 GMT
- Title: Feature Selection by a Mechanism Design
- Authors: Xingwei Hu
- Abstract summary: We study the selection problem where the players are the candidate features and the payoff function is a performance measurement.
In theory, an irrelevant feature is equivalent to a dummy player in the game, contributing nothing in any modeling situation.
In our mechanism design, the end goal perfectly matches the expected model performance with the expected sum of individual marginal effects.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In constructing an econometric or statistical model, we pick relevant
features or variables from many candidates. A coalitional game is set up to
study the selection problem where the players are the candidates and the payoff
function is a performance measurement in all possible modeling scenarios. Thus,
in theory, an irrelevant feature is equivalent to a dummy player in the game,
which contributes nothing in any modeling situation. A hypothesis test of
zero mean contribution decides whether a feature is irrelevant. In
our mechanism design, the end goal perfectly matches the expected model
performance with the expected sum of individual marginal effects. Within a
class of noninformative likelihoods over all modeling opportunities, the
matching equation results in a specific valuation for each feature. After
estimating the valuation and its standard deviation, we drop any candidate
feature if its valuation is not significantly different from zero. In the
simulation studies, our new approach significantly outperforms several popular
methods used in practice, and its accuracy is robust to the choice of the
payoff function.
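A minimal sketch of the idea follows; this is our illustration, not the paper's implementation. It values each candidate feature by its average marginal contribution to a payoff function, here the in-sample R^2 of an OLS fit (an assumed choice of payoff), over uniformly sampled feature coalitions, a crude stand-in for the paper's noninformative likelihood over modeling scenarios. A feature is kept only if its estimated valuation is significantly different from zero. All function names and parameters below are ours.

```python
# Hedged sketch: game-theoretic feature valuation + zero-mean-contribution test.
# Payoff choice (OLS R^2) and uniform coalition sampling are assumptions,
# not the paper's exact construction.
import numpy as np

def r2_payoff(X, y, subset):
    """Payoff of a feature coalition: in-sample R^2 of an OLS fit."""
    if not subset:
        return 0.0
    Xs = np.column_stack([np.ones(len(y)), X[:, sorted(subset)]])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def marginal_contributions(X, y, n_samples=200, seed=0):
    """For each feature j, sample coalitions S without j and record
    payoff(S + {j}) - payoff(S); returns an (n_features, n_samples) array."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    contrib = np.empty((p, n_samples))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for s in range(n_samples):
            # Uniform random coalition of the remaining features.
            S = {k for k in others if rng.random() < 0.5}
            contrib[j, s] = r2_payoff(X, y, S | {j}) - r2_payoff(X, y, S)
    return contrib

def select_features(X, y, z_crit=1.96, n_samples=200, seed=0):
    """Keep feature j only if its mean contribution is significantly
    nonzero: |mean| > z_crit * standard error."""
    c = marginal_contributions(X, y, n_samples, seed)
    mean = c.mean(axis=1)
    se = c.std(axis=1, ddof=1) / np.sqrt(c.shape[1])
    return [j for j in range(X.shape[1]) if abs(mean[j]) > z_crit * se[j]]

# Toy usage: y depends on the first two of five candidate features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=300)
print(select_features(X, y))  # typically [0, 1]
```

Note that summing the mean contributions over all features recovers (in expectation) the payoff of the full model, which is the matching of expected model performance with the expected sum of individual marginal effects that the abstract describes.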
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts [119.22672589020394]
We propose COnfidence-baSed MOdel Selection (CosMoS), where model confidence can effectively guide model selection.
We evaluate CosMoS on four datasets with spurious correlations, each with multiple test sets with varying levels of data distribution shift.
arXiv Detail & Related papers (2023-06-19T18:48:15Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Post-Selection Confidence Bounds for Prediction Performance [2.28438857884398]
In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks.
We propose an algorithm to compute valid lower confidence bounds for multiple models that have been selected based on their prediction performance on the evaluation set.
arXiv Detail & Related papers (2022-10-24T13:28:43Z)
- Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning [47.651130958272155]
Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy.
We formalize the concept of underspecification and propose a method to identify and partially address it.
arXiv Detail & Related papers (2022-07-06T11:20:40Z)
- Dynamic Instance-Wise Classification in Correlated Feature Spaces [15.351282873821935]
In a typical machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training.
A new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy.
The effectiveness, generalizability, and scalability of the proposed method are illustrated on a variety of real-world datasets from diverse application domains.
arXiv Detail & Related papers (2021-06-08T20:20:36Z)
- Quantum-Assisted Feature Selection for Vehicle Price Prediction Modeling [0.0]
We study metrics for encoding the search as a binary model, such as Generalized Mean Information Coefficient and Pearson Correlation Coefficient.
We achieve accuracy scores of 0.9 for finding optimal subsets on synthetic data using a new metric that we define.
Our findings show that by leveraging quantum-assisted routines we find solutions that increase the quality of predictive model output.
arXiv Detail & Related papers (2021-04-08T20:48:44Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Better Model Selection with a new Definition of Feature Importance [8.914907178577476]
Feature importance aims at measuring how crucial each input feature is for model prediction.
In this paper, we propose a new tree-model explanation approach for model selection.
arXiv Detail & Related papers (2020-09-16T14:32:22Z)