Feature Selection via the Intervened Interpolative Decomposition and its
Application in Diversifying Quantitative Strategies
- URL: http://arxiv.org/abs/2209.14532v1
- Date: Thu, 29 Sep 2022 03:36:56 GMT
- Authors: Jun Lu, Joerg Osterrieder
- Abstract summary: We propose a probabilistic model for computing an interpolative decomposition (ID) in which each column of the observed matrix has its own priority or importance.
We evaluate the proposed models on real-world datasets, including ten Chinese A-share stocks.
- Score: 4.913248451323163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a probabilistic model for computing an
interpolative decomposition (ID) in which each column of the observed matrix
has its own priority or importance, so that the decomposition selects a set of
features that is representative of the entire feature set while also favoring
features with higher priority. ID is commonly used for low-rank approximation,
feature selection, and extracting hidden patterns in data, where the matrix
factors are latent variables associated with each data dimension. Inference is
carried out by Gibbs sampling in a Bayesian framework. We evaluate the proposed
models on real-world datasets, including ten Chinese A-share stocks, and
demonstrate that the proposed Bayesian ID algorithm with intervention (IID)
produces reconstruction errors comparable to those of existing Bayesian ID
algorithms while selecting features with higher scores or priority.
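For readers new to ID, the basic (non-Bayesian) decomposition can be sketched in Python with NumPy. A rank-k column ID writes A ≈ CW, where C = A[:, J] holds k selected columns of A and W is a k × n coefficient matrix whose columns at the indices J form the identity. The sketch below chooses J greedily by column-pivoted Gram-Schmidt and fits W by least squares; it is a standard deterministic ID, not the paper's Gibbs-sampling IID model. The `priority` argument, which scales each column's residual norm before the pivot choice, is a hypothetical heuristic added only to illustrate how per-column priorities can bias feature selection; the function name `interpolative_decomposition` is likewise illustrative.

```python
import numpy as np


def interpolative_decomposition(A, k, priority=None, tol=1e-12):
    """Rank-k column ID, A ≈ C @ W with C = A[:, J] and W[:, J] = I.

    Columns are chosen greedily by pivoted Gram-Schmidt. `priority` is a
    hypothetical per-column weight (illustration only, not the paper's IID
    model) that scales each column's residual norm before the pivot choice.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    priority = np.ones(n) if priority is None else np.asarray(priority, dtype=float)
    R = A.copy()  # part of each column not yet explained by the selected columns
    J = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        scores = norms * priority
        if J:
            scores[J] = -np.inf  # never pick the same column twice
        j = int(np.argmax(scores))
        if norms[j] <= tol:
            raise ValueError("k exceeds the numerical rank of A")
        J.append(j)
        q = R[:, j] / norms[j]        # new orthonormal direction
        R = R - np.outer(q, q @ R)    # project it out of every column
    C = A[:, J]
    # Least-squares coefficients expressing every column of A in terms of C.
    W, *_ = np.linalg.lstsq(C, A, rcond=None)
    W[:, J] = np.eye(k)  # selected columns reproduce themselves exactly
    return J, C, W


# Usage: an exactly rank-3 matrix is reconstructed from 3 of its 8 columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 8))
J, C, W = interpolative_decomposition(A, 3)
print(J, np.linalg.norm(A - C @ W))  # error is at floating-point level
```

With equal priorities, the greedy pivot picks the column with the largest unexplained component at each step, which keeps the selected basis well conditioned; raising a column's priority makes it more likely to be selected even when another column would explain slightly more of the remaining residual.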
Related papers
- Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors
This paper presents a novel framework for estimating the joint PMF and automatically inferring its rank from observed data.
We derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI).
Experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
(2024-10-08T20:07:49Z)
- Enhancing Neural Subset Selection: Integrating Background Information into Set Representations
We show that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest.
This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated.
(2024-02-05T16:09:35Z)
- Regression with Label Differential Privacy
We derive a label DP randomization mechanism that is optimal under a given regression loss function.
We prove that the optimal mechanism takes the form of a "randomized response on bins".
(2022-12-12T17:41:32Z)
- Comparative Study of Inference Methods for Interpolative Decomposition
We propose a probabilistic model with automatic relevance determination (ARD) for learning interpolative decomposition (ID).
We evaluate the model on a variety of real-world datasets, including CCLE EC50, CCLE IC50, Gene Body Methylation, and Promoter Methylation datasets of different sizes and dimensions.
(2022-06-29T11:37:05Z)
- Bayesian Low-Rank Interpolative Decomposition for Complex Datasets
We introduce a probabilistic model for learning interpolative decomposition (ID), which is commonly used for feature selection, low-rank approximation, and identifying hidden patterns in data.
We evaluate the model on a variety of real-world datasets, including CCLE EC50, CCLE IC50, CTRP EC50, and MovieLens 100K datasets of different sizes and dimensions.
(2022-05-30T03:06:48Z)
- Towards Deterministic Diverse Subset Sampling
In this paper, we discuss a greedy deterministic adaptation of k-DPP.
We demonstrate the usefulness of the model on an image search task.
(2021-05-28T16:05:58Z)
- Auto-weighted Multi-view Feature Selection with Graph Optimization
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
(2021-04-11T03:25:25Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
(2020-12-29T04:08:38Z)
- Information-theoretic Feature Selection via Tensor Decomposition and Submodularity
We introduce a low-rank tensor model of the joint PMF of all variables and indirect targeting as a way of mitigating complexity and maximizing the classification performance for a given number of features.
By indirectly aiming to predict the latent variable of the naive Bayes model instead of the original target variable, it is possible to formulate the feature selection problem as the maximization of a monotone submodular function subject to a cardinality constraint.
(2020-10-30T10:36:46Z)
- Model Fusion with Kullback–Leibler Divergence
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
(2020-07-13T03:27:45Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
(2020-04-17T12:47:04Z)