Probabilistic Value Selection for Space Efficient Model
- URL: http://arxiv.org/abs/2007.04641v1
- Date: Thu, 9 Jul 2020 08:45:13 GMT
- Title: Probabilistic Value Selection for Space Efficient Model
- Authors: Gunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu, and Wen-Chih Peng
- Abstract summary: Two probabilistic methods based on information-theoretic metrics are proposed: PVS and P+VS.
Experiment results show that value selection can achieve a balance between accuracy and model size reduction.
- Score: 10.109875612945658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An alternative to current mainstream preprocessing methods is proposed: Value
Selection (VS). Unlike existing methods such as feature selection, which removes
features, and instance selection, which eliminates instances, value selection
eliminates values (with respect to each feature) in the dataset, with two
purposes: reducing the model size and preserving its accuracy. Two probabilistic
methods based on information-theoretic metrics are proposed: PVS and P+VS.
Extensive experiments are conducted on benchmark datasets of various sizes, and
the results are compared with existing preprocessing methods, including feature
selection, feature transformation, and instance selection. Experimental results
show that value selection can achieve a balance between accuracy and model size
reduction.
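The abstract does not spell out the PVS or P+VS procedures; the following is a minimal sketch of the value-selection idea under stated assumptions: score each (feature, value) pair with an information-theoretic criterion (plain information gain here, a hypothetical stand-in for the paper's metric) and mask low-scoring values before training.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def value_scores(column, labels):
    """Score each distinct value of one feature by the entropy reduction
    observed among the instances carrying that value (a hypothetical
    information-theoretic criterion; the paper's exact metric may differ)."""
    base = entropy(labels)
    return {v: base - entropy([y for x, y in zip(column, labels) if x == v])
            for v in set(column)}

def select_values(column, labels, keep_ratio=0.5):
    """Keep the top-scoring values and mask the rest (value selection)."""
    scores = value_scores(column, labels)
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = set(ranked[:max(1, int(len(ranked) * keep_ratio))])
    return [x if x in kept else None for x in column]

# Toy usage: one categorical feature with three values and binary labels.
feature = ["a", "a", "b", "b", "c", "c"]
labels = [1, 1, 0, 1, 0, 0]
print(select_values(feature, labels))
```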
Related papers
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model trained on a joint objective of sequential reconstruction, variational, and performance-evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency: it is 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Finding Optimal Diverse Feature Sets with Alternative Feature Selection [0.0]
We introduce alternative feature selection and formalize it as an optimization problem.
In particular, we define alternatives via constraints and enable users to control the number and dissimilarity of alternatives.
We show that a constant-factor approximation exists under certain conditions and propose corresponding search methods.
arXiv Detail & Related papers (2023-07-21T14:23:41Z)
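The alternative-feature-selection entry above frames alternatives as constrained optimization. As a rough illustration only, the sketch below greedily builds a second feature set whose overlap with a first set is capped; the mutual-information score, the overlap bound, and the greedy search are all assumptions, not the paper's formulation.

```python
# Hedged sketch: find an "alternative" feature set of comparable quality
# whose overlap with a first set is bounded. Score and search are stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)
score = mutual_info_classif(X, y, random_state=0)

k, max_overlap = 5, 1                    # at most 1 shared feature between sets
first = list(np.argsort(score)[-k:])     # best-scoring first set

# Alternative: best-scoring set sharing at most `max_overlap` features.
ranked = list(np.argsort(score)[::-1])
alt, shared = [], 0
for f in ranked:
    if f in first:
        if shared >= max_overlap:
            continue                     # dissimilarity constraint binds
        shared += 1
    alt.append(f)
    if len(alt) == k:
        break
print("first:", first, "alternative:", alt)
```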
- A model-free feature selection technique of feature screening and random forest based recursive feature elimination [0.0]
We propose a model-free feature selection method for ultra-high dimensional data with a massive number of features.
We show that the proposed method is selection consistent and $L_2$ consistent under weak regularity conditions.
arXiv Detail & Related papers (2023-02-15T03:39:16Z)
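For the screening-plus-RFE recipe above, a hedged sketch follows, assuming scikit-learn: a simple marginal-correlation screen (standing in for the paper's unspecified screening statistic) followed by random-forest-based recursive feature elimination. All parameters are illustrative, not the authors' settings.

```python
# Sketch of feature screening followed by random-forest-based recursive
# feature elimination; the screening statistic is an assumption.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                           random_state=0)

# Screening step: keep the features most correlated with the label.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
screened = np.argsort(corr)[-50:]        # retain top 50 candidates

# Refinement step: recursive feature elimination with a random forest.
rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=10)
rfe.fit(X[:, screened], y)
print("selected features:", screened[rfe.support_])
```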
- Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses the trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small set of features in a fraction of the time required by the other methods under comparison.
arXiv Detail & Related papers (2022-03-03T10:50:33Z)
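The trace criterion named in the PFST entry above can be made concrete: score a candidate feature subset by the trace of the between-class scatter over the trace of the within-class scatter. The greedy forward search in the sketch below is an assumption; the summary does not describe PFST's parallel search strategy.

```python
# Hedged sketch: trace-ratio scoring of feature subsets plus a greedy
# forward search (the search strategy is an assumption, not PFST's).
import numpy as np

def trace_ratio(X, y, feats):
    Xs = X[:, feats]
    mu = Xs.mean(axis=0)
    sb = sw = 0.0
    for c in np.unique(y):
        Xc = Xs[y == c]
        mc = Xc.mean(axis=0)
        sb += len(Xc) * np.sum((mc - mu) ** 2)  # trace of between-class scatter
        sw += np.sum((Xc - mc) ** 2)            # trace of within-class scatter
    return sb / sw

def greedy_select(X, y, k):
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda f: trace_ratio(X, y, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy usage: two informative features out of twenty.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 3] + X[:, 7] > 0).astype(int)
print(greedy_select(X, y, 2))
```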
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm appears to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
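The PSD-model entry above describes a generic two-step recipe: fit a density model from pointwise evaluations, then sample from the model. The sketch below illustrates only that recipe, with a normalized histogram standing in for an actual PSD model (which is not reproduced here).

```python
# Two-step sampling sketch: model the density, then sample from the model.
# A normalized histogram stands in for a PSD model.
import numpy as np

def target(x):
    """Unnormalized density we can only evaluate pointwise."""
    return np.exp(-x**2) * (2 + np.sin(5 * x))

# Step 1: model the distribution from a modest number of evaluations.
grid = np.linspace(-3, 3, 200)
probs = target(grid)
probs /= probs.sum()

# Step 2: sample from the fitted model.
rng = np.random.default_rng(0)
samples = rng.choice(grid, size=1000, p=probs)
print("sample mean:", samples.mean())
```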
- Few-shot Learning for Unsupervised Feature Selection [59.75321498170363]
We propose a few-shot learning method for unsupervised feature selection.
The proposed method can select a subset of relevant features in a target task given a few unlabeled target instances.
We experimentally demonstrate that the proposed method outperforms existing feature selection methods.
arXiv Detail & Related papers (2021-07-02T03:52:51Z)
- Feature Selection Methods for Cost-Constrained Classification in Random Forests [3.4806267677524896]
Cost-sensitive feature selection describes a feature selection problem in which each feature incurs an individual cost for inclusion in a model.
Random Forests define a particularly challenging problem for feature selection, as features are generally entangled in an ensemble of multiple trees.
We propose Shallow Tree Selection, a novel fast and multivariate feature selection method that selects features from small tree structures.
arXiv Detail & Related papers (2020-08-14T11:39:52Z)
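As a rough reading of the Shallow Tree Selection entry above: fit many depth-limited trees and keep the features the shallow trees actually split on. The depth, tree count, and voting rule below are illustrative assumptions, not the authors' specification.

```python
# Hedged sketch: vote for features used as split variables by many
# shallow (depth-limited) decision trees on bootstrap samples.
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           random_state=0)

votes = Counter()
rng = np.random.default_rng(0)
for seed in range(100):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
    tree = DecisionTreeClassifier(max_depth=2, max_features="sqrt",
                                  random_state=seed).fit(X[idx], y[idx])
    used = tree.tree_.feature
    votes.update(f for f in used if f >= 0)               # negative marks leaves

print("selected features:", [f for f, _ in votes.most_common(5)])
```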
- Lookahead and Hybrid Sample Allocation Procedures for Multiple Attribute Selection Decisions [0.9137554315375922]
This paper considers settings in which each measurement yields one sample of one attribute for one alternative.
When given a fixed number of samples to collect, the decision-maker must determine which samples to obtain, make the measurements, update prior beliefs about the attribute magnitudes, and then select an alternative.
arXiv Detail & Related papers (2020-07-31T15:04:49Z)
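The lookahead-allocation entry above describes measuring one attribute of one alternative at a time, updating prior beliefs, and finally picking an alternative. The sketch below walks through that loop with conjugate normal updates and a greedy most-uncertain-first allocation; both choices are stand-ins, not the paper's lookahead or hybrid procedures.

```python
# Hedged sketch: sequentially allocate single-attribute measurements,
# update normal beliefs, then select an alternative. The allocation rule
# here (sample the most uncertain pair) is an assumption.
import numpy as np

rng = np.random.default_rng(0)
n_alt, n_attr, budget, noise = 4, 3, 30, 1.0
true_means = rng.normal(size=(n_alt, n_attr))

mu = np.zeros((n_alt, n_attr))      # prior means
var = np.ones((n_alt, n_attr))      # prior variances

for _ in range(budget):
    a, j = np.unravel_index(np.argmax(var), var.shape)  # most uncertain pair
    sample = true_means[a, j] + rng.normal(scale=noise)
    # Conjugate normal update of the belief about attribute (a, j).
    post_var = 1.0 / (1.0 / var[a, j] + 1.0 / noise**2)
    mu[a, j] = post_var * (mu[a, j] / var[a, j] + sample / noise**2)
    var[a, j] = post_var

# Select the alternative with the largest summed posterior attribute means.
print("chosen alternative:", int(np.argmax(mu.sum(axis=1))))
```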