Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection
- URL: http://arxiv.org/abs/2201.13194v3
- Date: Mon, 3 Apr 2023 04:33:04 GMT
- Title: Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection
- Authors: Peican Zhu, Xin Hou, Keke Tang, Zhen Wang, Feiping Nie
- Abstract summary: We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
- Score: 66.84571085643928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the flourishing of the information age, massive amounts of data are
generated every day. Because these data are large-scale and high-dimensional,
it is often difficult to make good decisions in practical applications.
Therefore, an efficient big data analytics method is urgently needed. In
feature engineering, feature selection is an important research topic that
aims to select "excellent" features from candidate ones. Feature selection
serves several purposes, such as dimensionality reduction, improved model
effectiveness, and improved model performance. In many classification tasks,
researchers have found that data points from the same class are usually close
to each other; thus, local compactness is of great importance for the
evaluation of a feature. In this manuscript, we propose a fast unsupervised
feature selection method, named Compactness Score (CSUFS), to select
desired features. To demonstrate its efficiency and accuracy, extensive
experiments are performed on several data sets. The effectiveness and
superiority of our method are then shown on clustering tasks.
Performance is measured by several well-known evaluation metrics,
while efficiency is reflected by the corresponding running time. As
revealed by the simulation results, our proposed algorithm seems to be more
accurate and efficient compared with existing algorithms.
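The abstract motivates scoring features by local compactness but does not state the CSUFS formula. The sketch below is therefore not the authors' algorithm. It is a hypothetical stand-in that shows the general shape of a filter method built on that idea: each feature is scaled to unit variance, and its score is the mean distance from every sample to its k nearest neighbours measured along that feature alone. The function names, the parameter `k`, and the scoring rule are all assumptions made for illustration.

```python
import numpy as np


def local_compactness_scores(X, k=5):
    """Illustrative per-feature score (NOT the CSUFS formula from the paper).

    For each feature, scale it to unit variance, then average the distance
    from every sample to its k nearest neighbours along that single feature.
    A lower score means the feature is more locally compact.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if not 1 <= k < n_samples:
        raise ValueError("k must satisfy 1 <= k < n_samples")

    scores = np.empty(n_features)
    for j in range(n_features):
        col = X[:, j]
        std = col.std()
        if std == 0:
            # A constant feature cannot separate samples; rank it last.
            scores[j] = np.inf
            continue
        z = col / std
        # Pairwise 1-D distances; this needs O(n_samples^2) memory per feature.
        dist = np.abs(z[:, None] - z[None, :])
        # After sorting each row, column 0 is the zero self-distance.
        nearest = np.sort(dist, axis=1)[:, 1:k + 1]
        scores[j] = nearest.mean()
    return scores


def select_features(X, m, k=5):
    """Return the indices of the m features with the lowest scores."""
    return np.argsort(local_compactness_scores(X, k))[:m]
```

Calling `select_features(X, m)` on an `(n_samples, n_features)` array returns the column indices to keep. Because each feature is scored independently and no labels are used, the sketch is unsupervised and a filter method in the sense the abstract describes, although the pairwise distance matrix makes it suitable only for small sample counts.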
Related papers
- Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach [0.27624021966289597]
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets.
This paper proposes a novel large-scale multi-objective evolutionary algorithm based on search space shrinking, termed LMSSS.
The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplify the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z)
- A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning [9.596923373834093]
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features show between different classes.
We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets; the results show that the new method performs favorably with negligible computation.
arXiv Detail & Related papers (2024-01-15T05:32:35Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Fast Classification with Sequential Feature Selection in Test Phase [1.1470070927586016]
This paper introduces a novel approach to active feature acquisition for classification.
It is the task of sequentially selecting the most informative subset of features to achieve optimal prediction performance.
The proposed approach involves a new lazy model that is significantly faster and more efficient compared to existing methods.
arXiv Detail & Related papers (2023-06-25T21:31:46Z)
- Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z)
- Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features will be selected in group, and the number of selected features will be determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z)
- IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation [33.424663018395684]
We propose a simple and effective feature selection algorithm to enhance sample similarity preservation.
The proposed algorithm preserves the pairwise distances, as well as the topological patterns, of the full data well.
arXiv Detail & Related papers (2020-04-02T23:05:00Z)