A novel feature selection framework for incomplete data
- URL: http://arxiv.org/abs/2312.04171v1
- Date: Thu, 7 Dec 2023 09:45:14 GMT
- Title: A novel feature selection framework for incomplete data
- Authors: Cong Guo
- Abstract summary: Existing methods complete the incomplete data and then conduct feature selection based on the imputed data.
Since imputation and feature selection are entirely independent steps, the importance of features cannot be considered during imputation.
We propose a novel incomplete data feature selection framework that considers feature importance.
- Score: 0.904776731152113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection on incomplete datasets is an exceptionally challenging
task. Existing methods address this challenge by first employing imputation
methods to complete the incomplete data and then conducting feature selection
based on the imputed data. Since imputation and feature selection are entirely
independent steps, the importance of features cannot be considered during
imputation. However, in real-world scenarios or datasets, different features
have varying degrees of importance. To address this, we propose a novel
incomplete data feature selection framework that considers feature importance.
The framework mainly consists of two alternating iterative stages: the M-stage
and the W-stage. In the M-stage, missing values are imputed based on a given
feature importance vector and multiple initial imputation results. In the
W-stage, an improved reliefF algorithm is employed to learn the feature
importance vector based on the imputed data. Specifically, the feature
importance vector obtained in the current iteration of the W-stage serves as
input for the next iteration of the M-stage. Experimental results on both
artificially generated and real incomplete datasets demonstrate that the
proposed method outperforms other approaches significantly.
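The alternating structure described in the abstract can be pictured with a short Python sketch. The concrete choices below (a single mean-based initial imputation, an importance-weighted kNN refinement standing in for the M-stage, and a basic Relief scorer standing in for the improved reliefF of the W-stage) are illustrative assumptions, not the authors' filling loss or algorithm.
```python
# Minimal sketch of an alternating M-stage / W-stage loop. The paper combines
# multiple initial imputations; for brevity this sketch starts from a single
# mean imputation and uses simplified stand-ins for both stages.
import numpy as np

def relief_importance(X, y):
    """W-stage stand-in: basic Relief scores (nearest hit vs. nearest miss)."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf
        same, diff = (y == y[i]), (y != y[i])
        same[i] = False
        hit = X[np.where(same)[0][np.argmin(dist[same])]]
        miss = X[np.where(diff)[0][np.argmin(dist[diff])]]
        w += np.abs(X[i] - miss) - np.abs(X[i] - hit)
    w = np.clip(w, 0, None)
    return w / (w.sum() + 1e-12)          # normalized importance vector

def m_stage(X_nan, X_imputed, w, k=5):
    """M-stage stand-in: refine the current imputation with kNN averaging,
    where neighbour distances are weighted by the current importances w."""
    X_new = X_imputed.copy()
    for i in np.where(np.isnan(X_nan).any(axis=1))[0]:
        diff = (X_imputed - X_imputed[i]) * np.sqrt(w)
        dist = np.einsum('ij,ij->i', diff, diff)
        dist[i] = np.inf
        nbrs = np.argsort(dist)[:k]
        cols = np.where(np.isnan(X_nan[i]))[0]
        X_new[i, cols] = X_imputed[nbrs][:, cols].mean(axis=0)
    return X_new

def alternate(X_nan, y, n_iter=10):
    col_mean = np.nanmean(X_nan, axis=0)
    X_imp = np.where(np.isnan(X_nan), col_mean, X_nan)   # initial imputation
    w = np.full(X_nan.shape[1], 1.0 / X_nan.shape[1])    # uniform importances
    for _ in range(n_iter):
        X_imp = m_stage(X_nan, X_imp, w)                 # M-stage
        w = relief_importance(X_imp, y)                  # W-stage
    return X_imp, w
```
In each pass the current importance vector reshapes the distance metric used for imputation, and the freshly imputed matrix in turn updates the importances; ranking the final vector and keeping the top-scoring features would then give the selected subset.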
Related papers
- An End-to-End Model for Time Series Classification In the Presence of Missing Values [25.129396459385873]
Time series classification with missing data is a prevalent issue in time series analysis.
This study proposes an end-to-end neural network that unifies data imputation and representation learning within a single framework.
arXiv Detail & Related papers (2024-08-11T19:39:12Z)
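For the end-to-end time-series entry above, a hedged PyTorch sketch of the general idea: one network that both reconstructs missing values and classifies the series. The GRU encoder, mask concatenation, and loss weighting are assumptions, not the paper's architecture.
```python
# One network trained jointly for imputation (reconstruction of observed
# values) and classification; missing entries are zero-filled and flagged
# through an observation mask.
import torch
import torch.nn as nn

class JointImputeClassify(nn.Module):
    def __init__(self, n_features, hidden=64, n_classes=2):
        super().__init__()
        self.encoder = nn.GRU(input_size=2 * n_features, hidden_size=hidden,
                              batch_first=True)
        self.impute_head = nn.Linear(hidden, n_features)   # reconstructs values
        self.class_head = nn.Linear(hidden, n_classes)     # classifies the series

    def forward(self, x, mask):
        # x: (batch, time, features) with NaNs at missing entries
        # mask: same shape, 1.0 where observed, 0.0 where missing
        z = torch.cat([torch.nan_to_num(x, nan=0.0), mask], dim=-1)
        h, _ = self.encoder(z)
        recon = self.impute_head(h)             # per-step value estimates
        logits = self.class_head(h[:, -1])      # last hidden state -> class
        return recon, logits

def joint_loss(recon, logits, x, mask, y, alpha=0.5):
    # reconstruction is scored only on observed entries; the same
    # representation also drives the classification loss
    rec = ((recon - torch.nan_to_num(x)) ** 2 * mask).sum() / mask.sum()
    return alpha * rec + nn.functional.cross_entropy(logits, y)
```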
Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
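A rough illustration of the gradient-similarity idea summarised in the LESS entry above: per-example gradients are projected to a low dimension and training examples are ranked by cosine similarity to a target-task gradient. The random projection and scoring loop are simplifications, not the published algorithm.
```python
# Low-dimensional gradient features via random projection, then cosine
# similarity to a target-task gradient; keep the top fraction of examples.
import numpy as np

def project(grads, proj):
    """Map full gradients (n, p) to normalized low-rank features (n, k)."""
    g = grads @ proj
    return g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-12)

def select_top_fraction(train_grads, target_grad, fraction=0.05, k=128, seed=0):
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(train_grads.shape[1], k)) / np.sqrt(k)
    train_feat = project(train_grads, proj)
    target_feat = project(target_grad.reshape(1, -1), proj)[0]
    scores = train_feat @ target_feat           # cosine similarity after norm
    n_keep = max(1, int(fraction * len(scores)))
    return np.argsort(scores)[::-1][:n_keep]    # indices of the selected data
```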
A data-science pipeline to enable the Interpretability of Many-Objective Feature Selection [0.1474723404975345]
Many-Objective Feature Selection (MOFS) approaches use four or more objectives to determine the relevance of a subset of features in a supervised learning task.
This paper proposes an original methodology to support data scientists in the interpretation and comparison of the MOFS outcome by combining post-processing and visualisation of the set of solutions.
arXiv Detail & Related papers (2023-11-30T17:44:22Z)
Iterative missing value imputation based on feature importance [6.300806721275004]
We have designed an imputation method that considers feature importance.
This algorithm iteratively performs matrix completion and feature importance learning, and specifically, matrix completion is based on a filling loss that incorporates feature importance.
The experimental results consistently show that the proposed method outperforms five existing imputation algorithms.
arXiv Detail & Related papers (2023-11-14T09:03:33Z)
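A toy version of a feature-importance-weighted filling loss in the spirit of the imputation entry above: observed entries of more important features contribute more to a low-rank reconstruction objective. The rank, optimiser, and source of the importance vector are illustrative choices only.
```python
# Matrix completion by gradient descent on a filling loss in which each
# feature's squared error is weighted by its importance w[j].
import numpy as np

def weighted_completion(X_nan, w, rank=5, lr=1e-2, n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X_nan.shape
    obs = ~np.isnan(X_nan)                      # mask of observed entries
    X0 = np.where(obs, X_nan, 0.0)
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(d, rank))
    for _ in range(n_steps):
        R = (U @ V.T - X0) * obs * w            # residual, weighted per feature
        U -= lr * (R @ V)                       # gradient steps on the
        V -= lr * (R.T @ U)                     # weighted filling loss
    X_hat = U @ V.T
    return np.where(obs, X_nan, X_hat)          # keep observed values as-is
```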
Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
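A very rough sketch of forward selection driven by a transfer-entropy score, following the spirit of the causal feature selection entry above. The binned plug-in estimator and greedy loop are assumptions; the paper's estimator, backward pass, and theoretical guarantees are not reproduced.
```python
# Greedy forward selection scored by an estimated transfer entropy from each
# candidate feature to the target. Rows are assumed to be time-ordered. A
# faithful forward pass would condition on the already selected set; that is
# omitted here for brevity.
import numpy as np

def _discretize(x, bins=4):
    # equal-frequency binning to symbols 0..bins-1
    qs = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(x, qs)

def transfer_entropy(x, y, bins=4):
    """Plug-in estimate of TE_{x->y} = I(y_t ; x_{t-1} | y_{t-1})."""
    xs, ys = _discretize(x, bins), _discretize(y, bins)
    a, b, c = ys[1:], xs[:-1], ys[:-1]          # y_t, x_{t-1}, y_{t-1}
    def H(*cols):
        _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log(p)).sum()
    # I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
    return H(a, c) + H(b, c) - H(a, b, c) - H(c)

def forward_select(X, y, n_select=5, bins=4):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scores = {j: transfer_entropy(X[:, j], y, bins) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= 0:                   # no remaining signal
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```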
Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance of each individual task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
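A small illustration of parameter coupling across tasks in the spirit of the cross-learning entry above: each task fits its own linear model while a penalty keeps the task weights close to their mean. The linear model and the specific coupling term are simplified assumptions, not the paper's formulation.
```python
# Per-task linear regression with a coupling penalty that pulls each task's
# weight vector toward the mean of all task weights.
import numpy as np

def cross_learn(tasks, lam=1.0, lr=1e-3, n_steps=5000):
    """tasks: list of (X_t, y_t) pairs sharing the same feature dimension."""
    d = tasks[0][0].shape[1]
    W = np.zeros((len(tasks), d))               # one weight vector per task
    for _ in range(n_steps):
        w_bar = W.mean(axis=0)                  # consensus across tasks
        for t, (X, y) in enumerate(tasks):
            grad_fit = X.T @ (X @ W[t] - y) / len(y)
            grad_couple = lam * (W[t] - w_bar)  # pull toward the consensus
            W[t] -= lr * (grad_fit + grad_couple)
    return W
```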
Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
Multi-Objective Evolutionary approach for the Performance Improvement of Learners using Ensembling Feature selection and Discretization Technique on Medical data [8.121462458089143]
This paper proposes a novel multi-objective based dimensionality reduction framework.
It incorporates both discretization and feature reduction in an ensemble model that performs feature selection and discretization jointly.
arXiv Detail & Related papers (2020-04-16T06:32:15Z)
Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.