Outlier Detection as Instance Selection Method for Feature Selection in Time Series Classification
- URL: http://arxiv.org/abs/2111.09127v1
- Date: Tue, 16 Nov 2021 14:44:33 GMT
- Title: Outlier Detection as Instance Selection Method for Feature Selection in Time Series Classification
- Authors: David Cemernek
- Abstract summary: We filter the instances provided to feature selection methods down to rare instances.
For some datasets, the resulting increase in performance was only a few percent.
For other datasets, we achieved increases in performance of up to 16 percent.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to allow machine learning algorithms to extract knowledge from raw
data, these data must first be cleaned, transformed, and put into a
machine-appropriate form. This often very time-consuming phase is referred to
as preprocessing. An important step in the preprocessing phase is feature
selection, which aims at better performance of prediction models by reducing
the number of features in a dataset. Within these datasets, instances of
different events are often imbalanced: certain normal events are
over-represented while other, rare events are very limited. Typically, these
rare events are of special interest since they have more discriminative power
than normal events. The aim of this work was to filter the instances provided
to feature selection methods down to these rare instances, and thus positively
influence the feature selection process. In the course of this work, we were
able to show that this filtering has a positive effect on the performance of
classification models and that outlier detection methods are suitable for this
filtering. For some datasets, the resulting increase in performance was only a
few percent, but for other datasets, we were able to achieve increases in
performance of up to 16 percent. This work should lead to improved predictive
models and better interpretability of feature selection during the
preprocessing phase. In the spirit of open science and to increase
transparency within our research field, we have made all of our source code
and experimental results available in a public repository.
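
The abstract does not name a specific outlier detector or feature selection
method, and the authors' actual code lives in their public repository. The
general pipeline can nonetheless be sketched with off-the-shelf scikit-learn
components. A minimal sketch, assuming IsolationForest as the detector and
mutual-information-based univariate selection (the choice of estimators,
k_features, and contamination rate are illustrative assumptions, not the
authors' exact setup):

```python
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def fit_with_instance_filtering(X_train, y_train, k_features=20, contamination=0.1):
    """Filter the training instances with an outlier detector before feature
    selection, so that rare (outlying) instances drive the choice of features."""
    # 1) Flag outlying instances; IsolationForest returns -1 for outliers.
    detector = IsolationForest(contamination=contamination, random_state=0)
    is_outlier = detector.fit_predict(X_train) == -1

    # 2) Run feature selection on the rare/outlying instances only.
    selector = SelectKBest(mutual_info_classif, k=k_features)
    selector.fit(X_train[is_outlier], y_train[is_outlier])

    # 3) Train the classifier on ALL instances, restricted to the selected features.
    clf = RandomForestClassifier(random_state=0)
    clf.fit(selector.transform(X_train), y_train)
    return clf, selector
```

Under this reading of the abstract, only the feature selection step sees the
filtered instances; the final classifier still trains on all data, so the rare
instances influence which features are kept rather than the decision boundary
directly.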
Related papers
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
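
The exact prompting scheme is specified in the LLM-Select paper itself; as a
rough, hypothetical illustration of asking an LLM for per-feature importance
scores (complete() below is a stand-in for whatever model client is actually
used, not part of any real API):

```python
def complete(prompt: str) -> str:
    """Hypothetical helper: send a prompt to some LLM and return its text reply."""
    raise NotImplementedError

def llm_feature_scores(feature_names, target_description):
    """Ask an LLM to rate how predictive each feature is of the target."""
    scores = {}
    for name in feature_names:
        prompt = (
            f"Target to predict: {target_description}\n"
            f"Feature: {name}\n"
            "On a scale from 0 to 1, how predictive is this feature of the "
            "target? Answer with a single number."
        )
        scores[name] = float(complete(prompt).strip())
    return scores

# e.g. keep only features the model scores above some threshold:
# selected = [f for f, s in scores.items() if s > 0.5]
```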
- Evaluating the Role of Data Enrichment Approaches Towards Rare Event Analysis in Manufacturing [1.3980986259786223]
Rare events are occurrences that take place with significantly lower frequency than regular events.
In manufacturing, predicting such events is particularly important, as they lead to unplanned downtime, shortened equipment lifespan, and high energy consumption.
This paper evaluates the role of data enrichment techniques combined with supervised machine-learning techniques for rare event detection and prediction.
arXiv Detail & Related papers (2024-07-01T00:05:56Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Tradeoffs in Resampling and Filtering for Imbalanced Classification [2.3605348648054454]
- Tradeoffs in Resampling and Filtering for Imbalanced Classification [2.3605348648054454]
We show that different methods of selecting training data bring tradeoffs in effectiveness and efficiency.
We also see that in highly imbalanced cases, filtering test data using first-pass retrieval models is as important for model performance as selecting training data.
arXiv Detail & Related papers (2022-08-31T21:40:47Z)
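
Random undersampling of the majority class is the simplest point in this
tradeoff space. A minimal NumPy sketch (illustrative, not the paper's code;
assumes the majority class is at least as large as the rest combined):

```python
import numpy as np

def undersample_majority(X, y, majority_label, seed=0):
    """Keep every minority-class instance and an equal-sized random subset of
    the majority class."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    keep = np.concatenate([min_idx, rng.choice(maj_idx, size=min_idx.size, replace=False)])
    return X[keep], y[keep]
```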
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Efficient Analysis of COVID-19 Clinical Data using Machine Learning Models [0.0]
Huge volumes of data and case studies have been made available, providing researchers with a unique opportunity to find trends.
Applying machine-learning-based algorithms to this big data is a natural way to pursue this aim.
We show that with the efficient feature selection algorithm, we can achieve a prediction accuracy of more than 90% in most cases.
arXiv Detail & Related papers (2021-10-18T20:06:01Z)
- A Feature Selection Method for Multi-Dimension Time-Series Data [2.055949720959582]
Time-series data in application areas such as motion capture and activity recognition is often multi-dimensional.
There is a lot of redundancy in these data streams and good classification accuracy will often be achievable with a small number of features.
We present a method for feature subset selection on multidimensional time-series data based on mutual information.
arXiv Detail & Related papers (2021-04-22T14:49:00Z)
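
A rough scikit-learn sketch of mutual-information ranking on flattened
multi-dimensional series; the flattening and the value of k are illustrative
assumptions, and the authors' actual procedure differs in its details:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_features_by_mi(X, y, k=10):
    """X: array of shape (n_instances, n_dims, n_timesteps). Flatten each
    (dimension, timestep) pair into a candidate feature column and rank the
    columns by mutual information with the class label y."""
    n, d, t = X.shape
    X_flat = X.reshape(n, d * t)
    mi = mutual_info_classif(X_flat, y, random_state=0)
    top = np.argsort(mi)[::-1][:k]
    # return ((dimension, timestep), score) pairs for the k best columns
    return [(divmod(int(i), t), float(mi[i])) for i in top]
```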
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method called prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
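
The core idea is to normalize with the statistics of the batch being predicted
rather than the running averages accumulated during training. A minimal
PyTorch sketch of that idea (one possible implementation, not the authors'
code):

```python
import torch
import torch.nn as nn

def enable_prediction_time_bn(model: nn.Module) -> nn.Module:
    """Switch only the BatchNorm layers to training mode so they normalize with
    the current (test) batch's statistics instead of the running averages."""
    model.eval()  # dropout etc. stay in eval mode
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()  # use per-batch statistics at prediction time
            # note: in train mode BN also updates its running statistics,
            # a harmless side effect for pure prediction
    return model

# usage: predictions are made batch-wise, e.g.
# model = enable_prediction_time_bn(model)
# with torch.no_grad():
#     logits = model(test_batch)
```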
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
Since this significantly enlarges the training set, we further propose a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
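
Instance weighting of this kind typically reduces to scaling each example's
loss by a per-example weight; in the paper, the weights are derived from the
complementary last-utterance selection signal. A generic PyTorch sketch, not
the authors' implementation:

```python
import torch
import torch.nn.functional as F

def instance_weighted_loss(logits: torch.Tensor,
                           labels: torch.Tensor,
                           weights: torch.Tensor) -> torch.Tensor:
    """Cross-entropy in which each training instance carries its own weight."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_example).mean()
```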
This list is automatically generated from the titles and abstracts of the papers on this site.