Outlier Detection as Instance Selection Method for Feature Selection in Time Series Classification
- URL: http://arxiv.org/abs/2111.09127v1
- Date: Tue, 16 Nov 2021 14:44:33 GMT
- Title: Outlier Detection as Instance Selection Method for Feature Selection in Time Series Classification
- Authors: David Cemernek
- Abstract summary: We filter the instances provided to feature selection methods down to rare instances.
For some datasets, the resulting increase in performance was only a few percent.
For other datasets, we achieved increases in performance of up to 16 percent.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to allow machine learning algorithms to extract knowledge from raw
data, these data must first be cleaned, transformed, and put into a
machine-appropriate form. This often very time-consuming phase is referred to
as preprocessing. An important step in the preprocessing phase is feature
selection, which aims at better performance of prediction models by reducing
the number of features in a dataset. Within these datasets, instances of
different events are often imbalanced: certain normal events are
over-represented while other, rare events are very limited. Typically, these
rare events are of special interest since they have more discriminative power
than normal events. The aim of this work was to filter the instances provided
to feature selection methods down to these rare instances, and thus positively
influence the feature selection process. In the course of this work, we were
able to show that this filtering has a positive effect on the performance of
classification models and that outlier detection methods are suitable for this
filtering. For some datasets, the resulting increase in performance was only a
few percent, but for other datasets, we were able to achieve increases in
performance of up to 16 percent. This work should lead to improved predictive
models and better interpretability of feature selection during the
preprocessing phase. In the spirit of open science and to increase
transparency within our research field, we have made all of our source code
and experimental results available in a public repository.
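
The abstract does not name a specific outlier detector or feature selection
method, and the authors' actual code lives in their public repository. The
general pipeline can nonetheless be sketched with off-the-shelf scikit-learn
components. A minimal sketch, assuming IsolationForest as the detector and
mutual-information-based univariate selection (the choice of estimators,
k_features, and contamination rate are illustrative assumptions, not the
authors' exact setup):

```python
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def fit_with_instance_filtering(X_train, y_train, k_features=20, contamination=0.1):
    """Filter the training instances with an outlier detector before feature
    selection, so that rare (outlying) instances drive the choice of features."""
    # 1) Flag outlying instances; IsolationForest returns -1 for outliers.
    detector = IsolationForest(contamination=contamination, random_state=0)
    is_outlier = detector.fit_predict(X_train) == -1

    # 2) Run feature selection on the rare/outlying instances only.
    selector = SelectKBest(mutual_info_classif, k=k_features)
    selector.fit(X_train[is_outlier], y_train[is_outlier])

    # 3) Train the classifier on ALL instances, restricted to the selected features.
    clf = RandomForestClassifier(random_state=0)
    clf.fit(selector.transform(X_train), y_train)
    return clf, selector
```

Under this reading of the abstract, only the feature selection step sees the
filtered instances; the final classifier still trains on all data, so the rare
instances influence which features are kept rather than the decision boundary
directly.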
Related papers
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
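
The exact prompting scheme is specified in the LLM-Select paper itself; as a
rough, hypothetical illustration of asking an LLM for per-feature importance
scores (complete() below is a stand-in for whatever model client is actually
used, not part of any real API):

```python
def complete(prompt: str) -> str:
    """Hypothetical helper: send a prompt to some LLM and return its text reply."""
    raise NotImplementedError

def llm_feature_scores(feature_names, target_description):
    """Ask an LLM to rate how predictive each feature is of the target."""
    scores = {}
    for name in feature_names:
        prompt = (
            f"Target to predict: {target_description}\n"
            f"Feature: {name}\n"
            "On a scale from 0 to 1, how predictive is this feature of the "
            "target? Answer with a single number."
        )
        scores[name] = float(complete(prompt).strip())
    return scores

# e.g. keep only features the model scores above some threshold:
# selected = [f for f, s in scores.items() if s > 0.5]
```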
- Evaluating the Role of Data Enrichment Approaches Towards Rare Event Analysis in Manufacturing [1.3980986259786223]
Rare events are occurrences that take place with significantly lower frequency than regular events.
In manufacturing, predicting such events is particularly important, as they lead to unplanned downtime, shortened equipment lifespan, and high energy consumption.
This paper evaluates the role of data enrichment techniques combined with supervised machine-learning techniques for rare event detection and prediction.
arXiv Detail & Related papers (2024-07-01T00:05:56Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Tradeoffs in Resampling and Filtering for Imbalanced Classification [2.3605348648054454]
- Tradeoffs in Resampling and Filtering for Imbalanced Classification [2.3605348648054454]
We show that different methods of selecting training data bring tradeoffs in effectiveness and efficiency.
We also see that in highly imbalanced cases, filtering test data using first-pass retrieval models is as important for model performance as selecting training data.
arXiv Detail & Related papers (2022-08-31T21:40:47Z)
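
Random undersampling of the majority class is the simplest point in this
tradeoff space. A minimal NumPy sketch (illustrative, not the paper's code;
assumes the majority class is at least as large as the rest combined):

```python
import numpy as np

def undersample_majority(X, y, majority_label, seed=0):
    """Keep every minority-class instance and an equal-sized random subset of
    the majority class."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    keep = np.concatenate([min_idx, rng.choice(maj_idx, size=min_idx.size, replace=False)])
    return X[keep], y[keep]
```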
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Efficient Analysis of COVID-19 Clinical Data using Machine Learning Models [0.0]
Huge volumes of data and case studies have been made available, providing researchers with a unique opportunity to find trends.
Applying machine-learning-based algorithms to this big data is a natural way to pursue this aim.
We show that with the efficient feature selection algorithm, we can achieve a prediction accuracy of more than 90% in most cases.
arXiv Detail & Related papers (2021-10-18T20:06:01Z)
- A Feature Selection Method for Multi-Dimension Time-Series Data [2.055949720959582]
Time-series data in application areas such as motion capture and activity recognition is often multi-dimensional.
There is a lot of redundancy in these data streams and good classification accuracy will often be achievable with a small number of features.
We present a method for feature subset selection on multidimensional time-series data based on mutual information.
arXiv Detail & Related papers (2021-04-22T14:49:00Z)
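
A rough scikit-learn sketch of mutual-information ranking on flattened
multi-dimensional series; the flattening and the value of k are illustrative
assumptions, and the authors' actual procedure differs in its details:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_features_by_mi(X, y, k=10):
    """X: array of shape (n_instances, n_dims, n_timesteps). Flatten each
    (dimension, timestep) pair into a candidate feature column and rank the
    columns by mutual information with the class label y."""
    n, d, t = X.shape
    X_flat = X.reshape(n, d * t)
    mi = mutual_info_classif(X_flat, y, random_state=0)
    top = np.argsort(mi)[::-1][:k]
    # return ((dimension, timestep), score) pairs for the k best columns
    return [(divmod(int(i), t), float(mi[i])) for i in top]
```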
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method called prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
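
The core idea is to normalize with the statistics of the batch being predicted
rather than the running averages accumulated during training. A minimal
PyTorch sketch of that idea (one possible implementation, not the authors'
code):

```python
import torch
import torch.nn as nn

def enable_prediction_time_bn(model: nn.Module) -> nn.Module:
    """Switch only the BatchNorm layers to training mode so they normalize with
    the current (test) batch's statistics instead of the running averages."""
    model.eval()  # dropout etc. stay in eval mode
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()  # use per-batch statistics at prediction time
            # note: in train mode BN also updates its running statistics,
            # a harmless side effect for pure prediction
    return model

# usage: predictions are made batch-wise, e.g.
# model = enable_prediction_time_bn(model)
# with torch.no_grad():
#     logits = model(test_batch)
```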
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
Since this significantly enlarges the training set, we further propose a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
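
Instance weighting of this kind typically reduces to scaling each example's
loss by a per-example weight; in the paper, the weights are derived from the
complementary last-utterance selection signal. A generic PyTorch sketch, not
the authors' implementation:

```python
import torch
import torch.nn.functional as F

def instance_weighted_loss(logits: torch.Tensor,
                           labels: torch.Tensor,
                           weights: torch.Tensor) -> torch.Tensor:
    """Cross-entropy in which each training instance carries its own weight."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_example).mean()
```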
This list is automatically generated from the titles and abstracts of the papers on this site.