Related papers: Utilizing Data Fingerprints for Privacy-Preserving Algorithm Selection in Time Series Classification: Performance and Uncertainty Estimation on Unseen Datasets

Utilizing Data Fingerprints for Privacy-Preserving Algorithm Selection in Time Series Classification: Performance and Uncertainty Estimation on Unseen Datasets

URL: http://arxiv.org/abs/2409.08636v2
Date: Mon, 30 Sep 2024 21:14:27 GMT
Title: Utilizing Data Fingerprints for Privacy-Preserving Algorithm Selection in Time Series Classification: Performance and Uncertainty Estimation on Unseen Datasets
Authors: Lars Böcking, Leopold Müller, Niklas Kühl,
Abstract summary: We introduce a novel data fingerprint that describes any time series classification dataset in a privacy-preserving manner. By decomposing the multi-target regression problem, only our data fingerprints are used to estimate algorithm performance and uncertainty. Our approach is evaluated on the 112 University of California riverside benchmark datasets.
Score: 4.2193475197905705
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The selection of algorithms is a crucial step in designing AI services for real-world time series classification use cases. Traditional methods such as neural architecture search, automated machine learning, combined algorithm selection, and hyperparameter optimizations are effective but require considerable computational resources and necessitate access to all data points to run their optimizations. In this work, we introduce a novel data fingerprint that describes any time series classification dataset in a privacy-preserving manner and provides insight into the algorithm selection problem without requiring training on the (unseen) dataset. By decomposing the multi-target regression problem, only our data fingerprints are used to estimate algorithm performance and uncertainty in a scalable and adaptable manner. Our approach is evaluated on the 112 University of California riverside benchmark datasets, demonstrating its effectiveness in predicting the performance of 35 state-of-the-art algorithms and providing valuable insights for effective algorithm selection in time series classification service systems, improving a naive baseline by 7.32% on average in estimating the mean performance and 15.81% in estimating the uncertainty.

Related papers

Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications. Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility. We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z)
A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
Analyzing the Capabilities of Nature-inspired Feature Selection Algorithms in Predicting Student Performance [0.0]
In this paper, an analysis was conducted to determine the relative performance of a suite of nature-inspired algorithms in the feature-selection portion of ensemble algorithms used to predict student performance. It was found that leveraging an ensemble approach using nature-inspired algorithms for feature selection and traditional ML algorithms for classification significantly increased predictive accuracy while also reducing feature set size by up to 65 percent.
arXiv Detail & Related papers (2023-08-15T21:18:52Z)
LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets. We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z)
Fair Feature Subset Selection using Multiobjective Genetic Algorithm [0.0]
We present a feature subset selection approach that improves both fairness and accuracy objectives. We use statistical disparity as a fairness metric and F1-Score as a metric for model performance. Our experiments on the most commonly used fairness benchmark datasets show that using the evolutionary algorithm we can effectively explore the trade-off between fairness and accuracy.
arXiv Detail & Related papers (2022-04-30T22:51:19Z)
Early Time-Series Classification Algorithms: An Empirical Comparison [59.82930053437851]
Early Time-Series Classification (ETSC) is the task of predicting the class of incoming time-series by observing as few measurements as possible. We evaluate six existing ETSC algorithms on publicly available data, as well as on two newly introduced datasets.
arXiv Detail & Related papers (2022-03-03T10:43:56Z)
Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features. Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods. We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms. We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z)
Automatic selection of clustering algorithms using supervised graph embedding [14.853602181549967]
MARCO-GE is a novel meta-learning approach for the automated recommendation of clustering algorithms. It trains a ranking meta-model capable of accurately recommending top-performing algorithms for a new dataset and clustering evaluation measure.
arXiv Detail & Related papers (2020-11-16T19:13:20Z)
Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing Bayesian Optimization technique. The performance of the considered algorithms is evaluated using the ISCX 2012 dataset. Experimental results show the effectiveness of the proposed framework in term of accuracy rate, precision, low-false alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime. We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive. In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.