Sliced-Wasserstein-based Anomaly Detection and Open Dataset for Localized Critical Peak Rebates
- URL: http://arxiv.org/abs/2410.21712v2
- Date: Sun, 01 Dec 2024 03:54:57 GMT
- Title: Sliced-Wasserstein-based Anomaly Detection and Open Dataset for Localized Critical Peak Rebates
- Authors: Julien Pallage, Bertrand Scherrer, Salma Naccache, Christophe Bélanger, Antoine Lesage-Landry
- Abstract summary: We present a new unsupervised anomaly (outlier) detection (AD) method using the sliced-Wasserstein metric. This filtering technique is conceptually interesting for MLOps pipelines deploying machine learning models in critical sectors.
- Score: 25.452449432754698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present a new unsupervised anomaly (outlier) detection (AD) method using the sliced-Wasserstein metric. This filtering technique is conceptually interesting for MLOps pipelines deploying machine learning models in critical sectors, e.g., energy, as it offers a conservative data selection. Additionally, we open the first dataset showcasing localized critical peak rebate demand response in a northern climate. We demonstrate the capabilities of our method on synthetic datasets as well as standard AD datasets and use it in the making of a first benchmark for our open-source localized critical peak rebate dataset.
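The abstract names the sliced-Wasserstein metric but does not define it, so here is a minimal sketch of the standard Monte Carlo estimate of that distance between two point clouds. This is not the authors' anomaly detector: how the paper turns the distance into an outlier score or a filtering threshold is not described on this page. The function name `sliced_wasserstein` and its parameters (`n_projections`, `p`, `seed`) are illustrative assumptions, and the sketch assumes both samples have the same size and dimension.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between two
    empirical distributions (hypothetical helper, not the paper's code).

    Assumes x and y are arrays of shape (n_samples, n_features) with equal shapes.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal-size samples of equal dimension")
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random directions drawn uniformly on the unit sphere S^{d-1}.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project every sample onto every direction, then sort along the sample axis.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)
    # In 1-D, optimal transport between equal-size empirical measures
    # pairs the sorted values, so W_p^p is a mean of |x_(i) - y_(i)|^p.
    w_pp = np.mean(np.abs(x_proj - y_proj) ** p, axis=0)
    # Average W_p^p over directions, then take the p-th root.
    return float(np.mean(w_pp) ** (1.0 / p))


rng = np.random.default_rng(1)
a = rng.normal(size=(500, 3))
b = rng.normal(loc=3.0, size=(500, 3))
print(sliced_wasserstein(a, a), sliced_wasserstein(a, b))
```

Slicing replaces one d-dimensional transport problem with many 1-D ones, each solved exactly by sorting. That is what makes the metric cheap enough to use inside a data-filtering step, at the cost of Monte Carlo noise that shrinks as `n_projections` grows.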
Related papers
- Sliced-Wasserstein Distance-based Data Selection [0.0]
We propose a new unsupervised anomaly detection method based on the sliced-Wasserstein distance.
Our filtering technique is interesting for decision-making pipelines deploying machine learning models in critical sectors.
We present the filtering patterns of our method on synthetic datasets and numerically benchmark our method for training data selection.
Date: 2025-04-17T13:07:26Z
- CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking [85.68235482145091]
Large-scale speech datasets have become valuable intellectual property.
We propose a novel dataset ownership verification method.
Our approach introduces a clustering-based backdoor watermark (CBW).
We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks.
Date: 2025-03-02T02:02:57Z
- Personalized Federated Learning via Active Sampling [50.456464838807115]
This paper proposes a novel method for sequentially identifying similar (or relevant) data generators.
Our method measures the relevance of a data generator by evaluating the effect of a gradient step using its local dataset.
We extend this method to non-parametric models by a suitable generalization of the gradient step to update a hypothesis using the local dataset provided by a data generator.
Date: 2024-09-03T17:12:21Z
- PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
Motivated by increasing privacy concerns, we propose a parameter-efficient federated anomaly detection framework named PeFAD.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
Date: 2024-06-04T13:51:08Z
- Self-Supervised Learning for User Localization [8.529237718266042]
Machine learning techniques have shown remarkable accuracy in localization tasks.
Their dependency on vast amounts of labeled data, particularly Channel State Information (CSI) and corresponding coordinates, remains a bottleneck.
We propose a pioneering approach that leverages self-supervised pretraining on unlabeled data to boost the performance of supervised learning for user localization based on CSI.
Date: 2024-04-19T21:49:10Z
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) to tackle this issue.
Date: 2023-12-22T02:12:08Z
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
Date: 2023-10-10T10:48:52Z
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
Date: 2023-07-07T04:03:48Z
- Unlocking the Use of Raw Multispectral Earth Observation Imagery for Onboard Artificial Intelligence [3.3810628880631226]
This work presents a novel methodology to automate the creation of datasets for the detection of target events.
The presented approach first processes the raw data by applying a pipeline consisting of spatial band registration and georeferencing.
It detects the target events by leveraging event-specific state-of-the-art algorithms on the Level-1C products.
We apply the proposed methodology to realize THRawS (Thermal Hotspots in Raw Sentinel-2 data), the first dataset of Sentinel-2 raw data containing warm thermal hotspots.
Date: 2023-05-12T09:54:21Z
- One-Stage Cascade Refinement Networks for Infrared Small Target Detection [21.28595135499812]
Single-frame InfraRed Small Target (SIRST) detection has been a challenging task due to a lack of inherent characteristics.
We present a new research benchmark for infrared small target detection consisting of the SIRST-V2 dataset of real-world, high-resolution single-frame targets.
Date: 2022-12-16T13:37:23Z
- Weakly Supervised Change Detection Using Guided Anisotropic Difusion [97.43170678509478]
We propose original ideas that help us leverage weakly labeled datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
Date: 2021-12-31T10:03:47Z
- Augment & Valuate: A Data Enhancement Pipeline for Data-Centric AI [19.358073575300004]
We propose a data-centric approach to address the fundamental distributional and semantic properties of datasets with black-box models.
We achieve 84.711% test accuracy (ranked #6; Honorable Mention for Most Innovative) in the Data-Centric AI competition using only the provided dataset.
Date: 2021-12-07T17:22:44Z
- A Competitive Method to VIPriors Object Detection Challenge [13.024811732127615]
This report introduces the technical details of our submission to the VIPriors object detection challenge.
To address the lack of data, we introduce an effective data augmentation method that combines bbox-jitter, grid-mask, and mix-up.
We also present a robust region of interest (ROI) extraction method to learn more significant ROI features.
Date: 2021-04-19T05:33:39Z
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
Date: 2020-11-03T07:49:15Z
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
Date: 2020-09-01T15:08:23Z
This list is automatically generated from the titles and abstracts of the papers in this site.