Feature Shift Localization Network
- URL: http://arxiv.org/abs/2506.09101v1
- Date: Tue, 10 Jun 2025 15:27:32 GMT
- Title: Feature Shift Localization Network
- Authors: Míriam Barrabés, Daniel Mas Montserrat, Kapal Dev, Alexander G. Ioannidis
- Abstract summary: We introduce a neural network that can localize feature shifts in large and high-dimensional datasets in a fast and accurate manner. The network, trained with a large number of datasets, learns to extract the statistical properties of the datasets and can localize feature shifts without the need for re-training.
- Score: 51.33484517421393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature shifts between data sources are present in many applications involving healthcare, biomedical, socioeconomic, financial, survey, and multi-sensor data, among others, where unharmonized heterogeneous data sources, noisy data measurements, or inconsistent processing and standardization pipelines can lead to erroneous features. Localizing shifted features is important to address the underlying cause of the shift and correct or filter the data to avoid degrading downstream analysis. While many techniques can detect distribution shifts, localizing the features originating them is still challenging, with current solutions being either inaccurate or not scalable to large and high-dimensional datasets. In this work, we introduce the Feature Shift Localization Network (FSL-Net), a neural network that can localize feature shifts in large and high-dimensional datasets in a fast and accurate manner. The network, trained with a large number of datasets, learns to extract the statistical properties of the datasets and can localize feature shifts from previously unseen datasets and shifts without the need for re-training. The code and ready-to-use trained model are available at https://github.com/AI-sandbox/FSL-Net.
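The abstract frames feature shift localization as identifying which columns cause a distribution shift between two data sources. FSL-Net's trained model is available at the linked repository; as a simple point of comparison, the task can be illustrated with a classical per-feature statistical test. The sketch below is not the paper's method, only a minimal baseline: a two-sample Kolmogorov-Smirnov test per feature with a Bonferroni-corrected threshold, on synthetic data with two injected shifts.

```python
import numpy as np
from scipy.stats import ks_2samp

def localize_feature_shifts(reference, query, alpha=0.01):
    """Flag features whose marginal distribution differs between two datasets.

    Runs a two-sample Kolmogorov-Smirnov test per feature and applies a
    Bonferroni correction to the significance level. Returns the indices
    of features flagged as shifted.
    """
    n_features = reference.shape[1]
    threshold = alpha / n_features  # Bonferroni correction
    shifted = []
    for j in range(n_features):
        stat, p = ks_2samp(reference[:, j], query[:, j])
        if p < threshold:
            shifted.append(j)
    return shifted

rng = np.random.default_rng(0)
ref = rng.normal(size=(2000, 10))
qry = rng.normal(size=(2000, 10))
qry[:, 3] += 1.5   # inject a mean shift into feature 3
qry[:, 7] *= 3.0   # inject a scale shift into feature 7

shifted = localize_feature_shifts(ref, qry)
print(shifted)  # features 3 and 7 should be flagged
```

Per-feature tests like this scale poorly and miss shifts that only appear in joint distributions, which is the gap the paper's learned, re-training-free localizer is aimed at.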
Related papers
- FlexLoc: Conditional Neural Networks for Zero-Shot Sensor Perspective Invariance in Object Localization with Distributed Multimodal Sensors [6.676517041445593]
We introduce FlexLoc, which employs conditional neural networks to inject node perspective information to adapt the localization pipeline.
Our evaluations on a multimodal, multiview indoor tracking dataset showcase that FlexLoc improves the localization accuracy by almost 50% in the zero-shot case.
arXiv Detail & Related papers (2024-06-10T21:02:53Z)
- Adversarial Learning for Feature Shift Detection and Correction [45.65548560695731]
Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in structured data, where faulty standardization and data processing pipelines can lead to erroneous features.
In this work, we explore using the principles of adversarial learning, where the information from several discriminators trained to distinguish between two distributions is used to both detect the corrupted features and fix them in order to remove the distribution shift between datasets.
arXiv Detail & Related papers (2023-12-07T18:58:40Z)
- Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
Quantification is the supervised learning task of training predictors of class prevalence values for sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
arXiv Detail & Related papers (2023-10-06T20:11:27Z)
- DOMINO: Domain-invariant Hyperdimensional Classification for Multi-Sensor Time Series Data [14.434647668734184]
We propose DOMINO, a novel HDC learning framework addressing the distribution shift problem in noisy multi-sensor time-series data.
DOMINO achieves on average 2.04% higher accuracy than state-of-the-art (SOTA) DNN-based domain generalization techniques, and delivers 16.34x faster training and 2.89x faster inference.
arXiv Detail & Related papers (2023-08-07T04:44:12Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Scalable Neural Data Server: A Data Recommender for Transfer Learning [70.06289658553675]
Transfer learning is a popular strategy for leveraging additional data to improve the downstream performance.
Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has been previously proposed to address this problem.
NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task.
SNDS, its scalable successor, represents both data sources and downstream tasks by their proximity to intermediary datasets.
arXiv Detail & Related papers (2022-06-19T12:07:32Z)
- Leveraging Multi-domain, Heterogeneous Data using Deep Multitask Learning for Hate Speech Detection [21.410160004193916]
We propose Convolutional Neural Network-based multi-task learning (MTL) models to leverage information from multiple sources.
Empirical analysis performed on three benchmark datasets shows the efficacy of the proposed approach.
arXiv Detail & Related papers (2021-03-23T09:31:01Z)
- Anonymizing Sensor Data on the Edge: A Representation Learning and Transformation Approach [4.920145245773581]
In this paper, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation.
We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data.
We show that it can anonymize data in real time on resource-constrained edge devices.
arXiv Detail & Related papers (2020-11-16T22:32:30Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves robustness to distributional shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Learning across label confidence distributions using Filtered Transfer Learning [0.44040106718326594]
We propose a transfer learning approach to improve predictive power in noisy data systems with datasets of widely varying label confidence.
We propose a deep neural network method called Filtered Transfer Learning (FTL) that defines multiple tiers of data confidence as separate tasks.
We demonstrate that using FTL to learn stepwise, across the label confidence distribution, results in higher performance compared to deep neural network models trained on a single confidence range.
arXiv Detail & Related papers (2020-06-03T21:00:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.