Sparse outlier-robust PCA for multi-source data
- URL: http://arxiv.org/abs/2407.16299v1
- Date: Tue, 23 Jul 2024 08:55:03 GMT
- Title: Sparse outlier-robust PCA for multi-source data
- Authors: Patricia Puchhammer, Ines Wilms, Peter Filzmoser,
- Abstract summary: We introduce a novel PCA methodology that simultaneously selects important features as well as local source-specific patterns.
We develop a regularization problem with a penalty that accommodates global-local structured sparsity patterns.
We provide an efficient implementation of our proposal via the Alternating Direction Method of Multiplier.
- Score: 2.3226893628361687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse and outlier-robust Principal Component Analysis (PCA) has been a very active field of research recently. Yet, most existing methods apply PCA to a single dataset whereas multi-source data-i.e. multiple related datasets requiring joint analysis-arise across many scientific areas. We introduce a novel PCA methodology that simultaneously (i) selects important features, (ii) allows for the detection of global sparse patterns across multiple data sources as well as local source-specific patterns, and (iii) is resistant to outliers. To this end, we develop a regularization problem with a penalty that accommodates global-local structured sparsity patterns, and where the ssMRCD estimator is used as plug-in to permit joint outlier-robust analysis across multiple data sources. We provide an efficient implementation of our proposal via the Alternating Direction Method of Multiplier and illustrate its practical advantages in simulation and in applications.
Related papers
- Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection [64.08296187555095]
Uni$2$Det is a framework for unified and universal multi-dataset training on 3D detection.
We introduce multi-stage prompting modules for multi-dataset 3D detection.
Results on zero-shot cross-dataset transfer validate the generalization capability of our proposed method.
arXiv Detail & Related papers (2024-09-30T17:57:50Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Source-Free Collaborative Domain Adaptation via Multi-Perspective
Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state MRI functional (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z) - Provably Efficient Offline Reinforcement Learning with Perturbed Data
Sources [23.000116974718]
Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task.
In practice, however, data often come from several heterogeneous but related sources.
This work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself.
arXiv Detail & Related papers (2023-06-14T08:53:20Z) - GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using
Macro Data Sources [21.32471030724983]
Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems.
In this study, we examine synthetic data generation as a tool to extrapolate difficult-to-obtain high-resolution data.
arXiv Detail & Related papers (2022-12-08T01:22:12Z) - Towards Realistic Low-resource Relation Extraction: A Benchmark with
Empirical Baseline Study [51.33182775762785]
This paper presents an empirical study to build relation extraction systems in low-resource settings.
We investigate three schemes to evaluate the performance in low-resource settings: (i) different types of prompt-based methods with few-shot labeled data; (ii) diverse balancing methods to address the long-tailed distribution issue; and (iii) data augmentation technologies and self-training to generate more labeled in-domain data.
arXiv Detail & Related papers (2022-10-19T15:46:37Z) - Domain Adaptation Principal Component Analysis: base linear method for
learning with out-of-distribution data [55.41644538483948]
Domain adaptation is a popular paradigm in modern machine learning.
We present a method called Domain Adaptation Principal Component Analysis (DAPCA)
DAPCA finds a linear reduced data representation useful for solving the domain adaptation task.
arXiv Detail & Related papers (2022-08-28T21:10:56Z) - Another Use of SMOTE for Interpretable Data Collaboration Analysis [8.143750358586072]
Data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions.
This study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage.
arXiv Detail & Related papers (2022-08-26T06:39:13Z) - Exploiting Temporal Structures of Cyclostationary Signals for
Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS)
We focus on cyclostationary signals, which are particularly suitable in a variety of application domains.
We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z) - Relationship-aware Multivariate Sampling Strategy for Scientific
Simulation Data [4.2855912967712815]
In this work, we propose a multivariate sampling strategy which preserves the original variable relationships.
Our proposed strategy utilizes principal component analysis to capture the variance of multivariate data and can be built on top of any existing state-of-the-art sampling algorithms for single variables.
arXiv Detail & Related papers (2020-08-31T00:52:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.