Contrastive independent component analysis
- URL: http://arxiv.org/abs/2407.02357v2
- Date: Mon, 26 May 2025 09:09:15 GMT
- Title: Contrastive independent component analysis
- Authors: Kexin Wang, Aida Maraj, Anna Seigal
- Abstract summary: We devise a new linear algebra based tensor decomposition algorithm, which is more expressive but just as efficient and identifiable as other linear algebra based algorithms. We establish the identifiability of cICA and demonstrate its performance in finding patterns and visualizing data, using synthetic, semi-synthetic, and real-world datasets.
- Score: 6.348278114271242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been growing interest in jointly analyzing a foreground dataset, representing an experimental group, and a background dataset, representing a control group. The goal of such contrastive investigations is to identify salient features in the experimental group relative to the control. Independent component analysis (ICA) is a powerful tool for learning independent patterns in a dataset. We generalize it to contrastive ICA (cICA). For this purpose, we devise a new linear algebra based tensor decomposition algorithm, which is more expressive but just as efficient and identifiable as other linear algebra based algorithms. We establish the identifiability of cICA and demonstrate its performance in finding patterns and visualizing data, using synthetic, semi-synthetic, and real-world datasets, comparing the approach to existing methods.
Related papers
- Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task [41.94295877935867]
We propose an alternative foundation based on Abstract Algebra, with mathematics that facilitates the analysis of learning. In this approach, the goal of the task and the data are encoded as axioms of an algebra, and a model is obtained where only these axioms and their logical consequences hold. We validate this new learning principle on standard datasets such as MNIST, FashionMNIST, CIFAR-10, and medical images, achieving performance comparable to optimized multilayer perceptrons.
arXiv Detail & Related papers (2025-02-27T10:13:42Z) - Spectral Self-supervised Feature Selection [7.052728135831165]
We propose a self-supervised graph-based approach for unsupervised feature selection. Our method's core involves computing robust pseudo-labels by applying simple processing steps to the graph Laplacian's eigenvectors. Our approach is shown to be robust to challenging scenarios, such as the presence of outliers and complex substructures.
arXiv Detail & Related papers (2024-07-12T07:29:08Z) - Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets [11.105392318582677]
We propose a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees.
Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure.
We show that in a high-dimensional regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables.
arXiv Detail & Related papers (2024-07-01T18:48:55Z) - Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z) - Integrated Gradient Correlation: a Dataset-wise Attribution Method [0.0]
We present a dataset-wise attribution method called Integrated Gradient Correlation (IGC). IGC enables region-specific analysis by a direct summation over associated components, and further relates the sum of all attributions to a model prediction score (correlation). We demonstrate IGC on synthetic data and fMRI neural signals (NSD dataset) with the study of the representation of image features in the brain.
arXiv Detail & Related papers (2024-04-22T06:42:21Z) - Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - A Spectral Method for Assessing and Combining Multiple Data Visualizations [13.193958370464683]
We propose an efficient spectral method for assessing and combining multiple visualizations of a given dataset.
The proposed method provides a quantitative measure -- the visualization eigenscore -- of the relative performance of the visualizations for preserving the structure around each data point.
We analyze multiple simulated and real-world datasets to demonstrate the effectiveness of the eigenscores for evaluating visualizations and the superiority of the proposed consensus visualization.
arXiv Detail & Related papers (2022-10-25T02:13:19Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Another Use of SMOTE for Interpretable Data Collaboration Analysis [8.143750358586072]
Data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions.
This study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage.
arXiv Detail & Related papers (2022-08-26T06:39:13Z) - A Computational Model for Logical Analysis of Data [0.0]
LAD constitutes an interesting rule-based learning alternative to classic statistical learning techniques.
We propose several models for representing the data set of observations, according to the information we have on it.
Analytic Combinatorics allows us to express the desired probabilities as ratios of generating functions coefficients.
arXiv Detail & Related papers (2022-07-12T16:47:59Z) - Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra [68.8204255655161]
We focus on unsupervised techniques for analyzing spectral data from transiting exoplanets.
We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations.
We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes.
arXiv Detail & Related papers (2022-01-07T22:26:33Z) - Interactive Dimensionality Reduction for Comparative Analysis [28.52130400665133]
We introduce an interactive DR framework where we integrate our new DR method, called ULCA, with an interactive visual interface.
ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks.
We develop an optimization algorithm that enables analysts to interactively refine ULCA results.
arXiv Detail & Related papers (2021-06-29T15:05:36Z) - Capturing patterns of variation unique to a specific dataset [68.8204255655161]
We propose a tuning-free method that identifies low-dimensional representations of a target dataset relative to one or more comparison datasets.
We show in several experiments that UCA with a single background dataset achieves similar results compared to cPCA with various tuning parameters.
arXiv Detail & Related papers (2021-04-16T15:07:32Z) - Joint Characterization of Multiscale Information in High Dimensional Data [0.0]
We propose a multiscale joint characterization approach designed to exploit synergies between global and local approaches to dimensionality reduction.
We show that joint characterization is capable of detecting and isolating signals which are not evident from either PCA or t-SNE alone.
arXiv Detail & Related papers (2021-02-18T23:33:00Z) - Contrastive analysis for scatter plot-based representations of dimensionality reduction [0.0]
This paper introduces a methodology to explore multidimensional datasets and interpret clusters' formation.
We also introduce a bipartite graph to visually interpret and explore the relationship between the statistical variables used to understand how the attributes influenced cluster formation.
arXiv Detail & Related papers (2021-01-26T01:16:31Z) - CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways.
arXiv Detail & Related papers (2020-10-11T02:19:15Z) - Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z) - Learning Stochastic Behaviour from Aggregate Data [52.012857267317784]
Learning nonlinear dynamics from aggregate data is a challenging problem because the full trajectory of each individual is not available.
We propose a novel method using the weak form of the Fokker-Planck Equation (FPE) to describe the density evolution of data in a sampled form.
In such a sample-based framework we are able to learn the nonlinear dynamics from aggregate data without explicitly solving the partial differential equation (PDE) FPE.
arXiv Detail & Related papers (2020-02-10T03:20:13Z)