Model-centric Data Manifold: the Data Through the Eyes of the Model
- URL: http://arxiv.org/abs/2104.13289v1
- Date: Mon, 26 Apr 2021 16:03:09 GMT
- Title: Model-centric Data Manifold: the Data Through the Eyes of the Model
- Authors: Luca Grementieri, Rita Fioresi
- Abstract summary: Deep ReLU neural network classifiers can see a low-dimensional manifold structure on data.
We show that the dataset on which the model is trained lies on a leaf, the data leaf, whose dimension is bounded by the number of classification labels.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We discover that deep ReLU neural network classifiers can see a
low-dimensional Riemannian manifold structure on data. Such structure comes via
the local data matrix, a variation of the Fisher information matrix, where the
role of the model parameters is taken by the data variables. We obtain a
foliation of the data domain and we show that the dataset on which the model is
trained lies on a leaf, the data leaf, whose dimension is bounded by the number
of classification labels. We validate our results with some experiments with
the MNIST dataset: paths on the data leaf connect valid images, while other
leaves cover noisy images.
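The abstract describes the local data matrix only in words. Below is a minimal sketch, assuming it is the Fisher-style expectation D(x) = E_{y~p(y|x)}[∇_x log p(y|x) ∇_x log p(y|x)^T], that is, the Fisher information matrix with gradients taken with respect to the input x rather than the parameters. The two-layer ReLU network, its random weights, and every function name are hypothetical stand-ins for a trained MNIST classifier. Because Σ_y p(y|x) ∇_x log p(y|x) = ∇_x Σ_y p(y|x) = 0, the C gradient vectors are linearly dependent, so in this sketch the rank of D(x) cannot exceed C-1. That is one way to see why the number of labels bounds a dimension.

```python
import numpy as np

# Hypothetical two-layer ReLU classifier: z = W2 @ relu(W1 @ x + b1) + b2.
# Random weights stand in for a trained model (assumption for illustration).

def softmax(z):
    e = np.exp(z - z.max())  # shift logits for numerical stability
    return e / e.sum()

def logits_and_jacobian(x, W1, b1, W2, b2):
    """Return the logits z(x) and the Jacobian dz/dx of the ReLU network."""
    pre = W1 @ x + b1
    mask = (pre > 0).astype(float)        # ReLU derivative: 0 or 1 per unit
    z = W2 @ (pre * mask) + b2
    J = W2 @ (mask[:, None] * W1)         # dz/dx, shape (C, d)
    return z, J

def local_data_matrix(x, W1, b1, W2, b2):
    """Assumed definition: D(x) = sum_y p(y|x) g_y g_y^T, g_y = grad_x log p(y|x)."""
    z, J = logits_and_jacobian(x, W1, b1, W2, b2)
    p = softmax(z)
    C = p.size
    D = np.zeros((x.size, x.size))
    for y in range(C):
        # d log p_y / dz = e_y - p, chained through the Jacobian dz/dx
        g = J.T @ (np.eye(C)[y] - p)
        D += p[y] * np.outer(g, g)
    return D, J, p

rng = np.random.default_rng(0)
d, h, C = 20, 64, 10                      # input dim, hidden width, classes
W1 = rng.standard_normal((h, d)) / np.sqrt(d)
b1 = rng.standard_normal(h)
W2 = rng.standard_normal((C, h)) / np.sqrt(h)
b2 = rng.standard_normal(C)
x = rng.standard_normal(d)

D, J, p = local_data_matrix(x, W1, b1, W2, b2)
tol = 1e-10 * np.linalg.norm(D, 2)        # explicit tolerance for the numerical rank
rank = np.linalg.matrix_rank(D, tol=tol)
print("rank of D(x):", rank, "| bound C - 1 =", C - 1)

# A direction v in the kernel of D(x) shifts every logit by the same amount,
# so the softmax output is unchanged to first order along v.
eigvals, eigvecs = np.linalg.eigh(D)
v = eigvecs[:, 0]                         # eigenvector of the smallest eigenvalue
dz = J @ v
print("spread of J @ v:", dz.max() - dz.min())
```

The loop mirrors the expectation over labels. Equivalently, D(x) = J^T (diag(p) - p p^T) J, and the rank bound follows from the C x C middle factor, which annihilates the all-ones vector. Replacing the toy network with a real classifier and automatic differentiation would give the same construction, though this sketch does not reproduce the paper's experiments.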
Related papers
- Probing the Latent Hierarchical Structure of Data via Diffusion Models
We show that experiments in diffusion-based models are a promising tool to probe the latent structure of data.
We confirm this prediction in both text and image datasets using state-of-the-art diffusion models.
Our results show how latent variable changes manifest in the data and establish how to measure these effects in real data.
(arXiv, 2024-10-17)
- Classification of Buried Objects from Ground Penetrating Radar Images by using Second Order Deep Learning Models
A new classification model based on covariance matrices is built in order to classify buried objects.
We show on a large database that our approach outperforms shallow networks designed for GPR data.
We also illustrate the usefulness of our models when the training and test sets are obtained under different weather modes or conditions.
(arXiv, 2024-09-20)
- Manifold Learning via Foliations and Knowledge Transfer
We provide a natural geometric structure on the space of data employing a deep ReLU neural network trained as a classifier.
We show that the singular points of such foliation are contained in a measure zero set, and that a local regular foliation exists almost everywhere.
Experiments show that the data is correlated with leaves of such foliation.
(arXiv, 2024-09-11)
- Diffusion Models as Data Mining Tools
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
(arXiv, 2024-07-20)
- Measuring Feature Dependency of Neural Networks by Collapsing Feature Dimensions in the Data Manifold
We introduce a new technique to measure the feature dependency of neural network models.
The motivation is to better understand a model by querying whether it is using information from human-understandable features.
We test our method on deep neural network models trained on synthetic image data with known ground truth.
(arXiv, 2024-04-18)
- LAESI: Leaf Area Estimation with Synthetic Imagery
We introduce LAESI, a synthetic leaf dataset of 100,000 synthetic leaf images on millimeter paper.
This dataset provides a resource for leaf morphology analysis aimed at beech and oak leaves.
We evaluate the applicability of the dataset by training machine learning models for leaf surface area prediction and semantic segmentation.
(arXiv, 2024-03-31)
- Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning.
We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
(arXiv, 2023-12-22)
- Neural FIM for learning Fisher Information Metrics from point cloud data
We propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data.
We demonstrate its utility in selecting parameters for the PHATE visualization method, as well as its ability to obtain information about local volume, illuminating branching points and cluster centers in embeddings of a toy dataset and two single-cell datasets of iPSC reprogramming and PBMCs (immune cells).
(arXiv, 2023-06-01)
- CHALLENGER: Training with Attribution Maps
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
(arXiv, 2022-05-30)
- ClassSPLOM -- A Scatterplot Matrix to Visualize Separation of Multiclass Multidimensional Data
In multiclass classification of multidimensional data, the user wants to build a model of the classes to predict the label of unseen data.
The model is trained on the data and tested on unseen data with known labels to evaluate its quality.
The results are visualized as a confusion matrix which shows how many data labels have been predicted correctly or confused with other classes.
(arXiv, 2022-01-30)
- Data from Model: Extracting Data from Non-robust and Robust Models
This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model.
We repeat the process of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature mapping information.
Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models.
(arXiv, 2020-07-13)