Attribute-based Explanations of Non-Linear Embeddings of High-Dimensional Data
- URL: http://arxiv.org/abs/2108.08706v1
- Date: Wed, 28 Jul 2021 12:09:29 GMT
- Title: Attribute-based Explanations of Non-Linear Embeddings of High-Dimensional Data
- Authors: Jan-Tobias Sohns, Michaela Schmitt, Fabian Jirasek, Hans Hasse, and Heike Leitte
- Abstract summary: Non-linear Embeddings Surveyor (NoLiES) combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting.
Rangesets use a set-based visualization approach for binned attribute values, which enables the user to quickly observe structure and detect outliers.
- Score: 2.397739143553337
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Embeddings of high-dimensional data are widely used to explore data, to
verify analysis results, and to communicate information. Their explanation, in
particular with respect to the input attributes, is often difficult. With
linear projections like PCA, the axes can still be annotated meaningfully. With
non-linear projections this is no longer possible and alternative strategies
such as attribute-based color coding are required. In this paper, we review
existing augmentation techniques and discuss their limitations. We present the
Non-Linear Embeddings Surveyor (NoLiES) that combines a novel augmentation
strategy for projected data (rangesets) with interactive analysis in a small
multiples setting. Rangesets use a set-based visualization approach for binned
attribute values, which enables the user to quickly observe structure and detect
outliers. We detail the link between algebraic topology and rangesets and
demonstrate the utility of NoLiES in case studies with various challenges
(complex attribute value distribution, many attributes, many data points) and a
real-world application to understand latent features of matrix completion in
thermodynamics.
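The abstract describes rangesets only at a high level: attribute values are binned, each bin is shown as a set over the projected points, and the construction is linked to algebraic topology. The Python sketch below illustrates one possible reading under stated assumptions; it is not the authors' implementation. It assumes equal-width bins and groups each bin's points into connected components of an eps-neighbourhood graph (the 1-skeleton of a Vietoris-Rips complex) in the 2D embedding. The functions `bin_attribute`, `rips_components`, and `rangeset_sketch` and the parameters `n_bins` and `eps` are hypothetical. Small, isolated components within a bin hint at outliers.

```python
import math
from collections import defaultdict


def bin_attribute(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def rips_components(points, eps):
    """Connected components of the graph linking points at distance <= eps,
    i.e. the 1-skeleton of a Vietoris-Rips complex (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return list(groups.values())


def rangeset_sketch(embedding, values, n_bins, eps):
    """Hypothetical rangeset-like summary: for each attribute bin, the
    components (as global point indices) formed by its points in the embedding."""
    members = defaultdict(list)
    for idx, b in enumerate(bin_attribute(values, n_bins)):
        members[b].append(idx)

    result = {}
    for b, idxs in sorted(members.items()):
        comps = rips_components([embedding[i] for i in idxs], eps)
        # Map local indices within the bin back to indices into the embedding.
        result[b] = [sorted(idxs[k] for k in comp) for comp in comps]
    return result


if __name__ == "__main__":
    embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (1.0, 1.0), (1.1, 1.0)]
    values = [0.1, 0.2, 0.15, 0.12, 0.9, 0.95]
    print(rangeset_sketch(embedding, values, n_bins=2, eps=0.5))
    # → {0: [[0, 1, 2], [3]], 1: [[4, 5]]}  (point 3 is isolated within bin 0)
```

In this toy example, point 3 has a low attribute value like its bin-mates but lies far from them in the embedding, so it forms its own component. In a plot, this is the kind of structure a set-based view per bin would make visible. The choice of `eps` controls how coarse the sets are and would need tuning per embedding.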
Related papers
- DSAI: Unbiased and Interpretable Latent Feature Extraction for Data-Centric AI [24.349800949355465]
Large language models (LLMs) often struggle to objectively identify latent characteristics in large datasets.
We propose Data Scientist AI (DSAI), a framework that enables unbiased and interpretable feature extraction.
arXiv Detail & Related papers (2024-12-09T08:47:05Z)
- Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning [83.10178754323955]
The Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to address the complex interactions between attributes and object visual representations.
To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module.
To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module.
The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
arXiv Detail & Related papers (2024-11-28T09:50:25Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data [4.550634499956126]
We aim at effectively leveraging data with varying features, without the need to constrain the input space to the intersection of potential feature sets.
We propose a novel architecture that can directly process data without the necessity of aligned feature modalities.
The advantages of the model are demonstrated for automatic cancer cell detection in acute myeloid leukemia in flow data.
arXiv Detail & Related papers (2023-11-06T18:06:38Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen in the infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- Feature construction using explanations of individual predictions [0.0]
We propose a novel approach for reducing the search space based on aggregation of instance-based explanations of predictive models.
We empirically show that reducing the search to these groups significantly reduces the time of feature construction.
We show significant improvements in classification accuracy for several classifiers and demonstrate the feasibility of the proposed feature construction even for large datasets.
arXiv Detail & Related papers (2023-01-23T18:59:01Z)
- Siamese Attribute-missing Graph Auto-encoder [35.79233150253881]
We propose the Siamese Attribute-missing Graph Auto-encoder (SAGA).
First, we entangle the attribute embedding and structure embedding by introducing a siamese network structure to share the parameters learned by both processes.
Second, we introduce a K-nearest neighbor (KNN) and structural constraint enhanced learning mechanism to improve the quality of latent features of the missing attributes.
arXiv Detail & Related papers (2021-12-09T11:21:31Z)
- Adaptive Attribute and Structure Subspace Clustering Network [49.040136530379094]
We propose a novel self-expressiveness-based subspace clustering network.
We first consider an auto-encoder to represent input data samples.
Then, we construct a mixed signed and symmetric structure matrix to capture the local geometric structure underlying data.
We perform self-expressiveness on the constructed attribute and structure matrices to learn their affinity graphs.
arXiv Detail & Related papers (2021-09-28T14:00:57Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)