Subgroup Discovery in Unstructured Data
- URL: http://arxiv.org/abs/2207.07781v1
- Date: Fri, 15 Jul 2022 23:13:54 GMT
- Title: Subgroup Discovery in Unstructured Data
- Authors: Ali Arab, Dev Arora, Jialin Lu, Martin Ester
- Abstract summary: Subgroup discovery has numerous applications in knowledge discovery and hypothesis generation.
Subgroup-aware variational autoencoder learns a representation of unstructured data which leads to subgroups with higher quality.
- Score: 7.6323763630645285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subgroup discovery is a descriptive and exploratory data mining technique to
identify subgroups in a population that exhibit interesting behavior with
respect to a variable of interest. Subgroup discovery has numerous applications
in knowledge discovery and hypothesis generation, yet it remains inapplicable
for unstructured, high-dimensional data such as images. This is because
subgroup discovery algorithms rely on defining descriptive rules based on
(attribute, value) pairs; in unstructured data, however, an attribute is not
well defined. Even in cases where the notion of attribute intuitively exists in
the data, such as a pixel in an image, due to the high dimensionality of the
data, these attributes are not informative enough to be used in a rule. In this
paper, we introduce the subgroup-aware variational autoencoder, a novel
variational autoencoder that learns a representation of unstructured data which
leads to subgroups with higher quality. Our experimental results demonstrate
the effectiveness of the method at learning subgroups with high quality while
supporting the interpretability of the concepts.
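To make the notion of a rule built from (attribute, value) pairs concrete, here is a minimal sketch of scoring one such rule on a toy tabular dataset. The quality measure used, weighted relative accuracy (WRAcc), is an assumption for illustration: the abstract does not name a measure, and this is not the paper's method. The function name `wracc`, the attribute names, and the data are all hypothetical.

```python
# Hypothetical toy example: score a subgroup rule built from (attribute, value) pairs.
# WRAcc is an assumed quality measure; the abstract does not specify one.

def wracc(rows, rule, target):
    """Weighted relative accuracy of a rule for a binary (0/1) target.

    rows   -- list of dicts mapping attribute name to value
    rule   -- dict of (attribute, value) pairs that must all match
    target -- name of the binary variable of interest
    """
    n = len(rows)
    # A row is covered when every (attribute, value) pair in the rule matches.
    covered = [r for r in rows if all(r.get(a) == v for a, v in rule.items())]
    if n == 0 or not covered:
        return 0.0
    p_all = sum(r[target] for r in rows) / n              # target rate in the population
    p_sub = sum(r[target] for r in covered) / len(covered)  # target rate in the subgroup
    # Coverage times the difference in target rate.
    return (len(covered) / n) * (p_sub - p_all)


# Made-up data for illustration only.
rows = [
    {"smoker": "yes", "age": "old", "disease": 1},
    {"smoker": "yes", "age": "young", "disease": 1},
    {"smoker": "no", "age": "old", "disease": 0},
    {"smoker": "no", "age": "young", "disease": 0},
]

score = wracc(rows, {"smoker": "yes"}, "disease")
print(score)
```

A rule like this depends on each attribute being meaningful on its own. A single pixel of an image rarely is, which is the gap the abstract describes.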
Related papers
- Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS).
Existing methods suffer from a granularity inconsistency regarding the usage of group tokens.
We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
- How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model [47.617093812158366]
We introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images.
We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups.
Our results indicate how deep networks overcome the curse of dimensionality by building invariant representations.
arXiv Detail & Related papers (2023-07-05T09:11:09Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- Identification of Systematic Errors of Image Classifiers on Rare Subgroups [12.064692111429494]
Systematic errors can impact both fairness for demographic minority groups and robustness and safety under domain shift.
We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for subgroups where the target model has low performance.
We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy.
arXiv Detail & Related papers (2023-03-09T07:08:25Z)
- Leveraging Structure for Improved Classification of Grouped Biased Data [8.121462458089143]
We consider semi-supervised binary classification for applications in which data points are naturally grouped.
We derive a semi-supervised algorithm that explicitly leverages the structure to learn an optimal, group-aware, probability-calibrated classifier.
arXiv Detail & Related papers (2022-12-07T15:18:21Z)
- Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach [0.0]
Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together.
This article provides a deep description of the most widely used clustering methodologies.
It emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets.
arXiv Detail & Related papers (2022-07-14T14:22:36Z)
- The Group Loss++: A deeper look into group loss for deep metric learning [65.19665861268574]
Group Loss is a loss function based on a differentiable label-propagation method that enforces embedding similarity across all samples of a group.
We show state-of-the-art results on clustering and image retrieval on four datasets, and present competitive results on two person re-identification datasets.
arXiv Detail & Related papers (2022-04-04T14:09:58Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- Identifying Biased Subgroups in Ranking and Classification [12.268135088806613]
We introduce the notion of divergence to measure performance difference.
We exploit it in the context of (i) classification models and (ii) ranking applications.
We quantify the contribution of all attributes in the data subgroup to the divergent behavior by means of Shapley values.
arXiv Detail & Related papers (2021-08-17T05:26:11Z)
- Class Introspection: A Novel Technique for Detecting Unlabeled Subclasses by Leveraging Classifier Explainability Methods [0.0]
Detecting latent structure is a crucial step in performing analysis of a dataset.
By leveraging instance explanation methods, an existing classifier can be extended to detect latent classes.
This paper also contains a pipeline for analyzing classifiers automatically, and a web application for interactively exploring the results from this technique.
arXiv Detail & Related papers (2021-07-04T14:58:29Z)
- Learning Multi-Attention Context Graph for Group-Based Re-Identification [214.84551361855443]
Learning to re-identify or retrieve a group of people across non-overlapped camera systems has important applications in video surveillance.
In this work, we consider employing context information for identifying groups of people, i.e., group re-id.
We propose a novel unified framework based on graph neural networks to simultaneously address the group-based re-id tasks.
arXiv Detail & Related papers (2021-04-29T09:57:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.