Understanding Concept Identification as Consistent Data Clustering
Across Multiple Feature Spaces
- URL: http://arxiv.org/abs/2301.05525v2
- Date: Tue, 14 Nov 2023 13:29:06 GMT
- Title: Understanding Concept Identification as Consistent Data Clustering
Across Multiple Feature Spaces
- Authors: Felix Lanfermann, Sebastian Schmitt, Patricia Wollstadt
- Abstract summary: Concept identification aims at identifying groups of design instances that are similar in a joint space of all features.
It is desirable to evaluate the quality of design concepts by considering several of these feature subsets in isolation.
In this work, we propose to view concept identification as a special form of clustering algorithm with a broad range of potential applications.
- Score: 2.631955426232593
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Identifying meaningful concepts in large data sets can provide valuable
insights into engineering design problems. Concept identification aims at
identifying non-overlapping groups of design instances that are similar in a
joint space of all features, but which are also similar when considering only
subsets of features. These subsets usually comprise features that characterize
a design with respect to one specific context, for example, constructive design
parameters, performance values, or operation modes. It is desirable to evaluate
the quality of design concepts by considering several of these feature subsets
in isolation. In particular, meaningful concepts should not only identify
dense, well-separated groups of data instances, but also provide
non-overlapping groups of data that persist when considering pre-defined
feature subsets separately. In this work, we propose to view concept
identification as a special form of clustering algorithm with a broad range of
potential applications beyond engineering design. To illustrate the differences
between concept identification and classical clustering algorithms, we apply a
recently proposed concept identification algorithm to two synthetic data sets
and show the differences in identified solutions. In addition, we introduce the
mutual information measure as a metric to evaluate whether solutions return
consistent clusters across relevant subsets. To support the novel understanding
of concept identification, we consider a simulated data set from a
decision-making problem in the energy management domain and show that the
identified clusters are more interpretable with respect to relevant feature
subsets than clusters found by common clustering algorithms and are thus more
suitable to support a decision maker.
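The consistency check described in the abstract can be illustrated with a minimal sketch. It assumes that each design instance has already been given one cluster label per feature subset (for example, one labeling from the design parameters and one from the performance values), and it computes the mutual information between the two labelings. The function names, the normalization by the larger entropy, and the toy labels are illustrative assumptions, not the authors' implementation; the paper's exact formulation is not reproduced here.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a cluster assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same instances."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same instances")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def normalized_mi(labels_a, labels_b):
    """MI scaled to [0, 1] by the larger entropy (an assumed convention)."""
    h = max(entropy(labels_a), entropy(labels_b))
    # Assumed convention: two trivial one-cluster labelings count as consistent.
    return mutual_information(labels_a, labels_b) / h if h > 0 else 1.0


# Hypothetical labels for four design instances in two feature subsets.
params_labels = [0, 0, 1, 1]       # clusters found in the design-parameter subset
consistent_perf = [1, 1, 0, 0]     # same grouping, different label names
inconsistent_perf = [0, 1, 0, 1]   # grouping unrelated to the first one

print(normalized_mi(params_labels, consistent_perf))
print(normalized_mi(params_labels, inconsistent_perf))
```

Under these assumptions, a value near 1 indicates that the groups persist across the two feature subsets, while a value near 0 indicates that the grouping in one subset says nothing about the grouping in the other. The measure is invariant to how the clusters are named, which is why the relabeled but identical grouping still counts as fully consistent.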
Related papers
- Discriminative Anchor Learning for Efficient Multi-view Clustering [59.11406089896875]
We propose discriminative anchor learning for multi-view clustering (DALMC).
We learn discriminative view-specific feature representations according to the original dataset.
We build anchors from different views based on these representations, which increase the quality of the shared anchor graph.
arXiv Detail & Related papers (2024-09-25T13:11:17Z)
- Normalization in Proportional Feature Spaces [49.48516314472825]
Normalization plays a central role in data representation, characterization, visualization, analysis, comparison, classification, and modeling.
The selection of an appropriate normalization method needs to take into account the type and characteristics of the involved features.
arXiv Detail & Related papers (2024-09-17T17:46:27Z)
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
ABCDE aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to that, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
arXiv Detail & Related papers (2024-07-31T08:29:35Z)
- Towards Explainable Clustering: A Constrained Declarative based Approach [0.294944680995069]
We aim at finding a clustering that has high quality in terms of classic clustering criteria and that is explainable.
A good global explanation of a clustering should give the characteristics of each cluster taking into account their abilities to describe its objects.
We propose a novel interpretable constrained method called ECS for declarative computation with Explainability-driven Selection.
arXiv Detail & Related papers (2024-03-26T21:00:06Z)
- Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks.
We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes.
Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
- Enhancing Neural Subset Selection: Integrating Background Information into Set Representations [53.15923939406772]
We show that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest.
This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated.
arXiv Detail & Related papers (2024-02-05T16:09:35Z)
- Concept Identification for Complex Engineering Datasets [0.0]
A novel concept quality measure is proposed, which provides an objective value for a given definition of concepts in a dataset.
It is demonstrated how these concepts can be used to select archetypal representatives of the dataset which exhibit characteristic features of each concept.
arXiv Detail & Related papers (2022-06-09T09:39:46Z)
- A Framework for Multi-View Classification of Features [6.660458629649826]
In data classification problems, typical approaches fail when the feature set is too large.
In this research, an innovative framework for multi-view ensemble classification, inspired by the problem of object recognition in the multiple views theory of humans, is proposed.
arXiv Detail & Related papers (2021-08-02T16:27:43Z)
- A review of systematic selection of clustering algorithms and their evaluation [0.0]
This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts.
The goal is to enable potential users to choose the algorithm that best fits their needs and the properties of their underlying data clustering problem.
arXiv Detail & Related papers (2021-06-24T07:01:46Z)
- HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis [2.5329716878122404]
Comprehensive benchmarking of clustering algorithms is difficult.
There is no consensus regarding the best practice for rigorous benchmarking.
We demonstrate the important role evolutionary algorithms play to support flexible generation of such benchmarks.
arXiv Detail & Related papers (2021-02-13T15:01:34Z)
- Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.