Related papers: Understanding Concept Identification as Consistent Data Clustering Across Multiple Feature Spaces

URL: http://arxiv.org/abs/2301.05525v2
Date: Tue, 14 Nov 2023 13:29:06 GMT
Title: Understanding Concept Identification as Consistent Data Clustering Across Multiple Feature Spaces
Authors: Felix Lanfermann, Sebastian Schmitt, Patricia Wollstadt
Abstract summary: Concept identification aims at identifying groups of design instances that are similar in a joint space of all features. It is desirable to evaluate the quality of design concepts by considering several of these feature subsets in isolation. In this work, we propose to view concept identification as a special form of clustering algorithm with a broad range of potential applications.
Score: 2.631955426232593
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Identifying meaningful concepts in large data sets can provide valuable insights into engineering design problems. Concept identification aims at identifying non-overlapping groups of design instances that are similar in a joint space of all features, but which are also similar when considering only subsets of features. These subsets usually comprise features that characterize a design with respect to one specific context, for example, constructive design parameters, performance values, or operation modes. It is desirable to evaluate the quality of design concepts by considering several of these feature subsets in isolation. In particular, meaningful concepts should not only identify dense, well separated groups of data instances, but also provide non-overlapping groups of data that persist when considering pre-defined feature subsets separately. In this work, we propose to view concept identification as a special form of clustering algorithm with a broad range of potential applications beyond engineering design. To illustrate the differences between concept identification and classical clustering algorithms, we apply a recently proposed concept identification algorithm to two synthetic data sets and show the differences in identified solutions. In addition, we introduce the mutual information measure as a metric to evaluate whether solutions return consistent clusters across relevant subsets. To support the novel understanding of concept identification, we consider a simulated data set from a decision-making problem in the energy management domain and show that the identified clusters are more interpretable with respect to relevant feature subsets than clusters found by common clustering algorithms and are thus more suitable to support a decision maker.
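The consistency check mentioned in the abstract can be illustrated with a minimal sketch. It assumes that cluster labels have already been computed separately in two feature subsets, and it compares them with a normalized mutual information score. The function names, the geometric-mean normalization, and the toy labels are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log, sqrt


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same instances."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same instances")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b)))
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mi(labels_a, labels_b):
    """Mutual information scaled by the geometric mean of the entropies (assumed normalization)."""
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # a single-cluster labeling shares no information
    return mutual_information(labels_a, labels_b) / sqrt(h_a * h_b)


# Hypothetical labels for six design instances, clustered per feature subset.
labels_parameters = [0, 0, 1, 1, 2, 2]    # e.g. found in the design-parameter subset
labels_consistent = [2, 2, 0, 0, 1, 1]    # same grouping, different label names
labels_inconsistent = [0, 1, 0, 1, 0, 1]  # grouping unrelated to the first

print(normalized_mi(labels_parameters, labels_consistent))
print(normalized_mi(labels_parameters, labels_inconsistent))
```

In this toy case the first score is close to 1 because both subsets induce the same grouping, while the second is 0 because the groupings are statistically independent. A score near 1 across all relevant subset pairs is what a consistent concept would be expected to show under this measure.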

Related papers

Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning [65.75756724642932]
In incomplete multi-view clustering, missing data induce prototype shifts within views and semantic inconsistencies across views. We propose an IMVC framework, imputation- and alignment-free for consensus semantics learning (FreeCSL). FreeCSL achieves more confident and robust assignments on IMVC task, compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-05-16T12:37:10Z)
Discriminative Ordering Through Ensemble Consensus [12.714723443928298]
We take inspiration from consensus clustering and assume that a set of clustering models is able to uncover hidden structures in the data. We propose a discriminative ordering through ensemble clustering based on the distance between the connectivity of a clustering model and the consensus matrix.
arXiv Detail & Related papers (2025-05-07T14:35:39Z)
Discriminative Anchor Learning for Efficient Multi-view Clustering [59.11406089896875]
We propose discriminative anchor learning for multi-view clustering (DALMC). We learn discriminative view-specific feature representations according to the original dataset. We build anchors from different views based on these representations, which increase the quality of the shared anchor graph.
arXiv Detail & Related papers (2024-09-25T13:11:17Z)
Normalization in Proportional Feature Spaces [49.48516314472825]
Normalization plays a central role in data representation, characterization, visualization, analysis, comparison, classification, and modeling. The selection of an appropriate normalization method needs to take into account the type and characteristics of the involved features.
arXiv Detail & Related papers (2024-09-17T17:46:27Z)
ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items. The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to that, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
arXiv Detail & Related papers (2024-07-31T08:29:35Z)
Towards Explainable Clustering: A Constrained Declarative based Approach [0.294944680995069]
We aim at finding a clustering that has high quality in terms of classic clustering criteria and that is explainable. A good global explanation of a clustering should give the characteristics of each cluster taking into account their abilities to describe its objects. We propose a novel interpretable constrained method called ECS for declarative computation with Explainability-driven Selection.
arXiv Detail & Related papers (2024-03-26T21:00:06Z)
Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks. We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes. Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
Enhancing Neural Subset Selection: Integrating Background Information into Set Representations [53.15923939406772]
We show that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest. This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated.
arXiv Detail & Related papers (2024-02-05T16:09:35Z)
Concept Identification for Complex Engineering Datasets [0.0]
A novel concept quality measure is proposed, which provides an objective value for a given definition of concepts in a dataset. It is demonstrated how these concepts can be used to select archetypal representatives of the dataset which exhibit characteristic features of each concept.
arXiv Detail & Related papers (2022-06-09T09:39:46Z)
A Framework for Multi-View Classification of Features [6.660458629649826]
In data classification problems, typical approaches fail when the feature set is too large. In this research, an innovative framework for multi-view ensemble classification, inspired by the problem of object recognition in the multiple views theory of humans, is proposed.
arXiv Detail & Related papers (2021-08-02T16:27:43Z)
A review of systematic selection of clustering algorithms and their evaluation [0.0]
This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts. The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem.
arXiv Detail & Related papers (2021-06-24T07:01:46Z)
HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis [2.5329716878122404]
Comprehensive benchmarking of clustering algorithms is difficult. There is no consensus regarding the best practice for rigorous benchmarking. We demonstrate the important role evolutionary algorithms play to support flexible generation of such benchmarks.
arXiv Detail & Related papers (2021-02-13T15:01:34Z)
Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization. Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.