Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing
- URL: http://arxiv.org/abs/2507.00039v1
- Date: Thu, 19 Jun 2025 07:28:41 GMT
- Title: Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing
- Authors: Lucas Potin, Rosa Figueiredo, Vincent Labatut, Christine Largeron,
- Abstract summary: Graph classification aims to categorize graphs based on their structural and attribute features, with applications in diverse fields such as social network analysis and bioinformatics.<n>To identify meaningful patterns, a standard approach is to use a quality measure, i.e. a function that evaluates the discriminative power of each pattern.<n>Only a handful of surveys try to provide some insight by comparing these measures, and none of them specifically focuses on graphs.<n>We present a comparative analysis of 38 quality measures from the literature, and propose a method to elaborate a gold standard ranking of the patterns.
- Score: 3.1970244655208306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph classification aims to categorize graphs based on their structural and attribute features, with applications in diverse fields such as social network analysis and bioinformatics. Among the methods proposed to solve this task, those relying on patterns (i.e. subgraphs) provide good explainability, as the patterns used for classification can be directly interpreted. To identify meaningful patterns, a standard approach is to use a quality measure, i.e. a function that evaluates the discriminative power of each pattern. However, the literature provides tens of such measures, making it difficult to select the most appropriate for a given application. Only a handful of surveys try to provide some insight by comparing these measures, and none of them specifically focuses on graphs. This typically results in the systematic use of the most widespread measures, without thorough evaluation. To address this issue, we present a comparative analysis of 38 quality measures from the literature. We characterize them theoretically, based on four mathematical properties. We leverage publicly available datasets to constitute a benchmark, and propose a method to elaborate a gold standard ranking of the patterns. We exploit these resources to perform an empirical comparison of the measures, both in terms of pattern ranking and classification performance. Moreover, we propose a clustering-based preprocessing step, which groups patterns appearing in the same graphs to enhance classification performance. Our experimental results demonstrate the effectiveness of this step, reducing the number of patterns to be processed while achieving comparable performance. Additionally, we show that some popular measures widely used in the literature are not associated with the best results.
Related papers
- Improving LLM Leaderboards with Psychometrical Methodology [0.0]
The rapid development of large language models (LLMs) has necessitated the creation of benchmarks to evaluate their performance.<n>These benchmarks resemble human tests and surveys, as they consist of questions designed to measure emergent properties in the cognitive behavior of these systems.<n>However, unlike the well-defined traits and abilities studied in social sciences, the properties measured by these benchmarks are often vaguer and less rigorously defined.
arXiv Detail & Related papers (2025-01-27T21:21:46Z) - A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Crosslinguistic Case Study of Supplementary Adverbs [50.982315553104975]
Semantic map models (SMMs) construct a network-like conceptual space from cross-linguistic instances or forms.<n>Most SMMs are manually built by human experts using bottom-up procedures.<n>We propose a novel graph-based algorithm that automatically generates conceptual spaces and SMMs in a top-down manner.
arXiv Detail & Related papers (2024-12-02T12:06:41Z) - Exploring Description-Augmented Dataless Intent Classification [1.5839621757142595]
We introduce several schemes to leverage description-augmented embedding similarity for dataless intent classification.
We show promising results for dataless classification scaling to a large number of unseen intents.
arXiv Detail & Related papers (2024-07-25T08:31:57Z) - Detecting Statements in Text: A Domain-Agnostic Few-Shot Solution [1.3654846342364308]
State-of-the-art approaches usually involve fine-tuning models on large annotated datasets, which are costly to produce.
We propose and release a qualitative and versatile few-shot learning methodology as a common paradigm for any claim-based textual classification task.
We illustrate this methodology in the context of three tasks: climate change contrarianism detection, topic/stance classification and depression-relates symptoms detection.
arXiv Detail & Related papers (2024-05-09T12:03:38Z) - Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z) - Hub-aware Random Walk Graph Embedding Methods for Classification [44.99833362998488]
We propose two novel graph embedding algorithms based on random walks that are specifically designed for the node classification problem.
The proposed methods are experimentally evaluated by analyzing the classification performance of three classification algorithms trained on embeddings of real-world networks.
arXiv Detail & Related papers (2022-09-15T20:41:18Z) - Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - A Broader Picture of Random-walk Based Graph Embedding [2.6546685109604304]
Graph embedding based on random-walks supports effective solutions for many graph-related downstream tasks.
We develop an analytical framework for random-walk based graph embedding that consists of three components: a random-walk process, a similarity function, and an embedding algorithm.
We show that embeddings based on autocovariance similarity, when paired with dot product ranking for link prediction, outperform state-of-the-art methods based on Pointwise Mutual Information similarity by up to 100%.
arXiv Detail & Related papers (2021-10-24T03:40:16Z) - ECKPN: Explicit Class Knowledge Propagation Network for Transductive
Few-shot Learning [53.09923823663554]
Class-level knowledge can be easily learned by humans from just a handful of samples.
We propose an Explicit Class Knowledge Propagation Network (ECKPN) to address this problem.
We conduct extensive experiments on four few-shot classification benchmarks, and the experimental results show that the proposed ECKPN significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-16T02:29:43Z) - Visualization of Supervised and Self-Supervised Neural Networks via
Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our methods to class-specific visualization.
arXiv Detail & Related papers (2020-12-03T18:48:39Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link
Prediction Methods [27.27230441498167]
We take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment.
In particular, we demonstrate that all existing scores can hardly be used to compare results across different datasets.
We show that this leads to various problems in the interpretation of results, which may support misleading conclusions.
arXiv Detail & Related papers (2020-02-17T12:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.