Related papers: CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor

URL: http://arxiv.org/abs/2506.04001v1
Date: Wed, 04 Jun 2025 14:30:55 GMT
Title: CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor
Authors: Han Ji, Yuqi Feng, Jiahao Fan, Yanan Sun
Abstract summary: Performance predictors have emerged as a promising method to accelerate the evaluation stage of neural architecture search (NAS). We propose a Causality-guided Architecture Representation Learning (CARL) method aiming to separate critical (causal) and redundant (non-causal) features of architectures for generalizable architecture performance prediction. Experiments on five NAS search spaces demonstrate the state-of-the-art accuracy and superior interpretability of CARL.
Score: 6.014777261874645
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Performance predictors have emerged as a promising method to accelerate the evaluation stage of neural architecture search (NAS). These predictors estimate the performance of unseen architectures by learning from the correlation between a small set of trained architectures and their performance. However, most existing predictors ignore the inherent distribution shift between limited training samples and diverse test samples. Hence, they tend to learn spurious correlations as shortcuts to predictions, leading to poor generalization. To address this, we propose a Causality-guided Architecture Representation Learning (CARL) method aiming to separate critical (causal) and redundant (non-causal) features of architectures for generalizable architecture performance prediction. Specifically, we employ a substructure extractor to split the input architecture into critical and redundant substructures in the latent space. Then, we generate multiple interventional samples by pairing critical representations with diverse redundant representations to prioritize critical features. Extensive experiments on five NAS search spaces demonstrate the state-of-the-art accuracy and superior interpretability of CARL. For instance, CARL achieves 97.67% top-1 accuracy on CIFAR-10 using DARTS.
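The pairing step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `make_interventional_samples` and `invariance_penalty`, the concatenation of the two representations, the random in-batch sampling of redundant parts, and the variance-based penalty are all assumptions made for illustration. The abstract only states that critical representations are paired with diverse redundant representations.

```python
import numpy as np


def make_interventional_samples(critical, redundant, num_interventions, rng):
    """Pair each critical representation with redundant ones drawn from the batch.

    critical, redundant: arrays of shape (batch, dim).
    Returns an array of shape (batch, num_interventions, 2 * dim).
    Hypothetical helper; the pairing strategy is an assumption.
    """
    batch, _ = critical.shape
    # For every sample, draw indices of redundant representations at random.
    idx = rng.integers(0, batch, size=(batch, num_interventions))
    swapped = redundant[idx]  # (batch, num_interventions, dim)
    # Repeat the critical part so it stays fixed across interventions.
    repeated = np.repeat(critical[:, None, :], num_interventions, axis=1)
    return np.concatenate([repeated, swapped], axis=-1)


def invariance_penalty(predictions):
    """Variance of predictions across interventions, averaged over the batch.

    Assumed objective: a predictor relying only on critical features
    should give the same output for every intervention.
    """
    return float(np.mean(np.var(predictions, axis=1)))


rng = np.random.default_rng(0)
critical = rng.normal(size=(4, 8))   # stand-in latent critical features
redundant = rng.normal(size=(4, 8))  # stand-in latent redundant features
samples = make_interventional_samples(critical, redundant, 3, rng)

# Hypothetical linear predictor that reads only the critical half.
w = np.concatenate([rng.normal(size=8), np.zeros(8)])
preds = samples @ w  # shape (4, 3)
print(samples.shape, invariance_penalty(preds))
```

Because the toy predictor ignores the redundant half, its outputs do not change across interventions and the penalty is approximately zero. A predictor that leaned on redundant features would be penalized under this assumed objective.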

Related papers

Exploring Kolmogorov-Arnold Networks for Interpretable Time Series Classification [0.17999333451993949]
Kolmogorov-Arnold Networks (KANs) have been proposed as a more interpretable alternative to state-of-the-art models. In this paper, we aim to conduct a comprehensive and robust exploration of the KAN architecture for time series classification. Our results show that (1) Efficient KAN excels in performance and computational efficiency, showcasing its suitability for classification tasks.
arXiv Detail & Related papers (2024-11-22T13:01:36Z)
CAP: A Context-Aware Neural Predictor for NAS [4.8761456288582945]
We propose a context-aware neural predictor (CAP) which only needs a few annotated architectures for training. Experimental results in different search spaces demonstrate the superior performance of CAP compared with state-of-the-art neural predictors.
arXiv Detail & Related papers (2024-06-04T07:37:47Z)
Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice. HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics. Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator. This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation [15.353256018248103]
LiDAR semantic segmentation has gained attention to accomplish fine-grained scene understanding. We present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK) derived from a standard model. Our LEAK approach is very general and can be seamlessly applied on top of any segmentation architecture.
arXiv Detail & Related papers (2023-01-26T14:52:30Z)
AIO-P: Expanding Neural Performance Predictors Beyond Image Classification [22.743278613519152]
We propose a novel All-in-One Predictor (AIO-P) to pretrain neural predictors on architecture examples. AIO-P can achieve Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC) below 1% and above 0.5, respectively.
arXiv Detail & Related papers (2022-11-30T18:30:41Z)
RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget. We formulate predictor-based architecture search as learning to rank with pairwise comparisons. The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets.
arXiv Detail & Related papers (2021-08-18T07:45:21Z)
The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design [3.04585143845864]
We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training. We then go on to explain the error in terms of the architecture definition itself and develop tools for modifying the architecture. Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance.
arXiv Detail & Related papers (2021-05-25T20:47:43Z)
Weak NAS Predictors Are All You Need [91.11570424233709]
Recent predictor-based NAS approaches attempt to solve the problem with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. We shift the paradigm from finding a complicated predictor that covers the whole architecture space to a set of weaker predictors that progressively move towards the high-performance sub-space. Our method requires fewer samples to find the top-performance architectures on NAS-Bench-101 and NAS-Bench-201, and it achieves the state-of-the-art ImageNet performance on the NASNet search space.
arXiv Detail & Related papers (2021-02-21T01:58:43Z)
A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures. A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method. PCL implicitly encodes semantic structures of the data into the learned embedding space. PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
