Generating Interpretable Networks using Hypernetworks
- URL: http://arxiv.org/abs/2312.03051v1
- Date: Tue, 5 Dec 2023 18:55:32 GMT
- Title: Generating Interpretable Networks using Hypernetworks
- Authors: Isaac Liao, Ziming Liu, Max Tegmark
- Abstract summary: We explore the possibility of using hypernetworks to generate interpretable networks whose underlying algorithms are not yet known.
For the task of computing L1 norms, hypernetworks find three algorithms: (a) the double-sided algorithm, (b) the convexity algorithm, (c) the pudding algorithm.
We show that a trained hypernetwork can correctly construct models for input dimensions not seen in training, demonstrating systematic generalization.
- Score: 16.876961991785507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An essential goal in mechanistic interpretability is to decode a network, i.e.,
to convert a neural network's raw weights to an interpretable algorithm. Given
the difficulty of the decoding problem, progress has been made to understand
the easier encoding problem, i.e., to convert an interpretable algorithm into
network weights. Previous works focus on encoding existing algorithms into
networks, which are interpretable by definition. However, focusing on encoding
limits the possibility of discovering new algorithms that humans have never
stumbled upon, but that are nevertheless interpretable. In this work, we
explore the possibility of using hypernetworks to generate interpretable
networks whose underlying algorithms are not yet known. The hypernetwork is
carefully designed such that it can control network complexity, leading to a
diverse family of interpretable algorithms ranked by their complexity. All of
them are interpretable in hindsight, although some of them are less intuitive
to humans, hence providing new insights regarding how to "think" like a neural
network. For the task of computing L1 norms, hypernetworks find three
algorithms: (a) the double-sided algorithm, (b) the convexity algorithm, (c)
the pudding algorithm, although only the first algorithm was expected by the
authors before experiments. We automatically classify these algorithms and
analyze how these algorithmic phases develop during training, as well as how
they are affected by complexity control. Furthermore, we show that a trained
hypernetwork can correctly construct models for input dimensions not seen in
training, demonstrating systematic generalization.
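The encoding direction the abstract builds on can be illustrated with a minimal sketch: a hypernetwork that emits the weights of a small one-hidden-layer ReLU network trained to compute the L1 norm, with a weight penalty standing in for the paper's complexity control. The per-unit embedding scheme, layer sizes, and penalty coefficient below are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Maps learned per-unit embeddings to the weights of a small target
    network f(x) = w2 . relu(W1 x) meant to compute the L1 norm."""

    def __init__(self, n_in, n_hidden, emb_dim=8):
        super().__init__()
        # one learned embedding per hidden unit of the generated network
        self.unit_emb = nn.Parameter(torch.randn(n_hidden, emb_dim))
        self.gen_w1 = nn.Linear(emb_dim, n_in)       # emits one row of W1
        self.gen_w2 = nn.Linear(emb_dim, 1)          # emits one entry of w2

    def forward(self):
        W1 = self.gen_w1(self.unit_emb)              # (n_hidden, n_in)
        w2 = self.gen_w2(self.unit_emb).squeeze(-1)  # (n_hidden,)
        return W1, w2

def target_forward(x, W1, w2):
    # the generated network: a one-hidden-layer ReLU net
    return torch.relu(x @ W1.T) @ w2

n_in, n_hidden = 4, 8
hyper = HyperNet(n_in, n_hidden)
opt = torch.optim.Adam(hyper.parameters(), lr=1e-2)

for step in range(2000):
    x = torch.randn(64, n_in)
    W1, w2 = hyper()
    loss = ((target_forward(x, W1, w2) - x.abs().sum(dim=1)) ** 2).mean()
    # an L2 penalty on the generated weights is a crude stand-in for the
    # paper's complexity control, not its actual mechanism
    loss = loss + 1e-4 * (W1.pow(2).sum() + w2.pow(2).sum())
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the expected "double-sided" solution, the generated first layer contains paired rows +e_i and -e_i with unit output weights, since ReLU(x_i) + ReLU(-x_i) = |x_i|; the other algorithms the abstract names would show up as different weight patterns under stronger complexity control.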
Related papers
- The Deep Equilibrium Algorithmic Reasoner [20.375241527453447]
We show that graph neural networks (GNNs) can learn to execute classical algorithms.
We conjecture and empirically validate that one can train a network to solve algorithmic problems by directly finding the equilibrium.
arXiv Detail & Related papers (2024-02-09T14:46:50Z)
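A minimal sketch of the "directly finding the equilibrium" idea, assuming a toy vector-valued reasoner rather than the paper's graph processor: iterate a learned update to a fixed point instead of unrolling a preset number of steps, then read the answer off the equilibrium state. The naive forward iteration (no implicit differentiation) and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EquilibriumReasoner(nn.Module):
    """Iterates h <- f(h, x) until convergence instead of running a
    fixed number of processing steps."""

    def __init__(self, dim):
        super().__init__()
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.readout = nn.Linear(dim, 1)

    def forward(self, x, max_iters=50, tol=1e-4):
        h = torch.zeros_like(x)
        for _ in range(max_iters):
            h_next = self.update(torch.cat([h, x], dim=-1))
            if (h_next - h).abs().max() < tol:   # reached the equilibrium
                h = h_next
                break
            h = h_next
        return self.readout(h)

model = EquilibriumReasoner(dim=16)
out = model(torch.randn(32, 16))   # (32, 1) answers read off the fixed point
```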
- Convolutional neural network based decoders for surface codes [0.0]
This work reports a study of decoders based on convolutional neural networks, tested on different code distances and noise models.
The results show that decoders based on convolutional neural networks have good performance and can adapt to different noise models.
arXiv Detail & Related papers (2023-12-06T14:07:31Z)
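A minimal sketch of a CNN-based decoder in the spirit of that entry, assuming syndromes arrive as a d x d bit grid and the decoder predicts one of four logical error classes; the code distance, channel counts, and class set are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SyndromeDecoder(nn.Module):
    """Maps a d x d grid of syndrome bits to one of four logical error
    classes (I, X, Z, Y), a common CNN-decoder formulation."""

    def __init__(self, distance=5, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * distance * distance, n_classes)

    def forward(self, syndrome):          # (batch, 1, d, d) of 0/1 bits
        z = self.features(syndrome)
        return self.head(z.flatten(1))    # logits over error classes

decoder = SyndromeDecoder(distance=5)
fake_syndromes = torch.randint(0, 2, (8, 1, 5, 5)).float()
logits = decoder(fake_syndromes)          # (8, 4)
```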
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex than recovering a single known algorithm.
We show that even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Learning with Differentiable Algorithms [6.47243430672461]
This thesis explores combining classic algorithms and machine learning systems like neural networks.
The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm.
In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable sorting gates, and differentiable logic gate networks.
arXiv Detail & Related papers (2022-09-01T17:30:00Z)
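One of the relaxations that thesis names, differentiable sorting networks, can be sketched as follows: each hard compare-and-swap in an odd-even sorting network is replaced by a sigmoid-weighted soft min/max, so gradients flow through the sort. The temperature beta and the network layout are illustrative choices, not the thesis's exact construction.

```python
import torch

def soft_cswap(a, b, beta=10.0):
    """Differentiable compare-and-swap: blends (min, max) with a sigmoid
    instead of a hard comparison, so gradients flow through the sort."""
    p = torch.sigmoid(beta * (b - a))      # ~1 when already ordered (a <= b)
    lo = p * a + (1 - p) * b
    hi = p * b + (1 - p) * a
    return lo, hi

def soft_sort(x, beta=10.0):
    """Odd-even transposition network built from soft compare-and-swaps."""
    vals = list(x.unbind(-1))
    n = len(vals)
    for step in range(n):
        for i in range(step % 2, n - 1, 2):
            vals[i], vals[i + 1] = soft_cswap(vals[i], vals[i + 1], beta)
    return torch.stack(vals, dim=-1)

x = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
y = soft_sort(x)           # approximately [1., 2., 3.]
y.sum().backward()         # gradients reach the unsorted input
```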
- Graph Neural Networks are Dynamic Programmers [0.0]
Graph neural networks (GNNs) are claimed to align with dynamic programming (DP).
Here we show, using methods from theory and abstract algebra, that there exists an intricate connection between GNNs and DP.
arXiv Detail & Related papers (2022-03-29T13:27:28Z)
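The alignment can be illustrated concretely: one Bellman-Ford relaxation is exactly one round of min-plus message passing, so a GNN whose messages add edge weights and whose aggregation is a minimum reproduces the DP update. The toy graph below is an arbitrary example, not one from the paper.

```python
import numpy as np

# toy weighted digraph as (source, target, weight) triples
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0), (2, 3, 8.0)]
n = 4

def bellman_ford_step(dist):
    """One DP relaxation: dist'[v] = min(dist[v], min_u dist[u] + w(u, v))."""
    new = dist.copy()
    for u, v, w in edges:
        new[v] = min(new[v], dist[u] + w)
    return new

def minplus_gnn_layer(dist):
    """Same update as message passing: each edge sends dist[u] + w to its
    target node, and nodes aggregate incoming messages with a min."""
    inbox = {v: [dist[v]] for v in range(n)}
    for u, v, w in edges:
        inbox[v].append(dist[u] + w)
    return np.array([min(inbox[v]) for v in range(n)])

dist = np.array([0.0, np.inf, np.inf, np.inf])   # distances from node 0
for _ in range(n - 1):
    assert np.allclose(bellman_ford_step(dist), minplus_gnn_layer(dist))
    dist = minplus_gnn_layer(dist)
print(dist)   # [0. 3. 1. 8.]
```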
- Artificial Neural Networks generated by Low Discrepancy Sequences [59.51653996175648]
We generate artificial neural networks as random walks on a dense network graph.
Such networks can be trained sparse from scratch, avoiding the expensive procedure of training a dense network and compressing it afterwards.
We demonstrate that the artificial neural networks generated by low discrepancy sequences can achieve an accuracy within reach of their dense counterparts at a much lower computational complexity.
arXiv Detail & Related papers (2021-03-05T08:45:43Z)
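A minimal sketch of the generation idea: use a 2-D Halton low-discrepancy sequence to place a sparse layer's connections quasi-uniformly, so the layer can be trained sparse from scratch. The densities and layer sizes are arbitrary, and the paper's actual construction (random walks on a dense network graph) is more elaborate than this mask.

```python
import numpy as np

def halton(i, base):
    """i-th element of the 1-D Halton low-discrepancy sequence in [0, 1)."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def low_discrepancy_mask(n_out, n_in, n_connections):
    """Sparse 0/1 connectivity mask whose nonzeros are placed by the
    2-D Halton sequence (bases 2 and 3), i.e. quasi-uniformly."""
    mask = np.zeros((n_out, n_in))
    for k in range(1, n_connections + 1):
        row = int(halton(k, 2) * n_out)
        col = int(halton(k, 3) * n_in)
        mask[row, col] = 1.0
    return mask

mask = low_discrepancy_mask(8, 16, 32)    # roughly 25%-dense layer
weights = np.random.randn(8, 16) * mask   # only masked positions carry weights
print(int(mask.sum()), "connections")
```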
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
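The core idea, plugging a learned cost-to-go estimate into best-first tree search, can be sketched with a plain A* loop; the toy grid problem and the hand-written stand-in for the neural heuristic are illustrative, and the paper's HATS targets integer least-squares with a memory-bounded A* variant rather than this generic search.

```python
import heapq

def a_star(start, goal, neighbors, learned_h):
    """Best-first search where the cost-to-go estimate h comes from a model."""
    frontier = [(learned_h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, step_cost in neighbors(node):
            g_next = g + step_cost
            if g_next < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_next
                heapq.heappush(frontier,
                               (g_next + learned_h(nxt), g_next, nxt, path + [nxt]))
    return None, float("inf")

def grid_neighbors(p):
    """4-connected moves on a 5x5 grid, unit step cost."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [((p[0] + dx, p[1] + dy), 1.0) for dx, dy in steps
            if 0 <= p[0] + dx < 5 and 0 <= p[1] + dy < 5]

def manhattan_to_goal(p):
    """Stand-in for a trained network's cost-to-go estimate."""
    return abs(4 - p[0]) + abs(4 - p[1])

path, cost = a_star((0, 0), (4, 4), grid_neighbors, manhattan_to_goal)
print(cost)   # 8.0
```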
- Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied in economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings because they require either an intolerable running time or too many computational resources.
We propose NeuSearcher, which leverages knowledge learned from previous instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
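The bound computation that entry refers to can be illustrated with the simplest such procedure, interval bound propagation over an input box; the paper's Lagrangian decomposition yields tighter bounds by optimizing a dual problem, which this sketch does not attempt. The toy network below is arbitrary.

```python
import numpy as np

def interval_bounds(lower, upper, weights, biases):
    """Propagate elementwise input bounds through affine + ReLU layers.
    Splitting W into positive and negative parts gives the worst case of
    each output coordinate over the input box."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        new_lower = W_pos @ lower + W_neg @ upper + b
        new_upper = W_pos @ upper + W_neg @ lower + b
        lower, upper = new_lower, new_upper
        if i < len(weights) - 1:          # ReLU on hidden layers only
            lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
    return lower, upper

# toy 2-layer network and an input box x in [x0 - eps, x0 + eps]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((2, 8))]
biases = [np.zeros(8), np.zeros(2)]
x0, eps = rng.standard_normal(4), 0.1
lo, hi = interval_bounds(x0 - eps, x0 + eps, weights, biases)
print(lo, hi)   # guaranteed (loose) bounds on the two network outputs
```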
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.