An introduction to graphical tensor notation for mechanistic
interpretability
- URL: http://arxiv.org/abs/2402.01790v1
- Date: Fri, 2 Feb 2024 02:56:01 GMT
- Title: An introduction to graphical tensor notation for mechanistic
interpretability
- Authors: Jordan K. Taylor
- Abstract summary: It's often easy to get confused about which operations are happening between tensors.
The first half of this document introduces the notation and applies it to some decompositions.
The second half applies it to some existing foundational approaches for mechanistically understanding language models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graphical tensor notation is a simple way of denoting linear operations on
tensors, originating from physics. Modern deep learning consists almost
entirely of operations on or between tensors, so easily understanding tensor
operations is quite important for understanding these systems. This is
especially true when attempting to reverse-engineer the algorithms learned by a
neural network in order to understand its behavior: a field known as
mechanistic interpretability. It's often easy to get confused about which
operations are happening between tensors and lose sight of the overall
structure, but graphical tensor notation makes it easier to parse things at a
glance and see interesting equivalences. The first half of this document
introduces the notation and applies it to some decompositions (SVD, CP, Tucker,
and tensor network decompositions), while the second half applies it to some
existing foundational approaches for mechanistically understanding
language models, loosely following ``A Mathematical Framework for Transformer
Circuits'', then constructing an example ``induction head'' circuit in
graphical tensor notation.
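(Not from the paper: as a rough illustration of the contractions such diagrams depict, and of the SVD mentioned above, a short NumPy sketch follows; all names, shapes, and the truncation rank are made up for the example.)
```python
# A minimal sketch of the kinds of contractions graphical tensor notation
# denotes, plus a truncated SVD. Illustrative only; not taken from the paper.
import numpy as np

d_model, d_head = 64, 16
W_Q = np.random.randn(d_model, d_head)
W_K = np.random.randn(d_model, d_head)

# Joining two tensors along a shared leg in a diagram is just an einsum
# contraction over that index:
W_QK = np.einsum("ih,jh->ij", W_Q, W_K)      # shape (d_model, d_model)

# The SVD splits a matrix into two isometries joined by a diagonal of
# singular values; keeping only the largest values gives a low-rank cut.
U, S, Vt = np.linalg.svd(W_QK, full_matrices=False)
rank = 8
W_QK_approx = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
print(np.linalg.norm(W_QK - W_QK_approx) / np.linalg.norm(W_QK))
```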
Related papers
- The Tensor as an Informational Resource [1.3044677039636754]
A tensor is a multidimensional array of numbers that can be used to store data, encode a computational relation and represent quantum entanglement.
We propose a family of information-theoretically constructed preorders on tensors, which can be used to compare tensors with each other and to assess the existence of transformations between them.
arXiv Detail & Related papers (2023-11-03T18:47:39Z) - Decomposition of linear tensor transformations [0.0]
The aim of this paper is to develop a mathematical framework for exact tensor decomposition.
In the paper, three different problems are worked through to derive this decomposition.
arXiv Detail & Related papers (2023-09-14T16:14:38Z) - Symbolically integrating tensor networks over various random tensors by
the second version of Python RTNI [0.5439020425818999]
We are upgrading the Python-version of RTNI, which symbolically integrates tensor networks over the Haar-distributed unitary matrices.
Now, PyRTNI2 can treat the Haar-distributed matrices and the real and complex normal Gaussian tensors as well.
In this paper, we explain maths behind the program and show what kind of tensor network calculations can be made with it.
arXiv Detail & Related papers (2023-09-03T13:14:46Z) - TensorKrowch: Smooth integration of tensor networks in machine learning [46.0920431279359]
We introduce TensorKrowch, an open-source Python library built on top of PyTorch.
TensorKrowch allows users to construct any tensor network, train it, and integrate it as a layer in more intricate deep learning models.
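(A hedged sketch of the general pattern only, written in plain PyTorch rather than the TensorKrowch API: a small matrix-product-state tensor network used as a trainable layer. All dimensions and names are illustrative.)
```python
# A rough sketch: a matrix-product-state (MPS) tensor network as a trainable
# layer, in plain PyTorch. NOT the TensorKrowch API; shapes are illustrative.
import torch
import torch.nn as nn

class MPSLayer(nn.Module):
    def __init__(self, n_sites=4, phys_dim=2, bond_dim=8, out_dim=3):
        super().__init__()
        # One trainable core per input site, with shape (bond, phys, bond).
        self.cores = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(bond_dim, phys_dim, bond_dim))
             for _ in range(n_sites)]
        )
        self.head = nn.Linear(bond_dim * bond_dim, out_dim)

    def forward(self, x):                      # x: (batch, n_sites, phys_dim)
        # Contract each site's feature vector into its core, then chain the
        # resulting matrices along the bond indices.
        mats = [torch.einsum("bp,ipj->bij", x[:, s], core)
                for s, core in enumerate(self.cores)]
        out = mats[0]
        for m in mats[1:]:
            out = torch.einsum("bij,bjk->bik", out, m)
        return self.head(out.flatten(1))

layer = MPSLayer()
print(layer(torch.rand(5, 4, 2)).shape)        # torch.Size([5, 3])
```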
arXiv Detail & Related papers (2023-06-14T15:55:19Z) - Low-Rank Tensor Function Representation for Multi-Dimensional Data
Recovery [52.21846313876592]
Low-rank tensor function representation (LRTFR) can continuously represent data beyond meshgrid with infinite resolution.
We develop two fundamental concepts for tensor functions, i.e., the tensor function rank and low-rank tensor function factorization.
Experiments substantiate the superiority and versatility of our method as compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-01T04:00:38Z) - Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions [51.19236668224547]
We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions.
For tensor train decomposition, we give a bicriteria $(1 + \epsilon)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \mathrm{nnz}(A))$ running time.
In addition, we extend our algorithm to tensor networks with arbitrary graphs.
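(For reference only, a minimal NumPy sketch of the textbook TT-SVD procedure that "tensor train decomposition" refers to, built from sequential truncated SVDs; this is not the paper's near-linear-time bicriteria algorithm, and the array sizes are made up.)
```python
# Plain TT-SVD: decompose a d-way array into a chain of 3-way cores by
# reshaping and truncating SVDs one mode at a time.
import numpy as np

def tt_svd(A, max_rank):
    dims = A.shape
    cores, r_prev = [], 1
    M = A.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        r_prev = r
        M = (np.diag(S[:r]) @ Vt[:r]).reshape(r * dims[k + 1], -1)
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores

A = np.random.rand(4, 5, 6, 7)
cores = tt_svd(A, max_rank=3)
print([c.shape for c in cores])   # [(1, 4, 3), (3, 5, 3), (3, 6, 3), (3, 7, 1)]
```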
arXiv Detail & Related papers (2022-07-15T11:55:09Z) - Tensor networks in machine learning [0.0]
A tensor network is a decomposition used to express and approximate large arrays of data.
A merger of tensor networks with machine learning is natural.
Herein the network parameters are adjusted to learn or classify a data-set.
arXiv Detail & Related papers (2022-07-06T18:00:00Z) - Stack operation of tensor networks [10.86105335102537]
We propose a mathematically rigorous definition for the tensor network stack approach.
We illustrate the main ideas with the matrix product states based machine learning as an example.
arXiv Detail & Related papers (2022-03-28T12:45:13Z) - Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability.
arXiv Detail & Related papers (2021-12-14T14:48:08Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Tensor Methods in Computer Vision and Deep Learning [120.3881619902096]
Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions.
With the advent of the deep learning paradigm shift in computer vision, tensors have become even more fundamental.
This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning.
arXiv Detail & Related papers (2021-07-07T18:42:45Z)