Causal Deep Learning: Causal Capsules and Tensor Transformers
- URL: http://arxiv.org/abs/2301.00314v1
- Date: Sun, 1 Jan 2023 00:47:03 GMT
- Title: Causal Deep Learning: Causal Capsules and Tensor Transformers
- Authors: M. Alex O. Vasilescu
- Abstract summary: Inverse causal questions are addressed with a neural network that implements multilinear projection and estimates the causes of effects.
Our forward and inverse neural network architectures are suitable for asynchronous parallel computation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We derive a set of causal deep neural networks whose architectures are a
consequence of tensor (multilinear) factor analysis. Forward causal questions
are addressed with a neural network architecture composed of causal capsules
and a tensor transformer. The former estimate a set of latent variables that
represent the causal factors, and the latter governs their interaction. Causal
capsules and tensor transformers may be implemented using shallow autoencoders,
but for a scalable architecture we employ block algebra and derive a deep
neural network composed of a hierarchy of autoencoders. An interleaved kernel
hierarchy preprocesses the data resulting in a hierarchy of kernel tensor
factor models. Inverse causal questions are addressed with a neural network
that implements multilinear projection and estimates the causes of effects. As
an alternative to aggressive bottleneck dimension reduction or regularized
regression that may camouflage an inherently underdetermined inverse problem,
we prescribe modeling different aspects of the mechanism of data formation with
piecewise tensor models whose multilinear projections are well-defined and
produce multiple candidate solutions. Our forward and inverse neural network
architectures are suitable for asynchronous parallel computation.
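As a rough, hedged illustration of the kind of model the abstract describes (a sketch under assumptions, not the paper's architecture): the NumPy snippet below builds a two-factor multilinear (Tucker-style) data model in which per-factor code vectors stand in for causal-capsule outputs, a core tensor governs their interaction in the spirit of the tensor transformer, and the inverse question is answered by a multilinear projection implemented with least squares and a rank-1 SVD. The sizes k1, k2, p and the two-factor restriction are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): two causal factors with k1- and
# k2-dimensional codes, and p-dimensional observations.
k1, k2, p = 4, 3, 50

# Core tensor in the role of the "tensor transformer": it governs how the
# two factor codes interact to produce an observation.
T = rng.standard_normal((k1, k2, p))

def forward(r1, r2, T):
    """Forward causal question: synthesize an observation from factor codes."""
    # d[m] = sum_{i,j} r1[i] * r2[j] * T[i, j, m]
    return np.einsum("i,j,ijm->m", r1, r2, T)

def multilinear_projection(d, T):
    """Inverse causal question: estimate the factor codes that explain d."""
    k1, k2, _ = T.shape
    # Solve for the k1*k2 interaction coefficients in the least-squares sense ...
    coeffs, *_ = np.linalg.lstsq(T.reshape(k1 * k2, -1).T, d, rcond=None)
    # ... then split them into per-factor codes with a best rank-1 (SVD) fit.
    U, s, Vt = np.linalg.svd(coeffs.reshape(k1, k2))
    return U[:, 0] * np.sqrt(s[0]), Vt[0] * np.sqrt(s[0])

# Round trip: synthesize an observation, then recover codes that reproduce it.
r1, r2 = rng.standard_normal(k1), rng.standard_normal(k2)
d = forward(r1, r2, T)
r1_hat, r2_hat = multilinear_projection(d, T)
print(np.allclose(forward(r1_hat, r2_hat, T), d))  # True
```

The round trip at the end checks that the recovered codes reproduce the original observation; the individual codes are only identified up to the usual sign/scale ambiguity of a rank-1 factorization.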
Related papers
- Using Degeneracy in the Loss Landscape for Mechanistic Interpretability [0.0]
Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations.
An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network.
arXiv Detail & Related papers (2024-05-17T17:26:33Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
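A hypothetical flavor of that representation, not the paper's actual encoding: a small MLP can be listed as a graph whose nodes are neurons and whose edges carry the parameters, so networks of different widths and depths all become graphs of one common kind.

```python
import numpy as np

def mlp_to_graph(weights, biases):
    """Encode an MLP as a graph: nodes are neurons, edges carry parameters.

    Hypothetical encoding for illustration; the paper's scheme may differ.
    """
    nodes, edges = [], []
    # One node per neuron, identified by (layer index, unit index).
    layer_sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    for layer, size in enumerate(layer_sizes):
        nodes += [(layer, unit) for unit in range(size)]
    # One edge per weight, annotated with the weight value and the target bias.
    for layer, (W, b) in enumerate(zip(weights, biases)):
        for out_unit in range(W.shape[0]):
            for in_unit in range(W.shape[1]):
                edges.append(((layer, in_unit), (layer + 1, out_unit),
                              {"weight": W[out_unit, in_unit],
                               "bias": b[out_unit]}))
    return nodes, edges

rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 3)), rng.standard_normal((2, 5))]
biases = [rng.standard_normal(5), rng.standard_normal(2)]
nodes, edges = mlp_to_graph(weights, biases)
print(len(nodes), len(edges))  # 10 nodes (3+5+2), 25 edges (15+10)
```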
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Centered Self-Attention Layers [89.21791761168032]
The self-attention mechanism in transformers and the message-passing mechanism in graph neural networks are repeatedly applied.
We show that this application inevitably leads to oversmoothing, i.e., to similar representations at the deeper layers.
We present a correction term to the aggregating operator of these mechanisms.
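One plausible reading of such a correction term, offered only as a guess at the mechanism rather than the paper's exact formula: subtract the uniform-averaging operator from the attention matrix, removing the component that pulls every token toward the mean representation.

```python
import numpy as np

def centered_attention(Q, K, V):
    """Self-attention with a centering correction on the aggregating operator.

    Illustrative sketch: subtracting the uniform-averaging matrix (1/n) 11^T
    removes, from every token's aggregate, the mean value vector that drives
    representations toward each other. The paper's exact correction may differ.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)          # standard row-stochastic attention
    A_centered = A - np.ones((n, n)) / n       # correction to the aggregator
    return A_centered @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))
print(centered_attention(X, X, X).shape)  # (6, 8)
```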
arXiv Detail & Related papers (2023-06-02T15:19:08Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
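The permutation symmetry in question is easy to check directly; the sketch below (illustrative, not the paper's code) permutes the hidden units of a toy two-layer MLP together with the matching weight rows and columns and confirms the computed function is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer MLP: x -> relu(W1 x + b1) -> W2 h + b2
W1, b1 = rng.standard_normal((5, 3)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((2, 5)), rng.standard_normal(2)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

# Permute the hidden neurons: rows of (W1, b1) and columns of W2 move together.
perm = rng.permutation(5)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.standard_normal(3)
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2)))  # True
```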
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- A predictive physics-aware hybrid reduced order model for reacting flows [65.73506571113623]
A new hybrid predictive Reduced Order Model (ROM) is proposed to solve reacting flow problems.
The number of degrees of freedom is reduced from thousands of temporal points to a few POD modes with their corresponding temporal coefficients.
Two different deep learning architectures have been tested to predict the temporal coefficients.
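The reduction step can be sketched with a plain SVD-based POD; the sizes below are made up, and the deep-learning predictors for the temporal coefficients are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Snapshot matrix: each column is the flow state at one time step (toy sizes).
n_space, n_time, n_modes = 200, 1000, 5
snapshots = rng.standard_normal((n_space, 3)) @ rng.standard_normal((3, n_time))
snapshots += 0.01 * rng.standard_normal((n_space, n_time))   # small noise

# POD via SVD: keep a few spatial modes and their temporal coefficients.
U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
modes = U[:, :n_modes]                         # spatial POD modes
coeffs = np.diag(s[:n_modes]) @ Vt[:n_modes]   # temporal coefficients (n_modes x n_time)

# A learned model would forecast `coeffs` in time; reconstruction is modes @ coeffs.
reconstruction = modes @ coeffs
rel_err = np.linalg.norm(snapshots - reconstruction) / np.linalg.norm(snapshots)
print(coeffs.shape, f"relative reconstruction error: {rel_err:.3e}")
```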
arXiv Detail & Related papers (2023-01-24T08:39:20Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Modeling Structure with Undirected Neural Networks [20.506232306308977]
We propose undirected neural networks, a flexible framework for specifying computations that can be performed in any order.
We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks.
arXiv Detail & Related papers (2022-02-08T10:06:51Z)
- A Sparse Coding Interpretation of Neural Networks and Theoretical Implications [0.0]
Deep convolutional neural networks have achieved unprecedented performance in various computer vision tasks.
We propose a sparse coding interpretation of neural networks that have ReLU activation.
We derive a complete convolutional neural network without normalization and pooling.
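One standard bridge between ReLU layers and sparse coding, offered as a hedged illustration of the general idea rather than the paper's derivation: a biased ReLU layer coincides with a single non-negative soft-thresholding (ISTA) step for a sparse code.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_layer(x, W, lam):
    """A ReLU layer read as one non-negative ISTA step for sparse coding.

    Illustrative connection only: relu(W x - lam) equals a single proximal
    (non-negative soft-thresholding) update for the code z in
        min_z 0.5 * ||x - D z||^2 + lam * ||z||_1,  z >= 0,
    when W = D^T, starting from z = 0 with unit step size.
    """
    return np.maximum(W @ x - lam, 0.0)

D = rng.standard_normal((20, 50))     # overcomplete dictionary (assumed)
x = rng.standard_normal(20)
z = relu_layer(x, D.T, lam=5.0)
print(z.shape, f"{(z > 0).mean():.2f}")  # fraction of non-zero (active) units
```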
arXiv Detail & Related papers (2021-08-14T21:54:47Z)
- Non-asymptotic Excess Risk Bounds for Classification with Deep Convolutional Neural Networks [6.051520664893158]
We consider the problem of binary classification with a class of general deep convolutional neural networks.
We define the prefactors of the risk bounds in terms of the input data dimension and other model parameters.
We show that the classification methods with CNNs can circumvent the curse of dimensionality.
arXiv Detail & Related papers (2021-05-01T15:55:04Z)
- Deep Neural-Kernel Machines [4.213427823201119]
In this chapter we review the main literature related to the recent advancement of deep neural-kernel architecture.
We introduce a neural-kernel architecture that serves as the core module for deeper models equipped with different pooling layers.
In particular, we review three neural-kernel machines with average, maxout and convolutional pooling layers.
arXiv Detail & Related papers (2020-07-13T19:46:29Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated.
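A generic sketch of the idea, with the selection rule and perturbation left as loud assumptions (Disout's actual rule, derived from Rademacher-complexity arguments, is not reproduced here): instead of zeroing activations as dropout does, randomly chosen feature-map elements are perturbed with noise during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def distort_features(feature_map, drop_prob=0.1, alpha=1.0, training=True):
    """Generic feature-map distortion (a stand-in, not Disout's exact rule).

    Dropout would zero the selected elements; here they are instead perturbed
    with zero-mean noise scaled to the feature map's own spread.
    """
    if not training:
        return feature_map
    mask = rng.random(feature_map.shape) < drop_prob          # elements to distort
    noise = alpha * feature_map.std() * rng.standard_normal(feature_map.shape)
    return np.where(mask, feature_map + noise, feature_map)

fmap = rng.standard_normal((8, 16, 16))     # (channels, height, width), toy sizes
out = distort_features(fmap)
print(out.shape, f"{(out != fmap).mean():.2f}")  # ~0.10 of elements distorted
```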
arXiv Detail & Related papers (2020-02-23T13:59:13Z)