Causal Deep Learning: Causal Capsules and Tensor Transformers
- URL: http://arxiv.org/abs/2301.00314v1
- Date: Sun, 1 Jan 2023 00:47:03 GMT
- Title: Causal Deep Learning: Causal Capsules and Tensor Transformers
- Authors: M. Alex O. Vasilescu
- Abstract summary: Inverse causal questions are addressed with a neural network that implements multilinear projection and estimates the causes of effects.
Our forward and inverse neural network architectures are suitable for asynchronous parallel computation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We derive a set of causal deep neural networks whose architectures are a
consequence of tensor (multilinear) factor analysis. Forward causal questions
are addressed with a neural network architecture composed of causal capsules
and a tensor transformer. The former estimate a set of latent variables that
represent the causal factors, and the latter governs their interaction. Causal
capsules and tensor transformers may be implemented using shallow autoencoders,
but for a scalable architecture we employ block algebra and derive a deep
neural network composed of a hierarchy of autoencoders. An interleaved kernel
hierarchy preprocesses the data resulting in a hierarchy of kernel tensor
factor models. Inverse causal questions are addressed with a neural network
that implements multilinear projection and estimates the causes of effects. As
an alternative to aggressive bottleneck dimension reduction or regularized
regression that may camouflage an inherently underdetermined inverse problem,
we prescribe modeling different aspects of the mechanism of data formation with
piecewise tensor models whose multilinear projections are well-defined and
produce multiple candidate solutions. Our forward and inverse neural network
architectures are suitable for asynchronous parallel computation.
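As a rough, hedged illustration of the kind of model the abstract describes (a sketch under assumptions, not the paper's architecture): the NumPy snippet below builds a two-factor multilinear (Tucker-style) data model in which per-factor code vectors stand in for causal-capsule outputs, a core tensor governs their interaction in the spirit of the tensor transformer, and the inverse question is answered by a multilinear projection implemented with least squares and a rank-1 SVD. The sizes k1, k2, p and the two-factor restriction are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): two causal factors with k1- and
# k2-dimensional codes, and p-dimensional observations.
k1, k2, p = 4, 3, 50

# Core tensor in the role of the "tensor transformer": it governs how the
# two factor codes interact to produce an observation.
T = rng.standard_normal((k1, k2, p))

def forward(r1, r2, T):
    """Forward causal question: synthesize an observation from factor codes."""
    # d[m] = sum_{i,j} r1[i] * r2[j] * T[i, j, m]
    return np.einsum("i,j,ijm->m", r1, r2, T)

def multilinear_projection(d, T):
    """Inverse causal question: estimate the factor codes that explain d."""
    k1, k2, _ = T.shape
    # Solve for the k1*k2 interaction coefficients in the least-squares sense ...
    coeffs, *_ = np.linalg.lstsq(T.reshape(k1 * k2, -1).T, d, rcond=None)
    # ... then split them into per-factor codes with a best rank-1 (SVD) fit.
    U, s, Vt = np.linalg.svd(coeffs.reshape(k1, k2))
    return U[:, 0] * np.sqrt(s[0]), Vt[0] * np.sqrt(s[0])

# Round trip: synthesize an observation, then recover codes that reproduce it.
r1, r2 = rng.standard_normal(k1), rng.standard_normal(k2)
d = forward(r1, r2, T)
r1_hat, r2_hat = multilinear_projection(d, T)
print(np.allclose(forward(r1_hat, r2_hat, T), d))  # True
```

The round trip at the end checks that the recovered codes reproduce the original observation; the individual codes are only identified up to the usual sign/scale ambiguity of a rank-1 factorization.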
Related papers
- Using Degeneracy in the Loss Landscape for Mechanistic Interpretability [0.0]
Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations.
An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network.
arXiv Detail & Related papers (2024-05-17T17:26:33Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
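A hypothetical flavor of that representation, not the paper's actual encoding: a small MLP can be listed as a graph whose nodes are neurons and whose edges carry the parameters, so networks of different widths and depths all become graphs of one common kind.

```python
import numpy as np

def mlp_to_graph(weights, biases):
    """Encode an MLP as a graph: nodes are neurons, edges carry parameters.

    Hypothetical encoding for illustration; the paper's scheme may differ.
    """
    nodes, edges = [], []
    # One node per neuron, identified by (layer index, unit index).
    layer_sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    for layer, size in enumerate(layer_sizes):
        nodes += [(layer, unit) for unit in range(size)]
    # One edge per weight, annotated with the weight value and the target bias.
    for layer, (W, b) in enumerate(zip(weights, biases)):
        for out_unit in range(W.shape[0]):
            for in_unit in range(W.shape[1]):
                edges.append(((layer, in_unit), (layer + 1, out_unit),
                              {"weight": W[out_unit, in_unit],
                               "bias": b[out_unit]}))
    return nodes, edges

rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 3)), rng.standard_normal((2, 5))]
biases = [rng.standard_normal(5), rng.standard_normal(2)]
nodes, edges = mlp_to_graph(weights, biases)
print(len(nodes), len(edges))  # 10 nodes (3+5+2), 25 edges (15+10)
```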
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Centered Self-Attention Layers [89.21791761168032]
The self-attention mechanism in transformers and the message-passing mechanism in graph neural networks are repeatedly applied.
We show that this application inevitably leads to oversmoothing, i.e., to similar representations at the deeper layers.
We present a correction term to the aggregating operator of these mechanisms.
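One plausible reading of such a correction term, offered only as a guess at the mechanism rather than the paper's exact formula: subtract the uniform-averaging operator from the attention matrix, removing the component that pulls every token toward the mean representation.

```python
import numpy as np

def centered_attention(Q, K, V):
    """Self-attention with a centering correction on the aggregating operator.

    Illustrative sketch: subtracting the uniform-averaging matrix (1/n) 11^T
    removes, from every token's aggregate, the mean value vector that drives
    representations toward each other. The paper's exact correction may differ.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)          # standard row-stochastic attention
    A_centered = A - np.ones((n, n)) / n       # correction to the aggregator
    return A_centered @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))
print(centered_attention(X, X, X).shape)  # (6, 8)
```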
arXiv Detail & Related papers (2023-06-02T15:19:08Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
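The permutation symmetry in question is easy to check directly; the sketch below (illustrative, not the paper's code) permutes the hidden units of a toy two-layer MLP together with the matching weight rows and columns and confirms the computed function is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer MLP: x -> relu(W1 x + b1) -> W2 h + b2
W1, b1 = rng.standard_normal((5, 3)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((2, 5)), rng.standard_normal(2)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

# Permute the hidden neurons: rows of (W1, b1) and columns of W2 move together.
perm = rng.permutation(5)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.standard_normal(3)
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2)))  # True
```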
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- A predictive physics-aware hybrid reduced order model for reacting flows [65.73506571113623]
A new hybrid predictive Reduced Order Model (ROM) is proposed to solve reacting flow problems.
The number of degrees of freedom is reduced from thousands of temporal points to a few POD modes with their corresponding temporal coefficients.
Two different deep learning architectures have been tested to predict the temporal coefficients.
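The reduction step can be sketched with a plain SVD-based POD; the sizes below are made up, and the deep-learning predictors for the temporal coefficients are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Snapshot matrix: each column is the flow state at one time step (toy sizes).
n_space, n_time, n_modes = 200, 1000, 5
snapshots = rng.standard_normal((n_space, 3)) @ rng.standard_normal((3, n_time))
snapshots += 0.01 * rng.standard_normal((n_space, n_time))   # small noise

# POD via SVD: keep a few spatial modes and their temporal coefficients.
U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
modes = U[:, :n_modes]                         # spatial POD modes
coeffs = np.diag(s[:n_modes]) @ Vt[:n_modes]   # temporal coefficients (n_modes x n_time)

# A learned model would forecast `coeffs` in time; reconstruction is modes @ coeffs.
reconstruction = modes @ coeffs
rel_err = np.linalg.norm(snapshots - reconstruction) / np.linalg.norm(snapshots)
print(coeffs.shape, f"relative reconstruction error: {rel_err:.3e}")
```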
arXiv Detail & Related papers (2023-01-24T08:39:20Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Modeling Structure with Undirected Neural Networks [20.506232306308977]
We propose undirected neural networks, a flexible framework for specifying computations that can be performed in any order.
We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks.
arXiv Detail & Related papers (2022-02-08T10:06:51Z)
- A Sparse Coding Interpretation of Neural Networks and Theoretical Implications [0.0]
Deep convolutional neural networks have achieved unprecedented performance in various computer vision tasks.
We propose a sparse coding interpretation of neural networks that have ReLU activation.
We derive a complete convolutional neural network without normalization and pooling.
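One standard bridge between ReLU layers and sparse coding, offered as a hedged illustration of the general idea rather than the paper's derivation: a biased ReLU layer coincides with a single non-negative soft-thresholding (ISTA) step for a sparse code.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_layer(x, W, lam):
    """A ReLU layer read as one non-negative ISTA step for sparse coding.

    Illustrative connection only: relu(W x - lam) equals a single proximal
    (non-negative soft-thresholding) update for the code z in
        min_z 0.5 * ||x - D z||^2 + lam * ||z||_1,  z >= 0,
    when W = D^T, starting from z = 0 with unit step size.
    """
    return np.maximum(W @ x - lam, 0.0)

D = rng.standard_normal((20, 50))     # overcomplete dictionary (assumed)
x = rng.standard_normal(20)
z = relu_layer(x, D.T, lam=5.0)
print(z.shape, f"{(z > 0).mean():.2f}")  # fraction of non-zero (active) units
```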
arXiv Detail & Related papers (2021-08-14T21:54:47Z)
- Non-asymptotic Excess Risk Bounds for Classification with Deep Convolutional Neural Networks [6.051520664893158]
We consider the problem of binary classification with a class of general deep convolutional neural networks.
We define the prefactors of the risk bounds in terms of the input data dimension and other model parameters.
We show that the classification methods with CNNs can circumvent the curse of dimensionality.
arXiv Detail & Related papers (2021-05-01T15:55:04Z)
- Deep Neural-Kernel Machines [4.213427823201119]
In this chapter we review the main literature related to the recent advancement of deep neural-kernel architecture.
We introduce a neural-kernel architecture that serves as the core module for deeper models equipped with different pooling layers.
In particular, we review three neural-kernel machines with average, maxout and convolutional pooling layers.
arXiv Detail & Related papers (2020-07-13T19:46:29Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated.
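A generic sketch of the idea, with the selection rule and perturbation left as loud assumptions (Disout's actual rule, derived from Rademacher-complexity arguments, is not reproduced here): instead of zeroing activations as dropout does, randomly chosen feature-map elements are perturbed with noise during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def distort_features(feature_map, drop_prob=0.1, alpha=1.0, training=True):
    """Generic feature-map distortion (a stand-in, not Disout's exact rule).

    Dropout would zero the selected elements; here they are instead perturbed
    with zero-mean noise scaled to the feature map's own spread.
    """
    if not training:
        return feature_map
    mask = rng.random(feature_map.shape) < drop_prob          # elements to distort
    noise = alpha * feature_map.std() * rng.standard_normal(feature_map.shape)
    return np.where(mask, feature_map + noise, feature_map)

fmap = rng.standard_normal((8, 16, 16))     # (channels, height, width), toy sizes
out = distort_features(fmap)
print(out.shape, f"{(out != fmap).mean():.2f}")  # ~0.10 of elements distorted
```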
arXiv Detail & Related papers (2020-02-23T13:59:13Z)