Discrete-Valued Neural Communication
- URL: http://arxiv.org/abs/2107.02367v2
- Date: Wed, 7 Jul 2021 01:05:59 GMT
- Title: Discrete-Valued Neural Communication
- Authors: Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen
Sun, Michael Curtis Mozer, Yoshua Bengio
- Abstract summary: We show that restricting the transmitted information among components to discrete representations is a beneficial bottleneck.
Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation.
We extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication.
- Score: 85.3675647398994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has advanced from fully connected architectures to structured
models organized into components, e.g., the transformer composed of positional
elements, modular architectures divided into slots, and graph neural nets made
up of nodes. In structured models, an interesting question is how to conduct
dynamic and possibly sparse communication among the separate components. Here,
we explore the hypothesis that restricting the transmitted information among
components to discrete representations is a beneficial bottleneck. The
motivating intuition is human language in which communication occurs through
discrete symbols. Even though individuals have different understandings of what
a "cat" is based on their specific experiences, the shared discrete token makes
it possible for communication among individuals to be unimpeded by individual
differences in internal representation. To discretize the values of concepts
dynamically communicated among specialist components, we extend the
quantization mechanism from the Vector-Quantized Variational Autoencoder to
multi-headed discretization with shared codebooks and use it for
discrete-valued neural communication (DVNC). Our experiments show that DVNC
substantially improves systematic generalization in a variety of architectures
-- transformers, modular architectures, and graph neural networks. We also show
that the DVNC is robust to the choice of hyperparameters, making the method
very useful in practice. Moreover, we establish a theoretical justification of
our discretization process, proving that it has the ability to increase noise
robustness and reduce the underlying dimensionality of the model.
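The following is a minimal sketch of the multi-headed discretization with a shared codebook described in the abstract, assuming a standard VQ-VAE-style straight-through estimator and commitment loss; the class name, hyperparameters, and loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Sketch: multi-headed vector quantization with one shared codebook (DVNC-style).
# Assumptions: VQ-VAE straight-through gradients and commitment loss; all names
# and hyperparameters are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadSharedVQ(nn.Module):
    def __init__(self, dim: int, num_heads: int, codebook_size: int, beta: float = 0.25):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.beta = beta
        # A single codebook shared across all heads (and across components).
        self.codebook = nn.Embedding(codebook_size, self.head_dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / codebook_size, 1.0 / codebook_size)

    def forward(self, h: torch.Tensor):
        # h: (batch, dim) message transmitted between components.
        b, d = h.shape
        z = h.view(b * self.num_heads, self.head_dim)        # split message into heads
        dist = torch.cdist(z, self.codebook.weight)          # (b*heads, codebook_size)
        idx = dist.argmin(dim=-1)                            # discrete token per head
        z_q = self.codebook(idx)                             # quantized head vectors
        # VQ-VAE losses: pull codebook toward encodings, commit encodings to codebook.
        codebook_loss = F.mse_loss(z_q, z.detach())
        commit_loss = F.mse_loss(z, z_q.detach())
        loss = codebook_loss + self.beta * commit_loss
        # Straight-through estimator: gradients pass to h as if quantization were identity.
        z_q = z + (z_q - z).detach()
        return z_q.view(b, d), idx.view(b, self.num_heads), loss


# Usage: discretize a message before it is passed to another component.
vq = MultiHeadSharedVQ(dim=64, num_heads=8, codebook_size=32)
message = torch.randn(16, 64)
quantized, tokens, vq_loss = vq(message)
```

In this sketch, the bottleneck is the token index per head: downstream components only receive vectors reconstructed from the shared codebook, which is the discrete communication channel the paper argues improves systematic generalization.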
Related papers
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Knowledge Distillation Based Semantic Communications For Multiple Users [10.770552656390038]
We consider the semantic communication (SemCom) system with multiple users, where there is a limited number of training samples and unexpected interference.
We propose a knowledge distillation (KD) based system where Transformer based encoder-decoder is implemented as the semantic encoder-decoder and fully connected neural networks are implemented as the channel encoder-decoder.
Numerical results demonstrate that KD significantly improves the robustness and the generalization ability when applied to unexpected interference, and it reduces the performance loss when compressing the model size.
arXiv Detail & Related papers (2023-11-23T03:28:14Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Modeling Structure with Undirected Neural Networks [20.506232306308977]
We propose undirected neural networks, a flexible framework for specifying computations that can be performed in any order.
We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks.
arXiv Detail & Related papers (2022-02-08T10:06:51Z)
- Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization [76.68866368409216]
We propose learning to dynamically select discretization tightness conditioned on inputs.
We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- TIME: A Transparent, Interpretable, Model-Adaptive and Explainable Neural Network for Dynamic Physical Processes [0.0]
We present a fully convolutional architecture that captures the invariant structure of the domain to reconstruct the observable system.
Our intent is to learn coupled dynamic processes interpreted as deviations from true kernels representing isolated processes for model-adaptivity.
arXiv Detail & Related papers (2020-03-05T04:19:59Z)