Discrete Latent Structure in Neural Networks
- URL: http://arxiv.org/abs/2301.07473v1
- Date: Wed, 18 Jan 2023 12:30:44 GMT
- Title: Discrete Latent Structure in Neural Networks
- Authors: Vlad Niculae, Caio F. Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins
- Abstract summary: This text explores three broad strategies for learning with discrete latent structure.
We show how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many types of data from fields including natural language processing,
computer vision, and bioinformatics, are well represented by discrete,
compositional structures such as trees, sequences, or matchings. Latent
structure models are a powerful tool for learning to extract such
representations, offering a way to incorporate structural bias, discover
insight about the data, and interpret decisions. However, effective training is
challenging, as neural networks are typically designed for continuous
computation.
This text explores three broad strategies for learning with discrete latent
structure: continuous relaxation, surrogate gradients, and probabilistic
estimation. Our presentation relies on consistent notations for a wide range of
models. As such, we reveal many new connections between latent structure
learning strategies, showing how most consist of the same small set of
fundamental building blocks, but use them differently, leading to substantially
different applicability and properties.
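As a minimal illustration of two of the building blocks named in the abstract, the sketch below shows a continuous relaxation (softmax over structure scores) and the forward pass of the surrogate-gradient "straight-through" trick for a categorical latent. This is a hedged NumPy sketch, not code from the paper; all names and the example scores are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: the continuous relaxation of argmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def straight_through(z):
    """Straight-through forward pass for a categorical latent.

    Forward: commit to a hard one-hot choice (discrete structure).
    Backward (not shown): gradients would flow through the relaxed
    probabilities `p` as if the hard choice were differentiable.
    """
    p = softmax(z)                    # continuous relaxation
    hard = np.zeros_like(p)
    hard[np.argmax(p)] = 1.0          # discrete choice used downstream
    return hard, p

# Hypothetical scores over three candidate structures.
scores = np.array([0.5, 2.0, -1.0])
hard, relaxed = straight_through(scores)
```

In an autodiff framework the same idea is usually written as `hard + p - stop_gradient(p)`, so the forward value is discrete while the gradient is taken with respect to the relaxation.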
Related papers
- Learning Dynamic Bayesian Networks from Data: Foundations, First Principles and Numerical Comparisons [2.403231673869682]
We present a guide to the foundations of learning Dynamic Bayesian Networks (DBNs) from data.
We present the formalism for a generic as well as a set of common types of DBNs for particular variable distributions.
arXiv Detail & Related papers (2024-06-25T14:28:17Z) - Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation [4.894417113725933]
We propose a training strategy which enables a 3D LiDAR semantic segmentation model to learn structural relationships between the different classes through abstraction.
We show how this training strategy not only improves the model's confidence calibration, but also preserves additional information for downstream tasks like fusion, prediction, and planning.
arXiv Detail & Related papers (2024-04-09T08:49:01Z) - Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Experimental Observations of the Topology of Convolutional Neural Network Activations [2.4235626091331737]
Topological data analysis provides compact, noise-robust representations of complex structures.
Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture.
In this paper, we apply cutting-edge techniques from topological data analysis (TDA) with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification.
arXiv Detail & Related papers (2022-12-01T02:05:44Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Joint Language Semantic and Structure Embedding for Knowledge Graph Completion [66.15933600765835]
We propose to jointly embed the semantics in the natural language description of the knowledge triplets with their structure information.
Our method embeds knowledge graphs for the completion task via fine-tuning pre-trained language models.
Our experiments on a variety of knowledge graph benchmarks have demonstrated the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z) - Learning Probabilistic Structural Representation for Biomedical Image Segmentation [37.07198480786721]
We propose the first deep learning method to learn a structural representation.
We empirically demonstrate the strength of our method: it generates true structures with better topological integrity, rather than pixel maps.
arXiv Detail & Related papers (2022-06-03T06:00:26Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Learning compositional structures for semantic graph parsing [81.41592892863979]
We show how AM dependency parsing can be trained directly on a neural latent-variable model.
Our model picks up on several linguistic phenomena on its own and achieves comparable accuracy to supervised training.
arXiv Detail & Related papers (2021-06-08T14:20:07Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By embedding samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.