Discrete Latent Structure in Neural Networks
- URL: http://arxiv.org/abs/2301.07473v1
- Date: Wed, 18 Jan 2023 12:30:44 GMT
- Title: Discrete Latent Structure in Neural Networks
- Authors: Vlad Niculae, Caio F. Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins
- Abstract summary: This text explores three broad strategies for learning with discrete latent structure.
We show how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many types of data from fields including natural language processing,
computer vision, and bioinformatics, are well represented by discrete,
compositional structures such as trees, sequences, or matchings. Latent
structure models are a powerful tool for learning to extract such
representations, offering a way to incorporate structural bias, discover
insight about the data, and interpret decisions. However, effective training is
challenging, as neural networks are typically designed for continuous
computation.
This text explores three broad strategies for learning with discrete latent
structure: continuous relaxation, surrogate gradients, and probabilistic
estimation. Our presentation relies on consistent notations for a wide range of
models. As such, we reveal many new connections between latent structure
learning strategies, showing how most consist of the same small set of
fundamental building blocks, but use them differently, leading to substantially
different applicability and properties.
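As a minimal illustration of two of the building blocks named in the abstract, the sketch below shows a continuous relaxation (softmax over structure scores) and the forward pass of the surrogate-gradient "straight-through" trick for a categorical latent. This is a hedged NumPy sketch, not code from the paper; all names and the example scores are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: the continuous relaxation of argmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def straight_through(z):
    """Straight-through forward pass for a categorical latent.

    Forward: commit to a hard one-hot choice (discrete structure).
    Backward (not shown): gradients would flow through the relaxed
    probabilities `p` as if the hard choice were differentiable.
    """
    p = softmax(z)                    # continuous relaxation
    hard = np.zeros_like(p)
    hard[np.argmax(p)] = 1.0          # discrete choice used downstream
    return hard, p

# Hypothetical scores over three candidate structures.
scores = np.array([0.5, 2.0, -1.0])
hard, relaxed = straight_through(scores)
```

In an autodiff framework the same idea is usually written as `hard + p - stop_gradient(p)`, so the forward value is discrete while the gradient is taken with respect to the relaxation.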
Related papers
- Learning Dynamic Bayesian Networks from Data: Foundations, First Principles and Numerical Comparisons [2.403231673869682]
We present a guide to the foundations of learning Dynamic Bayesian Networks (DBNs) from data.
We present the formalism for a generic as well as a set of common types of DBNs for particular variable distributions.
arXiv Detail & Related papers (2024-06-25T14:28:17Z) - Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation [4.894417113725933]
We propose a training strategy which enables a 3D LiDAR semantic segmentation model to learn structural relationships between the different classes through abstraction.
We show how this training strategy not only improves the model's confidence calibration, but also preserves additional information for downstream tasks like fusion, prediction, and planning.
arXiv Detail & Related papers (2024-04-09T08:49:01Z) - Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Experimental Observations of the Topology of Convolutional Neural Network Activations [2.4235626091331737]
Topological data analysis provides compact, noise-robust representations of complex structures.
Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture.
In this paper, we apply cutting-edge techniques from topological data analysis (TDA) with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification.
arXiv Detail & Related papers (2022-12-01T02:05:44Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Joint Language Semantic and Structure Embedding for Knowledge Graph Completion [66.15933600765835]
We propose to jointly embed the semantics in the natural language description of the knowledge triplets with their structure information.
Our method embeds knowledge graphs for the completion task via fine-tuning pre-trained language models.
Our experiments on a variety of knowledge graph benchmarks have demonstrated the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z) - Learning Probabilistic Structural Representation for Biomedical Image Segmentation [37.07198480786721]
We propose the first deep learning method to learn a structural representation.
We empirically demonstrate the strength of our method: it generates true structures with better topological integrity, rather than pixel maps.
arXiv Detail & Related papers (2022-06-03T06:00:26Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Learning compositional structures for semantic graph parsing [81.41592892863979]
We show how AM dependency parsing can be trained directly on a neural latent-variable model.
Our model picks up on several linguistic phenomena on its own and achieves comparable accuracy to supervised training.
arXiv Detail & Related papers (2021-06-08T14:20:07Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By embedding samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.