A Compositional Atlas of Tractable Circuit Operations: From Simple
Transformations to Complex Information-Theoretic Queries
- URL: http://arxiv.org/abs/2102.06137v1
- Date: Thu, 11 Feb 2021 17:26:32 GMT
- Title: A Compositional Atlas of Tractable Circuit Operations: From Simple
Transformations to Complex Information-Theoretic Queries
- Authors: Antonio Vergari, YooJung Choi, Anji Liu, Stefano Teso, Guy Van den
Broeck
- Abstract summary: We show how complex inference scenarios for machine learning can be represented in terms of tractable modular operations over circuits.
We derive a unified framework for reasoning about tractable models that generalizes several results in the literature and opens up novel tractable inference scenarios.
- Score: 44.36335714431731
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Circuit representations are becoming the lingua franca to express and reason
about tractable generative and discriminative models. In this paper, we show
how complex inference scenarios for these models that commonly arise in machine
learning -- from computing the expectations of decision tree ensembles to
information-theoretic divergences of deep mixture models -- can be represented
in terms of tractable modular operations over circuits. Specifically, we
characterize the tractability of a vocabulary of simple transformations --
sums, products, quotients, powers, logarithms, and exponentials -- in terms of
sufficient structural constraints of the circuits they operate on, and present
novel hardness results for the cases in which these properties are not
satisfied. Building on these operations, we derive a unified framework for
reasoning about tractable models that generalizes several results in the
literature and opens up novel tractable inference scenarios.
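To make the flavor of these modular operations concrete, the sketch below shows a minimal smooth and decomposable circuit over two binary variables and evaluates its marginals in a single bottom-up pass, the basic tractable primitive that the paper's transformations compose. This is an illustrative Python sketch under assumed, hypothetical node classes (Leaf, Sum, Product) and a toy distribution, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    var: int    # variable index
    value: int  # the assignment (0 or 1) this indicator leaf tracks


@dataclass
class Sum:
    weights: List[float]
    children: List["Node"]


@dataclass
class Product:
    children: List["Node"]


Node = Union[Leaf, Sum, Product]


def evaluate(node: "Node", evidence: Dict[int, int]) -> float:
    """Bottom-up evaluation under partial evidence; unobserved variables are
    marginalized out, which is tractable when sum nodes are smooth and
    product nodes are decomposable (assumed for this toy circuit)."""
    if isinstance(node, Leaf):
        if node.var not in evidence:  # marginalized indicator sums to 1
            return 1.0
        return 1.0 if evidence[node.var] == node.value else 0.0
    if isinstance(node, Sum):
        return sum(w * evaluate(c, evidence)
                   for w, c in zip(node.weights, node.children))
    if isinstance(node, Product):
        out = 1.0
        for c in node.children:
            out *= evaluate(c, evidence)
        return out
    raise TypeError(f"unexpected node type: {type(node).__name__}")


# Toy distribution: p(X0, X1) = 0.3 * [X0=1][X1=1] + 0.7 * [X0=0][X1=0]
circuit = Sum(
    weights=[0.3, 0.7],
    children=[
        Product([Leaf(0, 1), Leaf(1, 1)]),
        Product([Leaf(0, 0), Leaf(1, 0)]),
    ],
)

print(evaluate(circuit, {}))      # normalization constant -> 1.0
print(evaluate(circuit, {0: 1}))  # marginal p(X0 = 1)     -> 0.3
```

The sums, products, quotients, powers, logarithms, and exponentials studied in the paper then operate on circuits of this kind, with tractability hinging on structural properties such as the smoothness and decomposability assumed here.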
Related papers
- Toward Understanding In-context vs. In-weight Learning [50.24035812301655]
We identify simplified distributional properties that give rise to the emergence and disappearance of in-context learning.
We then extend the study to a full large language model, showing how fine-tuning on various collections of natural language prompts can elicit similar in-context and in-weight learning behaviour.
arXiv Detail & Related papers (2024-10-30T14:09:00Z)
- Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
Constituent-Aware Pooling (CAP) is a methodology designed to analyse how large language models process linguistic structures.
CAP intervenes in model activations through constituent-based pooling at various model levels.
arXiv Detail & Related papers (2024-10-16T18:10:50Z)
- Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models [22.89563355840371]
We identify and compare circuits responsible for ten modular string-edit operations within a language model.
Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness.
arXiv Detail & Related papers (2024-10-02T11:36:45Z)
- What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)? [12.53042167016897]
We introduce a modular "Lego block" approach to build tensorized circuit architectures.
This connection not only clarifies similarities and differences in existing models, but also enables the development of a comprehensive pipeline.
arXiv Detail & Related papers (2024-09-12T11:32:01Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limited interpretability of Transformer-based similarity models by leveraging improved explanations.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Arithmetic Expressions (SHAREs), which fuse GAMs' flexible shape functions with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
arXiv Detail & Related papers (2024-04-15T13:44:01Z)
- Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials [29.09237503747052]
Grokking on modular addition is known to produce Fourier representations and calculation circuits based on trigonometric identities in Transformers.
We show that transferability among models grokked on different operations is limited to specific combinations.
Some multi-task mixtures lead to co-grokking, where grokking happens simultaneously for all tasks.
arXiv Detail & Related papers (2024-02-26T16:48:12Z)
- Structured World Representations in Maze-Solving Transformers [3.75591091941815]
This work focuses on the abstractions formed by small transformer models.
We find evidence for the consistent emergence of structured internal representations of maze topology and valid paths.
We also take steps towards deciphering the circuitry of path-following by identifying attention heads.
arXiv Detail & Related papers (2023-12-05T08:24:26Z)
- Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs [67.043747188954]
We propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs.
It encodes linearized query structures and entities using pre-trained language models to find answers.
We conduct experiments on two inductive logical reasoning datasets and three transductive datasets.
arXiv Detail & Related papers (2023-05-23T01:25:29Z)
- On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores, which we name the Compositional Relational Network (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)