The Gradient of Algebraic Model Counting
- URL: http://arxiv.org/abs/2502.18406v1
- Date: Tue, 25 Feb 2025 17:57:55 GMT
- Title: The Gradient of Algebraic Model Counting
- Authors: Jaron Maene, Luc De Raedt
- Abstract summary: We show that the very same semiring perspective of algebraic model counting also applies to learning. We show how cancellation and ordering properties of a semiring can be exploited for more memory-efficient backpropagation.
- Score: 9.742948699856427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algebraic model counting unifies many inference tasks on logic formulas by exploiting semirings. Rather than focusing on inference, we consider learning, especially in statistical-relational and neurosymbolic AI, which combine logical, probabilistic and neural representations. Concretely, we show that the very same semiring perspective of algebraic model counting also applies to learning. This allows us to unify various learning algorithms by generalizing gradients and backpropagation to different semirings. Furthermore, we show how cancellation and ordering properties of a semiring can be exploited for more memory-efficient backpropagation. This allows us to obtain some interesting variations of state-of-the-art gradient-based optimisation methods for probabilistic logical models. We also discuss why algebraic model counting on tractable circuits does not lead to more efficient second-order optimization. Empirically, our algebraic backpropagation exhibits considerable speed-ups as compared to existing approaches.
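The abstract's semiring view can be made concrete with a small sketch: the same circuit evaluation yields a probability in the probability semiring and a partial derivative in the gradient semiring (dual numbers), as in the algebraic model counting literature. The circuit and probabilities below are illustrative, not taken from the paper.

```python
# Illustrative sketch (not the authors' code): evaluate a small arithmetic
# circuit in the probability semiring, then recover a partial derivative by
# re-evaluating the same circuit in the gradient semiring (dual numbers).
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """Element of the gradient semiring: (value, derivative w.r.t. one parameter)."""
    val: float
    grad: float

    def __add__(self, other):          # semiring "plus"
        return Dual(self.val + other.val, self.grad + other.grad)

    def __mul__(self, other):          # semiring "times" (product rule)
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)

# Circuit for the formula (a AND b) OR (NOT a AND c), with independent
# probabilities p_a, p_b, p_c; the two disjuncts are mutually exclusive.
def circuit(a, not_a, b, c):
    return a * b + not_a * c

p_a, p_b, p_c = 0.3, 0.6, 0.9

# Probability semiring: plain floats with + and *.
prob = circuit(p_a, 1 - p_a, p_b, p_c)

# Gradient semiring: tag each leaf with d(leaf)/d(p_a).
g = circuit(Dual(p_a, 1.0), Dual(1 - p_a, -1.0), Dual(p_b, 0.0), Dual(p_c, 0.0))

print(prob)            # 0.3*0.6 + 0.7*0.9 = 0.81
print(g.val, g.grad)   # value 0.81, derivative d/dp_a = b - c = -0.3
```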
Related papers
- Explainable and Class-Revealing Signal Feature Extraction via Scattering Transform and Constrained Zeroth-Order Optimization [0.5893124686141783]
We propose a new method to extract discriminant and explainable features from a machine learning model. We adopt zeroth-order optimization algorithms to search for an input pattern that maximizes the class probability of a class of interest. We demonstrate the effectiveness of our proposed method using a couple of synthetic time-series classification problems.
arXiv Detail & Related papers (2025-02-08T23:35:37Z)
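For the zeroth-order optimization entry above, a minimal sketch of the core idea: search for an input that maximizes a black-box score using only function evaluations. The two-point (SPSA-style) estimator, step sizes, and toy score are assumptions for illustration, not the authors' exact algorithm.

```python
# Hypothetical sketch: gradient-free ascent on a black-box score f(x) using a
# two-point directional-derivative estimate. Hyperparameters are illustrative.
import numpy as np

def zeroth_order_ascent(f, x0, steps=300, mu=1e-2, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(steps):
        u = rng.standard_normal(x.shape)                 # random probe direction
        d = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)   # directional estimate
        x += lr * d * u                                  # ascent step, no gradients
    return x

# Toy black-box "class probability" score, peaked at x = (1, 2).
score = lambda x: -np.sum((x - np.array([1.0, 2.0])) ** 2)
print(zeroth_order_ascent(score, np.zeros(2)))           # approaches [1., 2.]
```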
- BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
- A Markovian Model for Learning-to-Optimize [4.112909937203119]
We present a probabilistic model for iterative algorithms with the use case of optimization algorithms in mind.
Based on this model, we present PAC-Bayesian generalization bounds for functions that are defined on the trajectory of the learned algorithm.
arXiv Detail & Related papers (2024-08-21T14:00:22Z)
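For readers unfamiliar with the form of the bounds mentioned in the learning-to-optimize entry above, a standard McAllester-style PAC-Bayes bound is shown below; this is the generic template, not the paper's trajectory-specific result.

```latex
% Generic PAC-Bayes template (not the paper's trajectory-specific bound):
% with probability at least 1 - \delta over an i.i.d. sample of size n,
% for every posterior \rho and a fixed data-independent prior \pi,
\mathbb{E}_{h \sim \rho}\big[L(h)\big] \;\le\;
\mathbb{E}_{h \sim \rho}\big[\hat{L}_n(h)\big]
+ \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\!\big(2\sqrt{n}/\delta\big)}{2n}}
```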
- Sparse Variational Student-t Processes [8.46450148172407]
Student-t Processes are used to model heavy-tailed distributions and datasets with outliers.
We propose a sparse representation framework to allow Student-t Processes to be more flexible for real-world datasets.
We evaluate two proposed approaches on various synthetic and real-world datasets from UCI and Kaggle.
arXiv Detail & Related papers (2023-12-09T12:55:20Z)
- A Metalearned Neural Circuit for Nonparametric Bayesian Inference [4.767884267554628]
Most applications of machine learning to classification assume a closed set of balanced classes.
This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution.
We present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network.
arXiv Detail & Related papers (2023-11-24T16:43:17Z)
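For the metalearned-circuit entry above: the long-tailed class statistics it mentions are exactly what standard nonparametric Bayesian priors generate. A Chinese restaurant process sketch follows as a typical example; the paper's specific generative model is an assumption here and may differ.

```python
# Chinese restaurant process: a standard nonparametric Bayesian prior whose
# class-occurrence statistics are long-tailed. Illustrative only; the paper's
# exact generative model may differ.
import numpy as np

def crp_assignments(n, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    counts = []                      # customers per table (items per class)
    labels = []
    for _ in range(n):
        total = sum(counts)
        # probability of each existing class, plus a new one with weight alpha
        probs = np.array(counts + [alpha]) / (total + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)         # open a new class
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(1000)
print(sorted(counts, reverse=True)[:5])   # a few large classes, many tiny ones
```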
- Comparison of Single- and Multi-Objective Optimization Quality for Evolutionary Equation Discovery [77.34726150561087]
Evolutionary differential equation discovery has proved to be a tool for obtaining equations with fewer a priori assumptions.
The proposed comparison approach is shown on classical model examples -- the Burgers equation, the wave equation, and the Korteweg-de Vries equation.
arXiv Detail & Related papers (2023-06-29T15:37:19Z)
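For reference, the standard forms of the three benchmark equations named in the entry above:

```latex
% Standard forms of the benchmark PDEs mentioned above:
u_t + u\,u_x = \nu\, u_{xx}       \quad \text{(Burgers' equation)} \\
u_{tt} = c^{2}\, u_{xx}           \quad \text{(wave equation)} \\
u_t + 6\,u\,u_x + u_{xxx} = 0     \quad \text{(Korteweg--de Vries equation)}
```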
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise. In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
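The logistic-loss formulation of NCE mentioned above can be sketched on a toy unnormalized model; this is textbook NCE with a learned log-normalizer, not the compositional estimator the paper proposes.

```python
# Generic NCE sketch: fit an unnormalized log-density by logistic
# discrimination of data against samples from a known noise distribution.
import numpy as np
from scipy.optimize import minimize

# log p_theta(x) = -a * x^2 + c, with c a learned log-normalizer (the point of
# NCE: the normalizing constant is estimated, not computed).
def nce_loss(params, x_data, x_noise, log_noise_pdf):
    a, c = params
    logit = lambda x: (-a * x**2 + c) - log_noise_pdf(x)
    pos = np.log1p(np.exp(-logit(x_data)))   # -log sigmoid, data labeled 1
    neg = np.log1p(np.exp(logit(x_noise)))   # -log(1 - sigmoid), noise labeled 0
    return pos.mean() + neg.mean()

rng = np.random.default_rng(0)
x_data = rng.normal(0.0, 0.5, 5000)          # "real" data, std 0.5
x_noise = rng.normal(0.0, 1.0, 5000)         # noise from a standard normal
log_noise = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

res = minimize(nce_loss, x0=[1.0, 0.0], args=(x_data, x_noise, log_noise))
print(res.x)   # 'a' should approach 1/(2*0.5**2) = 2.0
```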
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
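The implicit differentiation recipe behind the entry above is generic: iterate to a fixed point, then obtain gradients via the implicit function theorem instead of backpropagating through every refinement step. A toy sketch follows (not the authors' object-centric model).

```python
# Implicit differentiation through a fixed point z* = tanh(W z* + x).
# For loss L(z*), the implicit function theorem gives
#   dL/dx = (df/dx)^T v  where  (I - df/dz)^T v = dL/dz*.
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = 0.1 * rng.standard_normal((n, n))     # small scale keeps the map contractive
x = rng.standard_normal(n)

# Forward: iterate to the fixed point (no intermediate steps stored).
z = np.zeros(n)
for _ in range(100):
    z = np.tanh(W @ z + x)

J = (1 - z**2)[:, None] * W               # df/dz at the fixed point
dL_dz = 2 * z                             # gradient of toy loss L = ||z*||^2
v = np.linalg.solve((np.eye(n) - J).T, dL_dz)
dL_dx = (1 - z**2) * v                    # df/dx is diag(1 - z*^2)

# Check against finite differences through the full iteration.
def loss(x):
    z = np.zeros(n)
    for _ in range(100):
        z = np.tanh(W @ z + x)
    return np.sum(z**2)

eps = 1e-6
fd = np.array([(loss(x + eps*e) - loss(x - eps*e)) / (2*eps) for e in np.eye(n)])
print(np.allclose(dL_dx, fd, atol=1e-5))  # True
```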
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
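The spectral claim in the entry above is easy to see on a finite-dimensional quadratic, a toy stand-in for the paper's Hilbert-space setting: after t gradient steps with rate eta, the residual in the i-th eigendirection scales as (1 - eta * lambda_i)^t, so under early stopping the learning rate decides which spectral components get fit.

```python
# Toy illustration: how the learning rate shapes which spectral components of
# a quadratic objective are fit by a fixed early-stopping iteration.
import numpy as np

lams = np.array([1.0, 0.1, 0.01])   # eigenvalues of the quadratic's Hessian
t = 50                               # early-stopping iteration

for eta in (0.1, 1.0):
    residual = (1 - eta * lams) ** t   # error factor per eigendirection
    print(f"eta={eta}: residual per component = {residual.round(3)}")
# Small eta: only the high-curvature component is fit by iteration t.
# Large eta: lower-curvature components are substantially fit too.
```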
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
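As background for the entry above, one classical Newton iteration, x <- x - H(x)^{-1} g(x), on a toy convex objective. This generic, non-distributed sketch is only background; the paper's contribution lies in the stochastic, communication-efficient distributed variant.

```python
# Generic Newton's method on a ridge-regularized logistic objective.
import numpy as np

def newton(grad, hess, x0, iters=10):
    x = x0.copy()
    for _ in range(iters):
        x -= np.linalg.solve(hess(x), grad(x))   # x <- x - H(x)^{-1} g(x)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))                # toy features
b = rng.integers(0, 2, 100) * 2 - 1              # labels in {-1, +1}
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Gradient and Hessian of (1/n) sum log(1 + exp(-b_i a_i.x)) + 0.05 ||x||^2.
grad = lambda x: -A.T @ (b * sigmoid(-b * (A @ x))) / len(b) + 0.1 * x
hess = lambda x: (A.T * (sigmoid(A @ x) * sigmoid(-(A @ x)))) @ A / len(b) \
                 + 0.1 * np.eye(3)

print(newton(grad, hess, np.zeros(3)))           # converges in a few iterations
```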
- Dynamic programming by polymorphic semiring algebraic shortcut fusion [1.9405875431318445]
Dynamic programming (DP) is an algorithmic design paradigm for the efficient, exact solution of intractable problems. This paper presents a rigorous algebraic formalism for systematically deriving DP algorithms, based on semirings.
arXiv Detail & Related papers (2021-07-05T00:51:02Z)
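The semiring-generic DP idea in the entry above (closely related to the main paper's semiring perspective) can be sketched by running one recursion under two semirings: (min, +) yields shortest paths and (+, *) counts paths. The graph and recursion are illustrative, not the paper's shortcut-fusion formalism.

```python
# One DP recursion, two semirings: the same path recursion yields shortest
# paths under (min, +) and path counts under (+, *).

def dp_paths(edges, n, plus, times, zero, one):
    # Value of all paths from node 0 to node n-1 in a DAG whose nodes 0..n-1
    # are topologically ordered; alternatives combine with `plus`,
    # edge weights along a path with `times`.
    val = [zero] * n
    val[0] = one
    for u in range(n):
        for (a, b, w) in edges:
            if a == u:
                val[b] = plus(val[b], times(val[a], w))
    return val[n - 1]

edges = [(0, 1, 2.0), (0, 2, 1.0), (1, 3, 2.0), (2, 3, 5.0)]

# (min, +) semiring: shortest path length.
print(dp_paths(edges, 4, min, lambda x, y: x + y, float("inf"), 0.0))       # 4.0

# (+, *) semiring with unit weights: number of distinct paths.
unit = [(a, b, 1.0) for (a, b, _) in edges]
print(dp_paths(unit, 4, lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0))  # 2.0
```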
- DiffPrune: Neural Network Pruning with Deterministic Approximate Binary Gates and $L_0$ Regularization [0.0]
Modern neural network architectures typically have many millions of parameters and can be pruned significantly without substantial loss in effectiveness.
The contribution of this work is two-fold.
The first is a method for approximating a multivariate Bernoulli random variable by means of a deterministic and differentiable transformation of any real-valued random variable.
The second is a method for model selection via element-wise parameters with approximate binary gates that may be computed deterministically or stochastically and take on exact zero values.
arXiv Detail & Related papers (2020-12-07T13:08:56Z)
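A hedged sketch of the deterministic approximate binary gates described above: squashing a real-valued parameter through a stretched hard sigmoid produces gates that take exact zero values yet remain differentiable where unclipped. This follows the general recipe of L0-gating methods; the parameter names and exact transformation are illustrative assumptions, not necessarily DiffPrune's.

```python
# Deterministic approximate binary gate with exact zeros: a stretched hard
# sigmoid of a real-valued parameter, in the general spirit of L0-regularized
# gating. The exact transformation in DiffPrune may differ.
import numpy as np

def hard_gate(theta, gamma=-0.1, zeta=1.1):
    # Squash to (0, 1), stretch to (gamma, zeta), then clip: exact 0s and 1s
    # appear while the gate stays differentiable where it is not clipped.
    s = 1 / (1 + np.exp(-theta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

theta = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
g = hard_gate(theta)
print(g)                                      # small thetas give exact 0.0
print((g == 0).sum(), "of", g.size, "gates exactly zero")

# Gates multiply weights elementwise; pruned weights are exactly zero.
weights = np.ones_like(theta)
print(weights * g)
```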