Rethinking Probabilistic Circuit Parameter Learning
- URL: http://arxiv.org/abs/2505.19982v2
- Date: Fri, 03 Oct 2025 23:51:39 GMT
- Title: Rethinking Probabilistic Circuit Parameter Learning
- Authors: Anji Liu, Zilei Shao, Guy Van den Broeck
- Abstract summary: We introduce anemone, a new mini-batch EM algorithm for PCs. Anemone consistently outperforms existing optimizers in both convergence speed and final performance.
- Score: 47.13236967317809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Probabilistic Circuits (PCs) offer a computationally scalable framework for generative modeling, supporting exact and efficient inference of a wide range of probabilistic queries. While recent advances have significantly improved the expressiveness and scalability of PCs, effectively training their parameters remains a challenge. In particular, a widely used optimization method, full-batch Expectation-Maximization (EM), requires processing the entire dataset before performing a single update, making it ineffective for large datasets. Although empirical extensions to the mini-batch setting, as well as gradient-based mini-batch algorithms, converge faster than full-batch EM, they generally underperform in terms of final likelihood. We investigate this gap by establishing a novel theoretical connection between these practical algorithms and the general EM objective. Our analysis reveals a fundamental issue: existing mini-batch EM and gradient-based methods fail to properly regularize distribution changes, causing each update to effectively "overfit" the current mini-batch. Motivated by this insight, we introduce anemone, a new mini-batch EM algorithm for PCs. Anemone applies an implicit adaptive learning rate to each parameter, scaled by how much it contributes to the likelihood of the current batch. Across extensive experiments on language, image, and DNA datasets, anemone consistently outperforms existing optimizers in both convergence speed and final performance.
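To make the contrast between full-batch and mini-batch EM concrete, here is a minimal Python sketch on the simplest possible circuit: a single sum node (a mixture) over two fixed Gaussian leaves. It shows one common form of mini-batch EM, which blends the current weights with the EM target computed on one batch using a fixed step size. This is a baseline of the kind the abstract describes, not anemone itself, whose update rule is not given here. The function names, toy data, and `step_size` value are all illustrative assumptions.

```python
import numpy as np

def em_target(weights, leaf_lik):
    """EM target for one sum node with fixed leaves (hypothetical helper).

    weights:  (K,) current mixture weights, summing to 1.
    leaf_lik: (N, K) likelihood of each sample under each fixed leaf.
    """
    joint = leaf_lik * weights                       # (N, K) unnormalized posteriors
    resp = joint / joint.sum(axis=1, keepdims=True)  # E-step: responsibilities
    return resp.mean(axis=0)                         # M-step: normalized expected counts

def minibatch_em_step(weights, leaf_lik_batch, step_size):
    """Blend current weights with the EM target of a single mini-batch."""
    target = em_target(weights, leaf_lik_batch)
    return (1.0 - step_size) * weights + step_size * target

def gauss_pdf(x, mu, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Toy data (assumption): 1-D samples from two Gaussians with true weights 0.3 / 0.7.
rng = np.random.default_rng(0)
n = 2000
from_first = rng.random(n) < 0.3
x = np.where(from_first, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))
leaf_lik = np.stack([gauss_pdf(x, -2.0), gauss_pdf(x, 2.0)], axis=1)

# Full-batch EM: every update needs a pass over the whole dataset.
full = np.array([0.5, 0.5])
for _ in range(50):
    full = em_target(full, leaf_lik)

# Mini-batch EM: one update per batch of 100 samples, fixed step size.
mini = np.array([0.5, 0.5])
for epoch in range(5):
    perm = rng.permutation(n)
    for start in range(0, n, 100):
        batch = perm[start:start + 100]
        mini = minibatch_em_step(mini, leaf_lik[batch], step_size=0.1)

print("full-batch weights:", full)
print("mini-batch weights:", mini)
```

With `step_size=1.0` each update would discard the previous weights and fit the current batch alone, which is the kind of per-batch overfitting the abstract points to. A smaller fixed step size damps that effect uniformly, whereas the abstract describes anemone as adapting the effective rate per parameter.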
Related papers
- Probabilistic Optimality for Inference-time Scaling [11.92228840747636]
Inference-time scaling has emerged as a powerful technique for enhancing the reasoning performance of Large Language Models (LLMs). We propose a probabilistic framework that formalizes the optimality of inference-time scaling under the assumption that parallel samples are independently and identically distributed (i.i.d.). Within this framework, we derive a theoretical lower bound on the required number of samples to achieve a target performance level, providing the first principled guidance for compute-efficient scaling.
arXiv Detail & Related papers (2025-06-27T16:44:11Z)
- UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning [35.62208317531141]
We advocate and introduce the unrolling paradigm, also referred to as "learning to optimize". Our unrolling approach covers various statistical feature distributions and pre-training paradigms. We report comprehensive experiments, which cover a breadth of fine-grained downstream image classification tasks.
arXiv Detail & Related papers (2024-12-21T19:01:57Z)
- FLOPS: Forward Learning with OPtimal Sampling [1.694989793927645]
Gradient-based computation methods have recently gained focus for learning with only forward passes, also referred to as queries. Conventional forward learning consumes enormous numbers of queries on each data point for accurate gradient estimation through Monte Carlo sampling. We propose to allocate the optimal number of queries to each data point in a batch during training to achieve a good balance between estimation accuracy and computational efficiency.
arXiv Detail & Related papers (2024-10-08T12:16:12Z)
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Mini-Batch Optimization of Contrastive Loss [13.730030395850358]
We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches are selected.
We then demonstrate that utilizing high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches.
arXiv Detail & Related papers (2023-07-12T04:23:26Z)
- Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation [44.53678257757108]
We propose a new BO method that can sub-linearly converge to the objective function's global optimum.
Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process.
We demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.
arXiv Detail & Related papers (2023-06-12T03:35:45Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while balancing exploration and exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- A Data Driven Sequential Learning Framework to Accelerate and Optimize Multi-Objective Manufacturing Decisions [1.5771347525430772]
This paper presents a novel data-driven Bayesian optimization framework that utilizes sequential learning to efficiently optimize complex systems.
The proposed framework is particularly beneficial in practical applications where acquiring data can be expensive and resource intensive.
It implies that the proposed data-driven framework can lead to similar manufacturing decisions with reduced costs and time.
arXiv Detail & Related papers (2023-04-18T20:33:08Z)
- Adaptive Sparse Gaussian Process [0.0]
We propose the first adaptive sparse Gaussian Process (GP) able to address all these issues.
We first reformulate a variational sparse GP algorithm to make it adaptive through a forgetting factor.
We then propose updating a single inducing point of the sparse GP model together with the remaining model parameters every time a new sample arrives.
arXiv Detail & Related papers (2023-02-20T21:34:36Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Consolidated learning -- a domain-specific model-free optimization strategy with examples for XGBoost and MIMIC-IV [4.370097023410272]
This paper proposes a new formulation of the tuning problem, called consolidated learning.
In such settings, we are interested in the total optimization time rather than tuning for a single task.
We demonstrate the effectiveness of this approach through an empirical study of the XGBoost algorithm on a collection of predictive tasks extracted from the MIMIC-IV medical database.
arXiv Detail & Related papers (2022-01-27T21:38:53Z)
- Trusted-Maximizers Entropy Search for Efficient Bayesian Optimization [39.824086260578646]
This paper presents a novel trusted-maximizers entropy search (TES) acquisition function.
It measures how much an input contributes to the information gain on a query over a finite set of trusted maximizers.
arXiv Detail & Related papers (2021-07-30T07:25:07Z)
- Batch Sequential Adaptive Designs for Global Optimization [5.825138898746968]
Efficient global optimization (EGO) is one of the most popular SAD methods for expensive black-box optimization problems.
For multi-point EGO methods, heavy computation and point clustering are the main obstacles.
In this work, a novel batch SAD method, named "accelerated EGO", is proposed, using a refined sampling/importance resampling (SIR) method.
The efficiency of the proposed SAD is validated on nine classic test functions with dimensions from 2 to 12.
arXiv Detail & Related papers (2020-10-21T01:11:35Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
- Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs.
At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation.
We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
arXiv Detail & Related papers (2020-04-13T23:09:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.