Related papers: Quasi-Newton Quasi-Monte Carlo for variational Bayes

Related papers

Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning [11.356198488445488]
We introduce Power-SMC, a training-free Sequential Monte Carlo scheme that targets the same objective while remaining close to standard decoding latency.<n>On MATH500, Power-SMC matches or exceeds MH power sampling while reducing latency from $16$--$28times$ to $1.4$--$3.3times$ over baseline decoding.
arXiv Detail & Related papers (2026-02-10T20:31:40Z)
CarBoN: Calibrated Best-of-N Sampling Improves Test-time Reasoning [62.56541355300587]
We introduce a general test-time calibration framework that adaptively modifies the model toward high-reward reasoning paths.<n>Within this framework, we propose CarBoN, a two-phase method that first explores the solution space and then learns a calibration of the logits.<n>Experiments on MATH-500 and AIME-2024 show that CarBoN improves efficiency, with up to $4times$ fewer rollouts to reach the same accuracy.
arXiv Detail & Related papers (2025-10-17T14:04:37Z)
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation [59.01400190971061]
Costly evaluation reduces inclusivity, slows the cycle of innovation, and worsens environmental impact.<n>We argue that promoting diversity among samples is not essential; what matters is to select samples thatmaximise diversity in model responses.<n>Our method, $textbfDiversifying Sample Condensation (DISCO)$, selects the top-k samples with the greatest model disagreements.
arXiv Detail & Related papers (2025-10-09T08:53:59Z)
Stochastic Optimal Control Matching [53.156277491861985]
Our work introduces Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for optimal control. The control is learned via a least squares problem by trying to fit a matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for optimal control.
arXiv Detail & Related papers (2023-12-04T16:49:43Z)
Training Chain-of-Thought via Latent-Variable Inference [30.21067593018967]
Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a chain-of-thought'' prompt. Naively combining CoT with supervised tuning requires supervision not just of the correct answers, but also of detailed rationales that lead to those answers. We propose a fine-tuning strategy that tries to maximize the emphmarginal log-likelihood of generating a correct answer using CoT prompting.
arXiv Detail & Related papers (2023-11-28T17:47:32Z)
Parallelized Acquisition for Active Learning using Monte Carlo Sampling [0.0]
Recent attention has been directed towards the use of emulators of the posterior based on Gaussian Process (GP) regression. We show how to generate a Monte Carlo sample of the posterior using almost-embarrassingly parallel sequential samplers. The resulting nearly-sorted Monte Carlo sample is used to generate a batch of candidates ranked according to their sequentially conditioned acquisition function values.
arXiv Detail & Related papers (2023-05-30T17:57:34Z)
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL) We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model. We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction [0.0]
We introduce a target-vector optimization scheme that considers all $K$ contributions of the model function and that is specifically suited for parameter reconstruction problems. It also enables to determine accurate uncertainty estimates with very few observations of the actual model function.
arXiv Detail & Related papers (2022-02-23T15:13:32Z)
Langevin Monte Carlo: random coordinate descent and variance reduction [7.464874233755718]
Langevin Monte Carlo (LMC) is a popular Bayesian sampling method. We investigate how to enhance computational efficiency through the application of RCD (random coordinate descent) on LMC.
arXiv Detail & Related papers (2020-07-26T18:14:36Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$ We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $tilde O(|S||B|(gamma)-3epsilon-2)$ for finding the Nash equilibrium (NE) value up to some $epsilon$ error. We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain. We establish sharp information theoretic minimax lower bounds for this problem in terms of $tau_mathsfmix$. We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP) We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
Operator Sampling for Shot-frugal Optimization in Variational Algorithms [0.860968116787044]
We introduce a strategy for reducing the number of measurements (i.e., shots) by randomly sampling operators $h_i$ from the overall $H = sum_i c_i h_i$. We implement this and other Samplings to find the ground states of molecules H$$, LiH, and BeH$, without and with quantum hardware noise, and Rosalin outperforms other Samplings in most cases.
arXiv Detail & Related papers (2020-04-14T01:00:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.