Two-dimensional Parallel Tempering for Constrained Optimization
- URL: http://arxiv.org/abs/2506.14781v2
- Date: Wed, 30 Jul 2025 19:09:52 GMT
- Title: Two-dimensional Parallel Tempering for Constrained Optimization
- Authors: Corentin Delacour, M Mahmudul Hasan Sajeeb, Joao P. Hespanha, Kerem Y. Camsari
- Abstract summary: We introduce a two-dimensional extension of the powerful parallel tempering algorithm (PT). The resulting two-dimensional parallel tempering algorithm (2D-PT) improves mixing in heavily constrained replicas. The method applies broadly to constrained Ising problems and can be deployed on existing Ising machines.
- Score: 0.3068068202044424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sampling Boltzmann probability distributions plays a key role in machine learning and optimization, motivating the design of hardware accelerators such as Ising machines. While the Ising model can in principle encode arbitrary optimization problems, practical implementations are often hindered by soft constraints that either slow down mixing when too strong, or fail to enforce feasibility when too weak. We introduce a two-dimensional extension of the powerful parallel tempering algorithm (PT) that addresses this challenge by adding a second dimension of replicas interpolating the penalty strengths. This scheme ensures constraint satisfaction in the final replicas, analogous to low-energy states at low temperature. The resulting two-dimensional parallel tempering algorithm (2D-PT) improves mixing in heavily constrained replicas and eliminates the need to explicitly tune the penalty strength. In a representative example of graph sparsification with copy constraints, 2D-PT achieves near-ideal mixing, with Kullback-Leibler divergence decaying as O(1/t). When applied to sparsified Wishart instances, 2D-PT yields orders of magnitude speedup over conventional PT with the same number of replicas. The method applies broadly to constrained Ising problems and can be deployed on existing Ising machines.
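To make the replica layout concrete, below is a minimal, self-contained sketch of a 2D ladder of replicas over inverse temperature and penalty strength with nearest-neighbor exchange moves in both directions. The toy 8-spin objective, the equality-style penalty, the geometric ladders, and the swap schedule are illustrative assumptions, not the authors' implementation or problem instances.

```python
# Minimal 2D parallel tempering (2D-PT) sketch for a penalized Ising problem.
# Assumptions (not from the paper): a random 8-spin objective J, a simple
# penalty P forcing spin 0 and spin 1 to agree, geometric ladders for both
# inverse temperature and penalty strength, and nearest-neighbor swaps.
import numpy as np

rng = np.random.default_rng(0)
n = 8
J = rng.normal(size=(n, n)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)

def energy(s, lam):
    """Objective energy plus lam-weighted penalty (here: s[0] must equal s[1])."""
    return -0.5 * s @ J @ s + lam * (s[0] != s[1])

betas = np.geomspace(0.1, 2.0, 5)        # temperature dimension
lams = np.geomspace(0.5, 8.0, 4)         # penalty-strength dimension
reps = {(i, j): rng.choice([-1, 1], size=n)
        for i in range(len(betas)) for j in range(len(lams))}

def metropolis_sweep(s, beta, lam):
    """Single-spin Metropolis updates at a given (beta, lam)."""
    for k in range(n):
        s2 = s.copy(); s2[k] *= -1
        if rng.random() < np.exp(-beta * (energy(s2, lam) - energy(s, lam))):
            s = s2
    return s

def try_swap(a, b):
    """Replica exchange between grid cells a=(i,j) and b=(i',j')."""
    (ba, la), (bb, lb) = (betas[a[0]], lams[a[1]]), (betas[b[0]], lams[b[1]])
    x, y = reps[a], reps[b]
    delta = (ba * energy(x, la) + bb * energy(y, lb)) \
          - (ba * energy(y, la) + bb * energy(x, lb))
    if rng.random() < np.exp(min(0.0, delta)):
        reps[a], reps[b] = y, x

for t in range(200):
    for (i, j), s in list(reps.items()):
        reps[(i, j)] = metropolis_sweep(s, betas[i], lams[j])
    for i in range(len(betas)):              # swaps along the penalty dimension
        for j in range(len(lams) - 1):
            try_swap((i, j), (i, j + 1))
    for j in range(len(lams)):               # swaps along the temperature dimension
        for i in range(len(betas) - 1):
            try_swap((i, j), (i + 1, j))

best = reps[(len(betas) - 1, len(lams) - 1)]  # coldest, most constrained replica
print("constraint satisfied:", best[0] == best[1], "energy:", energy(best, 0.0))
```

The exchange rule generalizes the usual PT swap: a swap between cells (beta, lambda) and (beta', lambda') is accepted with probability min(1, exp(delta)), where delta compares the penalized energies of the two configurations under both replicas' parameters, so detailed balance holds along both ladder directions.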
Related papers
- Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining. PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z) - Breaking the Overscaling Curse: Thinking Parallelism Before Parallel Thinking [14.561556728044918]
In system-level evaluations, a global parallelism level N is allocated to all samples, typically set large to maximize overall dataset accuracy. Some samples can achieve comparable performance with a smaller N' < N, causing budget redundancy. We formalize and quantify the overscaling curse, showing its universality and severity in practice, and analyze its trigger mechanism.
arXiv Detail & Related papers (2026-01-29T12:22:45Z) - Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face significant image quality degradation under a low-dimensional budget. We propose the Ensemble Parallel Direction solver (dubbed EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z) - Adaptive Probability Flow Residual Minimization for High-Dimensional Fokker-Planck Equations [14.22534820071447]
Solving high-dimensional Fokker-Planck equations is a challenge in computational physics and dynamics. Existing deep learning approaches, such as Physics-Informed Neural Networks, face computational challenges as dimensionality increases. We propose the Adaptive Probability Flow Residual Minimization (A-PFRM) method.
arXiv Detail & Related papers (2025-12-22T09:31:31Z) - db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism [14.406306253079515]
Scaling Diffusion Transformer (DiT) inference via sequence parallelism is critical for reducing latency in visual generation. We formalize a sparse imbalance ratio to quantify the imbalance, and propose db-SP, a sparsity-aware sequence parallelism technique. We show that db-SP delivers an end-to-end speedup of 1.25x and an attention-specific speedup of 1.40x over state-of-the-art sequence parallel methods.
arXiv Detail & Related papers (2025-11-28T11:55:46Z) - ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models [99.6720868215076]
We introduce ThreadWeaver, a framework for adaptive parallel reasoning. ThreadWeaver achieves accuracy on par with popular sequential reasoning models of comparable size. We show that ThreadWeaver delivers up to 1.53x average speedup in token latency.
arXiv Detail & Related papers (2025-11-24T18:55:59Z) - PT$^2$-LLM: Post-Training Ternarization for Large Language Models [52.4629647715623]
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
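As a rough illustration of what ternarization means in this setting, the sketch below implements a generic per-row asymmetric ternary quantizer with separate positive and negative scales; the threshold heuristic and scale fitting are assumptions, not PT$^2$-LLM's Asymmetric Ternary Quantizer or its two-stage refinement.

```python
# Generic asymmetric ternary quantization sketch (an assumption, not the
# paper's exact quantizer): each row of W is mapped to {-a_neg, 0, +a_pos}
# using a magnitude threshold, with the two scales fit separately per row.
import numpy as np

def ternarize_row(w, thr_ratio=0.7):
    thr = thr_ratio * np.mean(np.abs(w))          # simple magnitude threshold
    pos, neg = w > thr, w < -thr
    a_pos = w[pos].mean() if pos.any() else 0.0   # asymmetric positive scale
    a_neg = -w[neg].mean() if neg.any() else 0.0  # asymmetric negative scale
    q = np.zeros_like(w)
    q[pos] = a_pos
    q[neg] = -a_neg
    return q

W = np.random.default_rng(1).normal(size=(4, 16))
W_q = np.vstack([ternarize_row(row) for row in W])
print("mean squared quantization error:", np.mean((W - W_q) ** 2))
```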
arXiv Detail & Related papers (2025-09-27T03:01:48Z) - QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution [53.13952833016505]
We propose a low-bit quantization model for real-world video super-resolution (VSR). We use a calibration dataset to measure both spatial and temporal complexity for each layer. We refine the FP and low-bit branches to achieve simultaneous optimization.
arXiv Detail & Related papers (2025-08-06T14:35:59Z) - LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Text-to-Image Generation [34.14174796390669]
Post-training quantization (PTQ) is a promising solution to reduce memory usage and accelerate inference. Existing PTQ methods suffer from severe performance degradation under extreme low-bit settings. We propose LRQ-DiT, an efficient and accurate PTQ framework.
arXiv Detail & Related papers (2025-08-05T14:16:11Z) - FRAM: Frobenius-Regularized Assignment Matching with Mixed-Precision Computing [6.987672546471471]
A Quadratic Assignment Problem (QAP) seeks to establish node correspondences between two graphs. We develop a theoretically grounded mixed-precision architecture to achieve substantial acceleration. FRAM achieves up to 370X speedup over its CPU-FP64 counterpart, with negligible loss in solution accuracy.
arXiv Detail & Related papers (2025-07-26T07:35:09Z) - MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - AutoHFormer: Efficient Hierarchical Autoregressive Transformer for Time Series Prediction [36.239648954658534]
Time series forecasting requires architectures that simultaneously achieve three competing objectives. We introduce AutoHFormer, a hierarchical autoregressive transformer that addresses these challenges. Comprehensive experiments demonstrate that AutoHFormer achieves 10.76X faster training and 6.06X memory reduction compared to PatchTST on P08.
arXiv Detail & Related papers (2025-06-19T03:47:04Z) - More Optimal Fractional-Order Stochastic Gradient Descent for Non-Convex Optimization Problems [2.5971517743176915]
We propose 2SED Fractional-Order Stochastic Gradient Descent (2SEDFOSGD), which integrates the Two-Scale Effective Dimension (2SED) with FOSGD. By tracking sensitivity and effective dimensionality, 2SEDFOSGD dynamically modulates the fractional exponent to mitigate sluggish oscillations and hasten convergence.
arXiv Detail & Related papers (2025-05-05T19:27:36Z) - Effective Dimension Aware Fractional-Order Stochastic Gradient Descent for Convex Optimization Problems [2.5971517743176915]
We propose 2SED Fractional-Order Stochastic Gradient Descent (2SEDFOSGD) to adapt the fractional exponent in a data-driven manner. Theoretically, this approach preserves the advantages of fractional memory without the sluggish or unstable behavior observed in naïve fractional SGD.
arXiv Detail & Related papers (2025-03-17T22:57:37Z) - Pushing the Boundary of Quantum Advantage in Hard Combinatorial Optimization with Probabilistic Computers [0.4969640751053581]
We show that p-computers can surpass state-of-the-art quantum annealers in solving hard optimization problems. We show that these algorithms are readily implementable in modern hardware thanks to mature semiconductor technology. Our results raise the bar for a practical quantum advantage in optimization and present p-computers as scalable, energy-efficient hardware.
arXiv Detail & Related papers (2025-03-13T12:24:13Z) - Rank Reduction Autoencoders [3.180674374101366]
We introduce a new class of deterministic autoencoders, Rank Reduction Autoencoders (RRAEs). In RRAEs, the bottleneck is defined by the rank of the latent matrix, thereby alleviating the dependence of the encoder/decoder architecture on the bottleneck size. We empirically demonstrate that both RRAEs and aRRAEs are stable, scalable, and reliable.
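A minimal sketch of a rank-based bottleneck in the spirit described above, assuming the batch of encoder outputs is collected into a latent matrix and projected onto its best rank-r approximation via a truncated SVD; this is an illustration under those assumptions, not the RRAE architecture itself.

```python
# Rank-constrained latent bottleneck sketch (illustrative assumption): the
# latent matrix Z of a batch is replaced by its best rank-r approximation
# before decoding, so the effective bottleneck is the rank r, not the width.
import numpy as np

def rank_reduce(Z, r):
    """Replace Z (batch x latent_dim) by its best rank-r approximation."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

Z = np.random.default_rng(2).normal(size=(32, 64))    # stand-in encoder output
Z_r = rank_reduce(Z, r=4)
print("effective rank:", np.linalg.matrix_rank(Z_r))  # -> 4
```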
arXiv Detail & Related papers (2024-05-22T20:33:09Z) - SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single-batch inference.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format.
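A minimal sketch of the Dense-and-Sparse idea follows, under assumptions: outliers above a magnitude quantile are kept exactly in a sparse, index-value form, and the remaining dense part is quantized with a crude uniform 3-bit quantizer that merely stands in for the paper's sensitivity-based non-uniform codebook.

```python
# Dense-and-sparse decomposition sketch (illustrative, not SqueezeLLM's code):
# W is split into a quantized dense part D_q and an exact sparse outlier part.
import numpy as np

def dense_and_sparse(W, outlier_frac=0.005, bits=3):
    thr = np.quantile(np.abs(W), 1.0 - outlier_frac)
    mask = np.abs(W) >= thr
    sparse = (np.nonzero(mask), W[mask])             # exact outliers, COO-style
    D = np.where(mask, 0.0, W)
    scale = np.abs(D).max() / (2 ** (bits - 1) - 1)  # uniform stand-in quantizer
    D_q = np.round(D / scale) * scale
    return D_q, sparse

W = np.random.default_rng(3).standard_t(df=3, size=(64, 64))  # heavy-tailed weights
D_q, (idx, vals) = dense_and_sparse(W)
W_rec = D_q.copy(); W_rec[idx] = vals                # dense + sparse reconstruction
print("max reconstruction error:", np.abs(W - W_rec).max())
```

Keeping the few extreme weights exact is what lets the dense part be quantized aggressively without blowing up the reconstruction error on heavy-tailed weight distributions.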
arXiv Detail & Related papers (2023-06-13T08:57:54Z) - Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference [3.005912820808423]
Inference on autoregressive models exhibits a temporal dependency, where the current token's probability distribution is conditioned on preceding tokens.
We propose Flover -- a temporal fusion framework for efficiently inferring multiple requests in parallel.
By orchestrating token-level parallelism, Flover achieves optimal hardware efficiency and significantly spares system resources.
arXiv Detail & Related papers (2023-05-22T20:58:09Z) - Accelerated First-Order Optimization under Nonlinear Constraints [61.98523595657983]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms. An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
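The one-sample SAG-plus-conditional-gradient combination can be sketched as follows, assuming a least-squares objective, an l1-ball constraint set, and the classical 2/(t+2) Frank-Wolfe step size; this illustrates the mechanism only, not the paper's exact method or its rates.

```python
# One-sample conditional gradient (Frank-Wolfe) with a SAG-style averaged
# gradient: a table of per-sample gradients is kept, one entry is refreshed
# per iteration, and the running average drives the linear minimization oracle.
import numpy as np

rng = np.random.default_rng(4)
m, d, radius = 200, 50, 5.0
A, b = rng.normal(size=(m, d)), rng.normal(size=m)

x = np.zeros(d)
grad_table = np.zeros((m, d))            # per-sample gradients (SAG memory)
grad_avg = np.zeros(d)

for t in range(2000):
    i = rng.integers(m)                                # one sample per iteration
    g_i = A[i] * (A[i] @ x - b[i])                     # fresh gradient of sample i
    grad_avg += (g_i - grad_table[i]) / m              # update running average
    grad_table[i] = g_i
    k = np.argmax(np.abs(grad_avg))                    # LMO over the l1 ball:
    v = np.zeros(d); v[k] = -radius * np.sign(grad_avg[k])  # a ball vertex
    x += 2.0 / (t + 2.0) * (v - x)                     # Frank-Wolfe update

print("objective:", 0.5 * np.mean((A @ x - b) ** 2), "l1 norm:", np.abs(x).sum())
```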
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a globally optimal solution under the same delay assumptions.
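A toy simulation of SGD with delayed gradients in the spirit of DASGD, where each update applies a stochastic gradient evaluated at a stale iterate; the bounded random delay model, the least-squares objective, and the step-size schedule are assumptions for illustration, not the paper's setting.

```python
# Delayed asynchronous SGD toy simulation: gradients are computed on old
# iterates (random bounded staleness) and applied with a vanishing step size.
import numpy as np

rng = np.random.default_rng(5)
d, max_delay = 10, 20
A = rng.normal(size=(200, d)); b = rng.normal(size=200)

iterates = [np.zeros(d)]                 # history, so stale reads can be simulated
for t in range(5000):
    x = iterates[-1]
    stale = iterates[max(0, len(iterates) - 1 - rng.integers(0, max_delay + 1))]
    i = rng.integers(len(A))
    g = A[i] * (A[i] @ stale - b[i])     # gradient computed on a stale iterate
    step = 1.0 / (t + 100)               # decreasing step size tolerant to delays
    iterates.append(x - step * g)

x = iterates[-1]
print("final mean squared residual:", np.mean((A @ x - b) ** 2))
```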
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
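The fixed-point view in the last entry above (Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving) can be sketched with a Jacobi iteration that updates every layer in parallel from the previous sweep; the toy tanh layers and sizes are assumptions, and the point is that the fixed point matches sequential evaluation after at most L sweeps.

```python
# Jacobi fixed-point sketch of feedforward computation: the layer equations
# h_l = f_l(h_{l-1}) are iterated in parallel until they match the sequential
# (ordinary feedforward) values exactly.
import numpy as np

rng = np.random.default_rng(6)
L, d = 12, 16
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(L)]

def layer(l, h):
    return np.tanh(Ws[l] @ h)

x0 = rng.normal(size=d)

# Sequential (ordinary feedforward) reference.
seq = [x0]
for l in range(L):
    seq.append(layer(l, seq[-1]))

# Jacobi iteration: every h_l is recomputed from the previous sweep's h_{l-1}.
h = [x0] + [np.zeros(d) for _ in range(L)]
for sweep in range(L):                       # at most L sweeps are needed
    h = [x0] + [layer(l, h[l]) for l in range(L)]

print("max deviation from sequential:",
      max(np.abs(a - b).max() for a, b in zip(seq, h)))
```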
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.