Critically Damped Third-Order Langevin Dynamics
- URL: http://arxiv.org/abs/2409.07697v1
- Date: Thu, 12 Sep 2024 01:59:58 GMT
- Title: Critically Damped Third-Order Langevin Dynamics
- Authors: Benjamin Sterling, Monica Bugallo
- Abstract summary: This work describes a novel improvement to Third-Order Langevin Dynamics (TOLD).
It is carried out by critically damping the TOLD forward transition matrix, similarly to Dockhorn's Critically-Damped Langevin Dynamics (CLD).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While systems analysis has been studied for decades in the context of control theory, it has only been recently used to improve the convergence of Denoising Diffusion Probabilistic Models. This work describes a novel improvement to Third-Order Langevin Dynamics (TOLD), a recent diffusion method that performs better than its predecessors. This improvement, abbreviated TOLD++, is carried out by critically damping the TOLD forward transition matrix similarly to Dockhorn's Critically-Damped Langevin Dynamics (CLD). Specifically, it exploits eigen-analysis of the forward transition matrix to derive the optimal set of dynamics under the original TOLD scheme. TOLD++ is theoretically guaranteed to converge faster than TOLD, and its faster convergence is verified on the Swiss Roll toy dataset and CIFAR-10 dataset according to the FID metric.
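The core mechanism the abstract describes -- choosing the damping so that the forward transition matrix has repeated eigenvalues -- can be illustrated on the classical second-order damped oscillator. The sketch below is only a minimal illustration of why the critically damped configuration gives the fastest non-oscillatory decay; it is not the paper's TOLD++ derivation, and the damping grid and helper names are hypothetical:

```python
import numpy as np

def oscillator_matrix(zeta, omega=1.0):
    # First-order form of x'' + 2*zeta*omega*x' + omega^2*x = 0,
    # with state (position, velocity).
    return np.array([[0.0, 1.0],
                     [-omega**2, -2.0 * zeta * omega]])

def slowest_decay_rate(zeta):
    # exp(A t) decays at the rate of the largest eigenvalue real part;
    # more negative means faster convergence of all modes.
    return np.linalg.eigvals(oscillator_matrix(zeta)).real.max()

for zeta in (0.5, 1.0, 2.0):
    print(f"zeta = {zeta}: slowest decay rate = {slowest_decay_rate(zeta):+.3f}")

# zeta = 0.5 (underdamped): rate -0.500, modes oscillate
# zeta = 1.0 (critical):    rate -1.000, repeated eigenvalue, fastest decay
# zeta = 2.0 (overdamped):  rate -0.268, a slow non-oscillatory mode appears
```

Per the abstract, TOLD++ applies the analogous repeated-eigenvalue criterion, via eigen-analysis, to the forward transition matrix of third-order Langevin dynamics.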
Related papers
- Adaptive Cubic Regularized Second-Order Latent Factor Analysis Model [14.755426957558868]
High-dimensional and incomplete (HDI) datasets have become ubiquitous across various real-world applications. We propose a two-fold approach to mitigate information instabilities. Experiments on HDI datasets demonstrate that the proposed adaptive cubic regularized second-order latent factor model achieves higher representation accuracy than advanced models.
arXiv Detail & Related papers (2025-07-03T03:15:54Z) - Critically-Damped Higher-Order Langevin Dynamics [6.259381563339797]
Critical damping has been successfully introduced in Critically-Damped Langevin Dynamics (CLD) and Critically-Damped Third-Order Langevin Dynamics (TOLD++). The proposed line of work generalizes Higher-Order Langevin Dynamics (HOLD), a recent state-of-the-art diffusion method, by introducing the concept of critical damping from systems analysis.
arXiv Detail & Related papers (2025-06-26T19:50:53Z) - Comba: Improving Bilinear RNNs with Closed-loop Control [19.761486052705017]
We introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. We propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition, with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus.
arXiv Detail & Related papers (2025-06-03T05:44:50Z) - Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence [11.400431211239958]
Diffusion models have emerged as powerful tools for generative modeling.
We propose a control framework for fine-tuning diffusion models.
We show that PI-FT achieves global convergence at a linear rate.
arXiv Detail & Related papers (2024-12-24T04:55:46Z) - Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability.
DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference.
We propose Skip-DiT, a novel DiT variant enhanced with Long-Skip-Connections (LSCs) - the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z) - Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model [66.91323540178739]
Sequential recommendation (SR) aims to predict items that users may be interested in based on their historical behavior.
We revisit SR from a novel information-theoretic perspective and find that sequential modeling methods fail to adequately capture the randomness and unpredictability of user behavior.
Inspired by fuzzy information processing theory, this paper introduces the fuzzy sets of interaction sequences to overcome the limitations and better capture the evolution of users' real interests.
arXiv Detail & Related papers (2024-10-31T14:52:01Z) - LLaCA: Multimodal Large Language Continual Assistant [59.585544987096974]
Multimodal Continual Instruction Tuning (MCIT) is adopted to continually instruct MLLMs to follow human intent in sequential datasets.
Existing gradient-update schemes heavily degrade the tuning performance on previous datasets.
We propose a method called Multimodal Large Language Continual Assistant (LLaCA) to address the challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z) - Accurate deep learning-based filtering for chaotic dynamics by identifying instabilities without an ensemble [0.5936407204316615]
We investigate the use of deep learning to discover data assimilation (DA) schemes for chaotic dynamics.
The focus is on learning the analysis step of DA, from state trajectories and their observations, using a simple residual convolutional neural network.
arXiv Detail & Related papers (2024-08-08T19:44:57Z) - Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective [63.60312929416228]
Attraos incorporates chaos theory into long-term time series forecasting (LTSF).
We show that Attraos outperforms various LTSF methods on mainstream datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.
arXiv Detail & Related papers (2024-02-18T05:35:01Z) - Dynamic Residual Classifier for Class Incremental Learning [4.02487511510606]
With imbalanced sample numbers between old and new classes, learning can be biased.
Existing CIL methods exploit long-tailed (LT) recognition techniques, e.g., adjusted losses and data re-sampling methods.
A novel Dynamic Residual Classifier (DRC) is proposed to handle this challenging scenario.
arXiv Detail & Related papers (2023-08-25T11:07:11Z) - Langevin Autoencoders for Learning Deep Latent Variable Models [27.60436426879683]
Based on amortized Langevin dynamics (ALD), we present a new deep latent variable model named the Langevin autoencoder (LAE).
arXiv Detail & Related papers (2022-09-15T04:26:22Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - Bayesian Learning via Neural Schrödinger-Föllmer Flows [3.07869141026886]
We advocate stochastic control as a finite-time alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD).
We discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
arXiv Detail & Related papers (2021-11-20T03:51:18Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under the constraint of guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z) - Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training; a toy sketch of this effect follows after this list.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
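As a toy illustration of the integration-error hypothesis in the GAN/ODE entry above (not that paper's experiment): for gradient descent-ascent on the bilinear game f(x, y) = x*y, the continuous dynamics trace a closed orbit, explicit Euler provably spirals outward, and a classical fourth-order Runge-Kutta step stays near the orbit. The step size, iteration count, and function names below are hypothetical.

```python
import numpy as np

def field(state):
    # Gradient descent-ascent vector field for f(x, y) = x*y:
    # x descends (dx/dt = -y), y ascends (dy/dt = +x).
    x, y = state
    return np.array([-y, x])

def euler_step(s, h):
    return s + h * field(s)

def rk4_step(s, h):
    # Classical fourth-order Runge-Kutta step.
    k1 = field(s)
    k2 = field(s + 0.5 * h * k1)
    k3 = field(s + 0.5 * h * k2)
    k4 = field(s + h * k3)
    return s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

h, steps = 0.1, 2000
for name, step in [("Euler", euler_step), ("RK4", rk4_step)]:
    s = np.array([1.0, 0.0])
    for _ in range(steps):
        s = step(s, h)
    print(f"{name}: |state| after {steps} steps = {np.linalg.norm(s):.3e}")

# Euler diverges: its norm grows by sqrt(1 + h^2) every step.
# RK4 stays close to the unit circle, consistent with the claim
# that lower integration error stabilises the training dynamics.
```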
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.