FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
- URL: http://arxiv.org/abs/2602.12829v1
- Date: Fri, 13 Feb 2026 11:32:10 GMT
- Title: FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
- Authors: Lei Lv, Yunfei Li, Yu Luo, Fuchun Sun, Xiao Ma
- Abstract summary: We propose a framework that regulates policy stochasticity by penalizing the kinetic energy of the velocity field. We derive an energy-regularized policy iteration scheme and a practical off-policy algorithm that automatically tunes the kinetic energy.
- Score: 28.98935867615678
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Iterative generative policies, such as diffusion models and flow matching, offer superior expressivity for continuous control but complicate Maximum Entropy Reinforcement Learning because their action log-densities are not directly accessible. To address this, we propose Field Least-Energy Actor-Critic (FLAC), a likelihood-free framework that regulates policy stochasticity by penalizing the kinetic energy of the velocity field. Our key insight is to formulate policy optimization as a Generalized Schrödinger Bridge (GSB) problem relative to a high-entropy reference process (e.g., uniform). Under this view, the maximum-entropy principle emerges naturally as staying close to a high-entropy reference while optimizing return, without requiring explicit action densities. In this framework, kinetic energy serves as a physically grounded proxy for divergence from the reference: minimizing path-space energy bounds the deviation of the induced terminal action distribution. Building on this view, we derive an energy-regularized policy iteration scheme and a practical off-policy algorithm that automatically tunes the kinetic energy via a Lagrangian dual mechanism. Empirically, FLAC achieves superior or comparable performance on high-dimensional benchmarks relative to strong baselines, while avoiding explicit density estimation.
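As a concrete illustration of the mechanism the abstract describes, here is a minimal sketch (not the authors' released code) of a kinetic-energy-regularized actor update with a Lagrangian dual step: the flow policy integrates a velocity field from a high-entropy uniform reference, accumulates path kinetic energy, and a dual variable steers that energy toward a budget. The network shapes, the `target_energy` budget, and the stand-in critic are all illustrative assumptions.

```python
import torch

ACT_DIM, STATE_DIM = 2, 4

class VelocityField(torch.nn.Module):
    """Velocity field v(s, a, t) of the flow policy."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(STATE_DIM + ACT_DIM + 1, hidden),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden, ACT_DIM),
        )
    def forward(self, s, a, t):
        return self.net(torch.cat([s, a, t], dim=-1))

def sample_action(v, s, steps=8):
    """Integrate the flow from a uniform (high-entropy) reference,
    accumulating path kinetic energy along the way."""
    a = torch.rand(s.shape[0], ACT_DIM) * 2 - 1
    dt = 1.0 / steps
    energy = torch.zeros(s.shape[0])
    for k in range(steps):
        t = torch.full((s.shape[0], 1), k * dt)
        vel = v(s, a, t)
        energy = energy + 0.5 * (vel ** 2).sum(-1) * dt  # kinetic energy
        a = a + vel * dt                                  # Euler step
    return a, energy

v = VelocityField()
critic = torch.nn.Linear(STATE_DIM + ACT_DIM, 1)   # stand-in Q-function
log_alpha = torch.zeros(1, requires_grad=True)     # Lagrangian dual variable
opt = torch.optim.Adam(list(v.parameters()) + [log_alpha], lr=3e-4)
target_energy = 1.0                                # assumed energy budget

s = torch.randn(32, STATE_DIM)
a, energy = sample_action(v, s)
alpha = log_alpha.exp()
# actor: maximize Q minus the kinetic-energy penalty
actor_loss = -(critic(torch.cat([s, a], dim=-1)).squeeze(-1)
               - alpha.detach() * energy).mean()
# dual: raise alpha when energy exceeds the budget, lower it otherwise
dual_loss = (alpha * (target_energy - energy.detach())).mean()
opt.zero_grad()
(actor_loss + dual_loss).backward()
opt.step()
```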
Related papers
- Entropy-Controlled Flow Matching [0.08460698440162889]
We propose a constrained variational principle over continuity-equation paths enforcing a global entropy-rate budget d/dt H(mu_t) >= -lambda. We obtain certificate-style mode-coverage and density-floor guarantees under Lipschitz assumptions, and construct near-optimal counterexamples for unconstrained flow matching.
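The budget d/dt H(mu_t) >= -lambda can be monitored without densities, since under the continuity equation d/dt H(mu_t) = E[div v_t]. Below is a hedged sketch (not necessarily the paper's construction) of a penalty built on a Hutchinson divergence estimate; the toy velocity net and the budget value are assumptions.

```python
import torch

v_net = torch.nn.Sequential(torch.nn.Linear(3, 2))  # toy velocity net over (x, t)

def velocity(x, t):
    return v_net(torch.cat([x, t], dim=-1))

def divergence_estimate(x, t, n_probes=4):
    """Hutchinson trace estimator of div_x velocity(x, t)."""
    x = x.detach().requires_grad_(True)
    div = torch.zeros(x.shape[0])
    for _ in range(n_probes):
        eps = torch.randn_like(x)
        (grad,) = torch.autograd.grad(
            (velocity(x, t) * eps).sum(), x, create_graph=True)
        div = div + (grad * eps).sum(-1)
    return div / n_probes

def entropy_budget_penalty(x, t, lam=1.0):
    # under the continuity equation, d/dt H(mu_t) = E[div v_t];
    # penalize only when the rate would drop below -lam
    return torch.relu(-lam - divergence_estimate(x, t).mean())

x, t = torch.randn(64, 2), torch.rand(64, 1)
loss = entropy_budget_penalty(x, t)  # add this to the flow-matching loss
```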
arXiv Detail & Related papers (2026-02-25T06:07:01Z)
- Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching [8.665369041430969]
Flow Matching (FM) enables one-step generation, but integrating it into Maximum Entropy Reinforcement Learning (MaxEnt RL) is challenging. We propose Flow-based Log-likelihood-Aware Maximum Entropy RL (FLAME), a principled framework that addresses these challenges.
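For a one-step flow a = a0 + v(s, a0), the action log-density is available in closed form by change of variables, which is the generic route to making MaxEnt RL's entropy term tractable for such policies. The summary does not state FLAME's exact estimator, so the single-sample sketch below with a toy network is an assumption.

```python
import torch

STATE_DIM, ACT_DIM = 3, 2
v_net = torch.nn.Linear(STATE_DIM + ACT_DIM, ACT_DIM)  # toy one-step velocity

def one_step_action_and_logprob(s, a0):
    """Single-sample sketch: a = a0 + v(s, a0) with exact log-density."""
    f = lambda z: z + v_net(torch.cat([s, z]))          # one-step flow map
    a = f(a0)
    # change of variables: log pi(a|s) = log p(a0) - log|det df/da0|
    jac = torch.autograd.functional.jacobian(f, a0)
    logdet = torch.linalg.slogdet(jac).logabsdet
    base_logp = torch.distributions.Normal(0.0, 1.0).log_prob(a0).sum()
    return a, base_logp - logdet

s, a0 = torch.randn(STATE_DIM), torch.randn(ACT_DIM)
a, logp = one_step_action_and_logprob(s, a0)
```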
arXiv Detail & Related papers (2026-02-02T03:54:11Z)
- Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL [56.085103402298905]
We propose a trajectory entropy-constrained reinforcement learning (TECRL) framework to address these two challenges. Within this framework, we first separately learn two Q-functions, one associated with reward and the other with entropy, ensuring clean and stable value targets unaffected by temperature updates. We develop a practical off-policy algorithm, DSAC-E, by extending the state-of-the-art distributional soft actor-critic with three refinements.
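A minimal sketch of the two-critic decomposition described above, with one TD target per critic so the reward target is untouched by temperature updates; shapes, the policy interface, and the actor objective in the trailing comment are assumptions, not the paper's code.

```python
import torch

def td_targets(q_r, q_h, pi, s2, r, done, gamma=0.99):
    """Separate TD targets: q_r tracks reward-to-go, q_h entropy-to-go."""
    a2, logp2 = pi(s2)                      # next action and its log-prob
    sa2 = torch.cat([s2, a2], dim=-1)
    y_r = r + gamma * (1 - done) * q_r(sa2).squeeze(-1)       # reward critic
    y_h = -logp2 + gamma * (1 - done) * q_h(sa2).squeeze(-1)  # entropy critic
    return y_r.detach(), y_h.detach()
# The actor can then maximize q_r(s, a) + alpha * q_h(s, a), with alpha
# adjusted against the trajectory-entropy constraint.
```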
arXiv Detail & Related papers (2025-10-25T09:17:47Z)
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models [99.98293908799731]
This paper aims to overcome a major obstacle in scaling RL for reasoning with LLMs, namely the collapse of policy entropy. In practice, we establish a transformation equation R = -a e^H + b between entropy H and downstream performance R. We propose two simple yet effective techniques, namely Clip-Cov and KL-Cov, which clip and apply a KL penalty to tokens with high covariances, respectively.
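A rough sketch of the Clip-Cov idea as summarized above: tokens whose log-probabilities covary most strongly with advantages drive entropy collapse, so their policy-gradient contribution is masked out. The per-token covariance proxy and the `top_frac` threshold are assumptions, not the paper's exact formulation.

```python
import torch

def clip_cov_mask(logps, advantages, top_frac=0.02):
    """Mask out the tokens whose log-prob/advantage covariance is highest."""
    cov = (logps - logps.mean()) * (advantages - advantages.mean())
    k = max(1, int(top_frac * cov.numel()))
    thresh = cov.topk(k).values.min()
    return (cov < thresh).float()          # 0 for high-covariance tokens

logps, adv = torch.randn(1024), torch.randn(1024)
pg_loss = -(clip_cov_mask(logps, adv) * logps * adv).mean()
```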
arXiv Detail & Related papers (2025-05-28T17:38:45Z)
- DIME: Diffusion-Based Maximum Entropy Reinforcement Learning [38.17326719163195]
Diffusion-Based Maximum Entropy RL (DIME) leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL.
arXiv Detail & Related papers (2025-02-04T13:37:14Z)
- Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation [0.276240219662896]
A notable form of entropy regularisation is augmenting the objective with an entropy term, thereby simultaneously optimising the expected return and the entropy.
This framework, known as maximum entropy reinforcement learning (MaxEnt RL), has shown theoretical and empirical successes.
This paper proposes a simple method of separating the entropy objective from the MaxEnt RL objective, which facilitates the implementation of MaxEnt RL in on-policy settings.
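A minimal sketch of the separation idea: estimate a reward advantage and an entropy advantage with separate value heads and combine them in a standard on-policy (PPO-style) update. The combination weight `tau` and the shapes are illustrative assumptions.

```python
import torch

def ppo_clip_loss(ratio, adv, clip=0.2):
    """Standard clipped surrogate applied to a combined advantage."""
    return -torch.min(ratio * adv,
                      torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()

# reward advantage and entropy advantage, estimated by separate value heads
adv_r, adv_h = torch.randn(256), torch.randn(256)
ratio = torch.ones(256)                    # new/old policy probability ratio
tau = 0.01                                 # assumed entropy weight
loss = ppo_clip_loss(ratio, adv_r + tau * adv_h)
```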
arXiv Detail & Related papers (2024-07-25T15:48:24Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
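A hedged sketch of that combined objective: a DPO-style preference loss plus an SFT (supervised) term on the chosen responses. The coefficients `beta` and `eta` and the exact loss shapes are assumptions; the summary only states that the two losses are combined.

```python
import torch
import torch.nn.functional as F

def regularized_preference_loss(logp_chosen, logp_rejected,
                                ref_chosen, ref_rejected,
                                beta=0.1, eta=1.0):
    """DPO-style preference loss plus an SFT term on chosen responses."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    pref_loss = -F.logsigmoid(margin).mean()   # preference optimization
    sft_loss = -logp_chosen.mean()             # supervised regularizer
    return pref_loss + eta * sft_loss
```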
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED).
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs).
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
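A small sketch of descent on a mollified pairwise interaction energy: particles minimize a kernel energy whose mollifier `eps` keeps the interaction finite at zero distance, and the repulsion spreads them out. Only the interaction term is shown; the target-density weighting that MIEs use for sampling a given distribution is omitted, and the kernel choice is an assumption.

```python
import numpy as np

def mie_repulsion_grad(x, eps=0.5):
    """Gradient of a mollified pairwise energy E = mean_ij (|xi-xj|^2+eps)^(-d/2)."""
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]
    sq = (diff ** 2).sum(-1) + eps            # mollifier keeps energy finite
    w = -0.5 * d * sq ** (-0.5 * d - 1)       # phi'(sq) for phi(r) = r^(-d/2)
    return 4 * (w[:, :, None] * diff).mean(axis=1)

x = np.random.randn(128, 2)
for _ in range(200):
    x -= 0.05 * mie_repulsion_grad(x)          # repulsion spreads particles out
```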
arXiv Detail & Related papers (2022-10-24T16:54:18Z)
- Manipulating the Dynamics of a Fermi Resonance with Light. A Direct Optimal Control Theory Approach [0.0]
Direct optimal control theory for quantum dynamical problems presents itself as an interesting alternative to the traditional indirect optimal control.
We extend the application of the method to the case of exact wavepacket propagation using the example of a generic Fermi-resonance model.
arXiv Detail & Related papers (2021-08-27T14:30:03Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
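A hedged sketch of the iterative-amortization contrast: rather than emitting policy parameters in one network pass, an inner loop refines a Gaussian policy's parameters per state by descent on the entropy-regularized value. The paper learns the iterative update itself; this sketch substitutes plain gradient refinement, and the critic, step size, and temperature are assumptions.

```python
import torch

def iterative_policy_refinement(q, s, act_dim, iters=5, lr=0.1, alpha=0.2):
    """Refine a Gaussian policy's parameters per state by inner-loop descent
    on the entropy-regularized value, instead of a single network pass."""
    mu = torch.zeros(act_dim, requires_grad=True)
    log_std = torch.zeros(act_dim, requires_grad=True)
    opt = torch.optim.SGD([mu, log_std], lr=lr)
    for _ in range(iters):
        dist = torch.distributions.Normal(mu, log_std.exp())
        a = dist.rsample()                         # reparameterized sample
        loss = -(q(torch.cat([s, a])).sum()
                 - alpha * dist.log_prob(a).sum())
        opt.zero_grad(); loss.backward(); opt.step()
    return mu.detach(), log_std.detach()

q = torch.nn.Linear(3 + 2, 1)                      # stand-in Q-function
mu, log_std = iterative_policy_refinement(q, torch.randn(3), act_dim=2)
```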
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
- Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
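A short worked sketch of Zwanzig's estimator and the targeted variant: the mapping M increases overlap between the two distributions, and its log-Jacobian enters the generalized work. The toy 1-D potentials and the linear map here are assumptions chosen so the exact answer (Delta F = -ln 2) is known.

```python
import numpy as np

def fep(work, beta=1.0):
    """Zwanzig's estimator: Delta F = -(1/beta) ln < exp(-beta * work) >_A."""
    return -np.log(np.mean(np.exp(-beta * work))) / beta

# Targeted FEP with an invertible map M(x) = s * x (log-Jacobian = ln s in 1D)
u_a = lambda x: 0.5 * x ** 2             # well A: samples ~ N(0, 1)
u_b = lambda x: 0.5 * (x / 2.0) ** 2     # well B: wider, poor direct overlap
x = np.random.randn(100_000)             # equilibrium samples from A (beta=1)
s = 2.0
naive_work = u_b(x) - u_a(x)
mapped_work = u_b(s * x) - u_a(x) - np.log(s)  # generalized work with Jacobian
print(fep(naive_work), fep(mapped_work))       # true Delta F = -ln 2; the
                                               # mapped estimate is exact here
```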
arXiv Detail & Related papers (2020-02-12T11:10:00Z)