On the Wasserstein Gradient Flow Interpretation of Drifting Models
Abstract Overview
This paper analyzes Generative Modeling via Drifting (GMD) through the framework of Wasserstein Gradient Flows (WGFs), treating the drifting procedure as targeting a fixed point in probability space rather than simulating the full flow trajectory. The authors show that a score-difference drifting construction corresponds to finding the fixed point of a WGF on the KL divergence with Parzen-smoothed densities, and they correct a prior claim by Cao et al. regarding the exact velocity field expression. They then analyze the practical algorithm used by Deng et al. by proposing a closely related "Sinkhorn Proxy" based on a one-shot approximation to the Sinkhorn transport plan, and they prove both consistency properties and fundamental limitations of this approach. The paper further extends the drifting idea to other WGF objectives, including MMD, sliced-Wasserstein distance, and f-divergence critics, and compares all approaches in synthetic experiments.
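For orientation, the standard (unsmoothed) Wasserstein gradient flow of the KL divergence can be written as below. The notation is an assumption made here, not taken from the paper: q_t is the evolving model distribution, p is the target, and F is the KL functional. The summary above notes that the exact velocity for Parzen-smoothed densities required correcting an earlier claim; that corrected expression is not reproduced here.

```latex
% Standard WGF of F(q) = KL(q || p); symbols q_t, p, v_t are illustrative notation.
\begin{align*}
  \partial_t q_t + \nabla \cdot (q_t\, v_t) &= 0, \\
  v_t &= -\nabla \frac{\delta F}{\delta q}(q_t)
       = \nabla \log p - \nabla \log q_t .
\end{align*}
% A fixed point requires v = 0 on the support of q,
% i.e. the scores of q and p agree there.
```

Under this reading, a fixed-point method seeks a q at which the velocity vanishes instead of integrating the continuity equation forward in time.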
Novelty
The paper's main novelty is a rigorous theoretical reinterpretation of drifting models as fixed-point methods for specific Wasserstein gradient flows, with a careful distinction between the score-difference/KL view and the practically implemented Sinkhorn-like procedure. It introduces and formally analyzes a one-shot Sinkhorn Proxy and proves three properties: its velocity is zero if and only if the two distributions match (consistency), its velocity field does not generally correspond to a Wasserstein gradient flow, and it fails to transport mass between well-separated modes.
Results
The authors prove that the score-difference drifting field corresponds to the fixed point of a KL-based WGF with smoothed densities, correcting a prior claim about the exact velocity expression. For the Sinkhorn Proxy with a Gaussian kernel, they prove consistency (the velocity is zero iff p = q, i.e., the two distributions being matched coincide), but they also show that the velocity field is not the gradient of any distributional loss unless a highly restrictive condition holds, and they demonstrate analytically that it fails to move mass between distant modes. Synthetic experiments on 2D datasets show that all flow objectives achieve similar best-case performance when hyperparameters are tuned, and that the proposed Sinkhorn Proxy tolerates smaller regularization values better than Deng et al.'s Algorithm 2.
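To make the score-difference idea concrete, here is a minimal NumPy sketch that evaluates the difference between the scores of two Gaussian Parzen estimates. It is an illustration under stated assumptions, not the paper's implementation: the function names, the shared bandwidth `sigma`, and the toy data are all hypothetical, and the paper's exact field may differ from this plain score difference.

```python
import numpy as np

def parzen_score(x, samples, sigma):
    """Score (gradient of the log-density) of a Gaussian Parzen estimate.

    x: (m, d) query points; samples: (n, d) kernel centres; sigma: bandwidth.
    """
    diff = samples[None, :, :] - x[:, None, :]              # (m, n, d): centre - query
    logits = -np.sum(diff**2, axis=-1) / (2.0 * sigma**2)   # (m, n) log kernel weights
    logits -= logits.max(axis=1, keepdims=True)             # stabilise the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                       # normalised weights per query
    # Weighted mean shift toward the centres, divided by sigma^2.
    return np.einsum("mn,mnd->md", w, diff) / sigma**2

def score_difference_velocity(x, data, generated, sigma):
    """Illustrative drift: score of smoothed data minus score of smoothed samples."""
    return parzen_score(x, data, sigma) - parzen_score(x, generated, sigma)

# Toy 2D example (assumed data, for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, size=(256, 2))
generated = rng.normal(loc=0.0, size=(256, 2))
v = score_difference_velocity(generated, data, generated, sigma=1.0)
```

When the two sample sets are identical the field vanishes at every query point, which is the fixed-point condition in this simplified setting; in the toy example the average drift points from the generated samples toward the data.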
Key Points
- The paper interprets the score-difference drifting formulation as directly targeting the stationary point of a Wasserstein gradient flow on KL divergence with Parzen-smoothed densities, and corrects a prior claim by Cao et al. regarding the exact velocity field expression for KL between smoothed densities.
- A Sinkhorn Proxy is derived as a one-shot approximation to the Sinkhorn transport plan for the practical drifting algorithm; it is proven consistent (zero velocity iff p=q) but shown to lack gradient structure and to fail at transporting mass between well-separated modes, unlike a true optimal transport flow.
- The framework is extended to other WGF objectives including MMD, sliced-Wasserstein, and dual f-divergence critics, with synthetic experiments demonstrating that all flows achieve comparable best-case performance but differ in sensitivity to hyperparameters, and the Sinkhorn Proxy shows better tolerance to small regularization than Deng et al.'s Algorithm 2.
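As one example of the other objectives mentioned above, the following sketch computes the particle velocity of a standard MMD gradient flow with a Gaussian kernel, namely the negative gradient of the witness function. It shows the generic MMD flow and is not claimed to match the paper's construction; the function names, the bandwidth `sigma`, and the toy data are hypothetical.

```python
import numpy as np

def gaussian_kernel_grad(x, y, sigma):
    """Gradient in x of k(x, y) = exp(-|x - y|^2 / (2 sigma^2)); shape (m, n, d)."""
    diff = x[:, None, :] - y[None, :, :]                       # (m, n, d)
    k = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))   # (m, n)
    return -(diff / sigma**2) * k[:, :, None]

def mmd_flow_velocity(x, data, generated, sigma):
    """Negative gradient of the witness f(x) = mean_j k(x, g_j) - mean_i k(x, d_i)."""
    grad_witness = (gaussian_kernel_grad(x, generated, sigma).mean(axis=1)
                    - gaussian_kernel_grad(x, data, sigma).mean(axis=1))
    return -grad_witness

# Toy 2D example (assumed data, for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, size=(256, 2))
generated = rng.normal(loc=0.0, size=(256, 2))
v_mmd = mmd_flow_velocity(generated, data, generated, sigma=1.0)
```

The bandwidth plays the role of the hyperparameter whose sensitivity the experiments compare: both terms of the velocity are kernel-weighted, so their magnitudes depend strongly on `sigma` relative to the distance between samples.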