Joint Geometric and Trajectory Consistency Learning for One-Step Real-World Super-Resolution
- URL: http://arxiv.org/abs/2602.24240v1
- Date: Fri, 27 Feb 2026 18:13:31 GMT
- Title: Joint Geometric and Trajectory Consistency Learning for One-Step Real-World Super-Resolution
- Authors: Chengyan Deng, Zhangquan Chen, Li Yu, Kai Zhang, Xue Zhou, Wang Zhang,
- Abstract summary: Diffusion-based Real-World Image Super-Resolution (Real-ISR) achieves impressive perceptual quality but suffers from high computational costs due to iterative sampling.<n>We propose GTASR (Geometric Trajectory Alignment Super-Resolution), a simple yet effective consistency training paradigm for Real-ISR.
- Score: 14.52346301984322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion-based Real-World Image Super-Resolution (Real-ISR) achieves impressive perceptual quality but suffers from high computational costs due to iterative sampling. While recent distillation approaches leveraging large-scale Text-to-Image (T2I) priors have enabled one-step generation, they are typically hindered by prohibitive parameter counts and the inherent capability bounds imposed by teacher models. As a lightweight alternative, Consistency Models offer efficient inference but struggle with two critical limitations: the accumulation of consistency drift inherent to transitive training, and a phenomenon we term "Geometric Decoupling" - where the generative trajectory achieves pixel-wise alignment yet fails to preserve structural coherence. To address these challenges, we propose GTASR (Geometric Trajectory Alignment Super-Resolution), a simple yet effective consistency training paradigm for Real-ISR. Specifically, we introduce a Trajectory Alignment (TA) strategy to rectify the tangent vector field via full-path projection, and a Dual-Reference Structural Rectification (DRSR) mechanism to enforce strict structural constraints. Extensive experiments verify that GTASR delivers superior performance over representative baselines while maintaining minimal latency. The code and model will be released at https://github.com/Blazedengcy/GTASR.
Related papers
- On the Rate of Convergence of GD in Non-linear Neural Networks: An Adversarial Robustness Perspective [2.268525139011456]
We study the convergence dynamics of Gradient Descent (GD) in a minimal binary classification setting.<n>We prove that while GD successfully converges to an optimal robustness margin, this convergence occurs at a prohibitively slow rate.<n>Our theoretical guarantees are derived via a rigorous analysis of the GD trajectories across the distinct activation patterns of the model.
arXiv Detail & Related papers (2026-03-02T17:13:33Z) - TopoCurate:Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically rely on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks.<n>We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology.<n>TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z) - GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR [10.820638016337869]
We propose GeoRA, which exploits the anisotropic and compressible nature of RL update subspaces.<n>GeoRA mitigates optimization bottlenecks caused by geometric misalignment.<n>It consistently outperforms established low-rank baselines on key mathematical benchmarks.
arXiv Detail & Related papers (2026-01-14T10:41:34Z) - Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature.<n>Existing solver-based acceleration methods often face significant image quality degradation under a low-dimensional budget.<n>We propose the Ensemble Parallel Direction solver (dubbed as EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple gradient parallel evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z) - SCEESR: Semantic-Control Edge Enhancement for Diffusion-Based Super-Resolution [0.8122270502556375]
Real-world image super-resolution must handle complex degradations and inherent reconstruction ambiguities.<n>One-step diffusion models offer speed but often produce structural inaccuracies due to distillation artifacts.<n>We propose a novel SR framework that enhances a one-step diffusion model using a ControlNet mechanism for semantic edge guidance.
arXiv Detail & Related papers (2025-10-22T06:06:01Z) - LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution [24.44080642253128]
Generative models for Image Super-Resolution (SR) are increasingly powerful, yet their reliance on self-attention's quadratic complexity (O(N2)) creates a major computational bottleneck.<n> Linear Attention offers an O(N) solution, but its promise for photorealistic SR has remained largely untapped.<n>This paper introduces LinearSR, a holistic framework that, for the first time, systematically overcomes these critical hurdles.
arXiv Detail & Related papers (2025-10-09T19:41:51Z) - Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called textitGuided Topology Diffusion (GTD)<n>Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process.<n>At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards.<n>Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
arXiv Detail & Related papers (2025-10-09T05:28:28Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech.<n>The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture.<n>To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.