Fast & Efficient Normalizing Flows and Applications of Image Generative Models
- URL: http://arxiv.org/abs/2512.04039v1
- Date: Wed, 03 Dec 2025 18:29:03 GMT
- Title: Fast & Efficient Normalizing Flows and Applications of Image Generative Models
- Authors: Sandeep Nagar
- Abstract summary: This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world computer vision challenges. The first part introduces significant improvements to normalizing flow architectures through six key innovations: 1) development of invertible 3x3 convolution layers with mathematically proven necessary and sufficient conditions for invertibility, 2) introduction of a more efficient Quad-coupling layer, 3) design of a fast and efficient parallel inversion algorithm for kxk convolutional layers, 4) a fast and efficient backpropagation algorithm for the inverse of convolution, 5) using the inverse of convolution, in Inverse-
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world computer vision challenges. The first part introduces significant improvements to normalizing flow architectures through six key innovations: 1) development of invertible 3x3 convolution layers with mathematically proven necessary and sufficient conditions for invertibility; 2) introduction of a more efficient Quad-coupling layer; 3) design of a fast and efficient parallel inversion algorithm for kxk convolutional layers; 4) a fast and efficient backpropagation algorithm for the inverse of convolution; 5) use of the inverse of convolution in Inverse-Flow for the forward pass, trained with the proposed backpropagation algorithm; and 6) Affine-StableSR, a compact and efficient super-resolution model that leverages pre-trained weights and normalizing flow layers to reduce parameter count while maintaining performance. The second part presents: 1) an automated quality assessment system for agricultural produce that uses conditional GANs to address class imbalance, data scarcity, and annotation challenges, achieving good accuracy in seed purity testing; 2) an unsupervised geological mapping framework that uses stacked autoencoders for dimensionality reduction, showing improved feature extraction compared to conventional methods; 3) a privacy-preserving method for autonomous driving datasets based on face detection and image inpainting; 4) Stable Diffusion based image inpainting that replaces detected faces and license plates, advancing privacy-preserving techniques and ethical considerations in the field; and 5) an adapted diffusion model for art restoration that effectively handles multiple types of degradation through unified fine-tuning.
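To make the phrase "inverse of convolution" concrete, here is a minimal 1D sketch in plain Python. It is not the thesis's algorithm. It assumes a convolution that is zero-padded on the left only and whose last kernel tap is non-zero, which makes the underlying linear map lower triangular and therefore invertible by forward substitution. The function names (`causal_conv1d`, `invert_causal_conv1d`, `log_abs_det_jacobian`) are hypothetical, and the sequential loop shown here does not reproduce the parallel inversion described in the abstract.

```python
import math


def causal_conv1d(x, kernel):
    """Convolve x with kernel, zero-padding on the left only (same output length)."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def invert_causal_conv1d(y, kernel):
    """Recover x from y by forward substitution; requires kernel[-1] != 0."""
    k = len(kernel)
    if kernel[-1] == 0:
        raise ValueError("last kernel tap must be non-zero for invertibility")
    padded = [0.0] * (k - 1)  # left padding; recovered samples are appended below
    for i, y_i in enumerate(y):
        # Contribution of the inputs that are already known at step i.
        known = sum(kernel[j] * padded[i + j] for j in range(k - 1))
        padded.append((y_i - known) / kernel[-1])
    return padded[k - 1:]


def log_abs_det_jacobian(n, kernel):
    """The map is triangular with kernel[-1] on the diagonal, so log|det J| = n * log|kernel[-1]|."""
    return n * math.log(abs(kernel[-1]))


x = [1.0, -2.0, 0.5, 3.0]
kernel = [0.5, -1.0, 2.0]
y = causal_conv1d(x, kernel)
x_rec = invert_causal_conv1d(y, kernel)
```

In a normalizing flow, the log-determinant term is what enters the change-of-variables likelihood. The triangular structure keeps it cheap to compute because only the diagonal entry matters.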
Related papers
- STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models [11.965535230928372]
STORE is a unified and scalable token-based ranking framework built upon three core innovations. Our framework consistently improves prediction accuracy (online CTR by 2.71%, AUC by 1.195%) and training efficiency (1.84 throughput).
arXiv Detail & Related papers (2025-11-24T06:20:02Z) - 4KDehazeFlow: Ultra-High-Definition Image Dehazing via Flow Matching [47.857232695201645]
4KDehazeFlow is a novel method based on Flow Matching and the Haze-Aware vector field. It provides efficient data-driven adaptive nonlinear color transformation for high-quality dehazing. It delivers a 2dB PSNR increase and better performance in dense haze and color fidelity.
arXiv Detail & Related papers (2025-11-12T07:16:52Z) - Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications [99.72917069918485]
We propose a novel sparse model inversion strategy to speed up existing dense inversion methods. Specifically, we invert semantic foregrounds while stopping the inversion of noisy backgrounds and potential spurious correlations.
arXiv Detail & Related papers (2025-10-31T05:14:36Z) - ReSeFlow: Rectifying SE(3)-Equivariant Policy Learning Flows [7.360373380580255]
We introduce rectification to SE(3)-diffusion models and propose ReSeFlow, which provides fast, geodesic-consistent policy generation at low computational cost. We find that the proposed ReSeFlow with only one inference step can achieve better performance with lower geodesic distance than the baseline methods.
arXiv Detail & Related papers (2025-09-20T06:32:36Z) - Harnessing Input-Adaptive Inference for Efficient VLN [13.847596428283861]
An emerging paradigm in vision-and-language navigation (VLN) is the use of history-aware multi-modal transformer models. We propose a novel input-adaptive navigation method to enhance VLN model efficiency.
arXiv Detail & Related papers (2025-08-12T18:05:33Z) - Solving Inverse Problems with FLAIR [68.87167940623318]
We present FLAIR, a training-free variational framework that leverages flow-based generative models as a prior for inverse problems. Results on standard imaging benchmarks demonstrate that FLAIR consistently outperforms existing diffusion- and flow-based methods in terms of reconstruction quality and sample diversity.
arXiv Detail & Related papers (2025-06-03T09:29:47Z) - Flow-GRPO: Training Flow Matching Models via Online RL [80.62659379624867]
We propose Flow-GRPO, the first method to integrate online policy reinforcement learning into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original number of inference steps.
arXiv Detail & Related papers (2025-05-08T17:58:45Z) - NAMI: Efficient Image Generation via Bridged Progressive Rectified Flow Transformers [10.84639914909133]
Flow-based Transformer models have achieved state-of-the-art image generation performance, but often suffer from high inference latency and computational cost. We propose Bridged Progressive Rectified Flow Transformers (NAMI), which decompose the generation process across temporal, spatial, and architectural dimensions.
arXiv Detail & Related papers (2025-03-12T10:38:58Z) - ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to scene understanding. This paper proposes a vision-based framework with three targeted improvements. Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z) - Variational Bayes image restoration with compressive autoencoders [6.689746581015932]
Regularization of inverse problems is of paramount importance in computational imaging. In this work, we first propose to use variational autoencoders instead of state-of-the-art generative models. As a second contribution, we introduce the Variational Bayes Latent Estimation (VBLE) algorithm, which performs latent estimation within variational inference.
arXiv Detail & Related papers (2023-11-29T15:49:31Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs)
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Uncertainty quantification and inverse modeling for subsurface flow in 3D heterogeneous formations using a theory-guided convolutional encoder-decoder network [5.018057056965207]
We build surrogate models for dynamic 3D subsurface single-phase flow problems with multiple vertical producing wells.
The surrogate model provides efficient pressure estimation of the entire formation at any timestep.
The well production rate or bottom hole pressure can then be determined based on Peaceman's formula.
arXiv Detail & Related papers (2021-11-14T10:11:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.