EdgeFusion: On-Device Text-to-Image Generation
- URL: http://arxiv.org/abs/2404.11925v1
- Date: Thu, 18 Apr 2024 06:02:54 GMT
- Title: EdgeFusion: On-Device Text-to-Image Generation
- Authors: Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim
- Abstract summary: We start from a compact Stable Diffusion variant, BK-SDM, for text-to-image generation.
We achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices.
- Score: 3.3345550849564836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application. To tackle this challenge, recent research focuses on methods to reduce sampling steps, such as Latent Consistency Model (LCM), and on employing architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches, we uniquely start with a compact SD variant, BK-SDM. We observe that directly applying LCM to BK-SDM with commonly used crawled datasets yields unsatisfactory results. It leads us to develop two strategies: (1) leveraging high-quality image-text pairs from leading generative models and (2) designing an advanced distillation process tailored for LCM. Through our thorough exploration of quantization, profiling, and on-device deployment, we achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices.
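For context on what two-step, LCM-style sampling of a compact SD model looks like in practice, a minimal Python sketch using Hugging Face diffusers follows. The checkpoint id, prompt, guidance scale, and device are illustrative assumptions and do not correspond to the released EdgeFusion artifacts.

```python
# Hedged sketch: two-step generation with an LCM-style scheduler on a compact
# Stable Diffusion backbone via Hugging Face diffusers. Checkpoint id, prompt,
# guidance scale, and device are illustrative assumptions, not EdgeFusion itself.
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small",              # assumed compact (BK-SDM) checkpoint
    torch_dtype=torch.float16,
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

image = pipe(
    "a photo-realistic portrait of an astronaut, studio lighting",
    num_inference_steps=2,               # the paper reports two-step sampling
    guidance_scale=1.0,                  # LCM-distilled models need little or no CFG
).images[0]
image.save("edgefusion_style_sample.png")
```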
Related papers
- Single-Step Latent Consistency Model for Remote Sensing Image Super-Resolution [7.920423405957888]
We propose a novel single-step diffusion approach designed to enhance both efficiency and visual quality in RSISR tasks.
The proposed LCMSR reduces the iterative steps of traditional diffusion models from 50-1000 or more to just a single step.
Experimental results demonstrate that LCMSR effectively balances efficiency and performance, achieving inference times comparable to non-diffusion models.
arXiv Detail & Related papers (2025-03-25T09:56:21Z)
- Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation [5.887986127737718]
We introduce BLIP2IDC, an adaptation of BLIP2 to the Image Difference Captioning (IDC) task at low computational cost.
We show it outperforms two-streams approaches by a significant margin on real-world IDC datasets.
We also propose to use synthetic augmentation to improve the performance of IDC models in an agnostic fashion.
arXiv Detail & Related papers (2024-12-20T14:32:56Z)
- PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [87.89013794655207]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps.
We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR.
Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
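As a generic illustration of post-training quantization (not PassionSR's adaptive-scale scheme for one-step diffusion SR), the following PyTorch sketch quantizes the Linear layers of a trained module to int8 without retraining.

```python
# Generic post-training (dynamic) quantization sketch in PyTorch. This is a
# plain int8 baseline for illustration only; PassionSR's adaptive-scale method
# is described in the paper and not reproduced here.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)
).eval()

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # quantize only Linear layers to int8
)

x = torch.randn(1, 512)
print(quantized(x).shape)                    # torch.Size([1, 512])
```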
arXiv Detail & Related papers (2024-11-26T04:49:42Z)
- ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution [28.945663118445037]
Real-world image super-resolution (Real-ISR) aims at restoring high-quality (HQ) images from low-quality (LQ) inputs corrupted by unknown and complex degradations.
We introduce ConsisSR to handle both semantic and pixel-level consistency.
arXiv Detail & Related papers (2024-10-17T17:41:52Z)
- AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation [12.564266865237343]
Latent diffusion models (LDMs) often experience significant structural distortions when directly generating high-resolution (HR) images.
We propose an Attentive and Progressive LDM (AP-LDM) aimed at enhancing HR image quality while accelerating the generation process.
AP-LDM decomposes the denoising process of LDMs into two stages: (i) attentive training-resolution denoising, and (ii) progressive high-resolution denoising.
arXiv Detail & Related papers (2024-10-08T13:56:28Z)
- Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging [8.819370643243012]
Coded Aperture Snapshot Spectral Imaging (CASSI) is a crucial technique for capturing three-dimensional multispectral images (MSIs).
Current state-of-the-art methods, predominantly end-to-end, face limitations in reconstructing high-frequency details.
This paper introduces a novel one-step Diffusion Probabilistic Model within a self-supervised adaptation framework for Snapshot Compressive Imaging.
arXiv Detail & Related papers (2024-09-11T17:02:10Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential to effectively accelerate advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
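To make concrete what binarization means in this setting, here is a generic 1-bit weight quantization sketch (the sign of each weight times a per-output-channel scale); it is a textbook baseline for illustration, not BI-DiffSR's actual design.

```python
# Generic 1-bit weight binarization sketch: each weight becomes its sign,
# rescaled by the per-output-channel mean absolute value. Illustrative only;
# BI-DiffSR's binarized modules are more elaborate.
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    # w: (out_channels, in_channels, kH, kW) conv weight or (out, in) linear weight
    alpha = w.abs().mean(dim=tuple(range(1, w.dim())), keepdim=True)
    return alpha * torch.sign(w)

w = torch.randn(64, 64, 3, 3)
w_bin = binarize_weights(w)
print(w_bin.unique().numel())  # at most 2 distinct values per output channel
```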
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- DivCon: Divide and Conquer for Progressive Text-to-Image Generation [0.0]
Diffusion-driven text-to-image (T2I) generation has achieved remarkable advancements.
Layout is employed as an intermediary to bridge large language models and layout-based diffusion models.
We introduce a divide-and-conquer approach which decouples the T2I generation task into simple subtasks.
arXiv Detail & Related papers (2024-03-11T03:24:44Z)
- MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval [7.233106731197739]
We propose a Multi-teacher Cross-modality Alignment Distillation (MCAD) technique to integrate the advantages of single- and dual-stream models.
We implement a lightweight CLIP model on Snapdragon/Dimensity chips with only ~100M running memory and ~8.0 ms search latency.
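For readers unfamiliar with dual-stream image-text retrieval (the setting MCAD distills into), a minimal CLIP-style scoring sketch with Hugging Face transformers is shown below; the model id and the dummy image are placeholders, not the distilled on-device model from the paper.

```python
# Minimal dual-stream image-text retrieval sketch with an off-the-shelf CLIP.
# Model id and the blank image are placeholders; MCAD's distilled lightweight
# model and on-device deployment are not reproduced here.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))          # placeholder query image
texts = ["a dog running on the beach", "a city skyline at night"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image  # image-to-text similarity logits
print(scores.softmax(dim=-1))
```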
arXiv Detail & Related papers (2023-10-30T15:38:43Z)
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 24-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
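Roughly, and up to notation (omitting LCM's guidance-augmented ODE solver), consistency distillation trains a model f_theta to map any point on a probability-flow ODE trajectory to the same solution; a sketch of the standard parameterization and objective is:

```latex
% Consistency parameterization and distillation objective (sketch in standard
% consistency-model notation; the LCM paper's exact formulation differs in details).
f_\theta(z, c, t) = c_{\mathrm{skip}}(t)\, z + c_{\mathrm{out}}(t)\, F_\theta(z, c, t)

\mathcal{L}(\theta) =
  \mathbb{E}\!\left[ d\!\left( f_\theta(z_{t_{n+1}}, c, t_{n+1}),\,
                               f_{\theta^{-}}(\hat{z}_{t_n}, c, t_n) \right) \right]
```

Here \hat{z}_{t_n} is obtained from z_{t_{n+1}} by one ODE-solver step, \theta^- is an EMA copy of \theta, and the boundary condition c_skip = 1, c_out = 0 at the smallest timestep enforces self-consistency, which is what permits one- or few-step sampling.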
arXiv Detail & Related papers (2023-10-06T17:11:58Z)
- Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion [60.221404321514086]
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels.
This paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions.
arXiv Detail & Related papers (2023-09-03T08:07:26Z)
- Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method [51.30748775681917]
We consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.
We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms.
As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method.
arXiv Detail & Related papers (2022-12-22T09:05:07Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.