T2CI-GAN: Text to Compressed Image generation using Generative
Adversarial Network
- URL: http://arxiv.org/abs/2210.03734v1
- Date: Sat, 1 Oct 2022 09:26:25 GMT
- Title: T2CI-GAN: Text to Compressed Image generation using Generative
Adversarial Network
- Authors: Bulla Rajesh and Nandakishore Dusa and Mohammed Javed and Shiv Ram
Dubey and P. Nagabhushan
- Abstract summary: In practice, most visual data are processed and transmitted in compressed form.
The proposed work attempts to generate visual data directly in the compressed representation using Deep Convolutional GANs (DCGANs).
The first model is trained directly on JPEG compressed DCT images (compressed domain) to generate compressed images from text descriptions.
The second model is trained on RGB images (pixel domain) to generate the JPEG compressed DCT representation from text descriptions.
- Score: 9.657133242509671
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of generating textual descriptions for visual data has gained
research attention in recent years. In contrast, the problem of generating
visual data from textual descriptions remains very challenging, because it
requires combining both Natural Language Processing (NLP) and Computer Vision
techniques. Existing methods utilize Generative Adversarial Networks (GANs) to
generate uncompressed images from textual descriptions. In practice, however,
most visual data are processed and transmitted in compressed form. Hence, the
proposed work attempts to generate visual data directly in the compressed
representation using Deep Convolutional GANs (DCGANs), to achieve storage and
computational efficiency. We propose two GAN models for compressed image
generation from text. The first model is trained directly on JPEG compressed
DCT images (compressed domain) to generate compressed images from text
descriptions. The second model is trained on RGB images (pixel domain) to
generate the JPEG compressed DCT representation from text descriptions. The
proposed models are tested on the open-source benchmark Oxford-102 Flower
dataset using both RGB and JPEG compressed versions, and achieve
state-of-the-art performance in the JPEG compressed domain. The code will be
publicly released on GitHub after acceptance of the paper.
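To make the setup concrete, here is a minimal, hypothetical PyTorch sketch of the core idea: a text-conditioned DCGAN-style generator that emits 8x8 blockwise DCT coefficients instead of RGB pixels. This is not the authors' released code (which is not yet public); the layer sizes, embedding dimensions, and output resolution are illustrative assumptions.

```python
# Hypothetical sketch: a DCGAN-style generator conditioned on a text
# embedding that outputs DCT coefficients rather than pixels.
import torch
import torch.nn as nn

class TextToDCTGenerator(nn.Module):
    def __init__(self, noise_dim=100, text_dim=256, cond_dim=128):
        super().__init__()
        # Compress the sentence embedding before concatenating with noise.
        self.condition = nn.Sequential(
            nn.Linear(text_dim, cond_dim), nn.LeakyReLU(0.2, inplace=True)
        )
        # Transposed-conv stack upsampling a 1x1 seed to an 8x8 grid of
        # blocks; each spatial cell holds the 64 DCT coefficients of one
        # 8x8 luminance block, so the output decodes to a 64x64 image.
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim + cond_dim, 512, 4, 1, 0, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(True),                 # -> 4x4
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(True),                 # -> 8x8
            # Linear output head: DCT coefficients are unbounded, so no tanh.
            nn.Conv2d(256, 64, 3, 1, 1),                        # 64 coeffs/block
        )

    def forward(self, noise, text_embedding):
        cond = self.condition(text_embedding)
        z = torch.cat([noise, cond], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(z)  # (B, 64, 8, 8): DCT coefficients, not pixels

gen = TextToDCTGenerator()
dct_blocks = gen(torch.randn(2, 100), torch.randn(2, 256))
print(dct_blocks.shape)  # torch.Size([2, 64, 8, 8])
```

A generator like this could serve either training regime from the abstract: trained adversarially against a discriminator that also operates on DCT blocks (compressed domain), or supervised against DCT targets computed from RGB images (pixel domain).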
Related papers
- JPEG-LM: LLMs as Image Generators with Canonical Codec Representations [51.097213824684665]
Discretization represents continuous data like images and videos as discrete tokens.
Common methods of discretizing images and videos include modeling raw pixel values.
We show that using canonical representations can help lower the barriers between language generation and visual generation.
arXiv Detail & Related papers (2024-08-15T23:57:02Z)
- Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z)
- CompTLL-UNet: Compressed Domain Text-Line Localization in Challenging Handwritten Documents using Deep Feature Learning from JPEG Coefficients [0.9405458160620535]
We employ deep feature learning directly on the JPEG compressed coefficients, without full decompression, to accomplish text-line localization in the JPEG compressed domain (a minimal block-DCT sketch appears after this list).
A modified U-Net architecture, the Compressed Text-Line localization Network (CompTLL-UNet), is designed for this task.
The model is trained and tested on JPEG compressed versions of benchmark datasets including ICDAR 2017 (cBAD) and ICDAR 2019 (cBAD).
arXiv Detail & Related papers (2023-08-11T14:02:52Z)
- Text-based Person Search without Parallel Image-Text Data [52.63433741872629]
Text-based person search (TBPS) aims to retrieve the images of the target person from a large image gallery based on a given natural language description.
Existing methods are dominated by training models with parallel image-text pairs, which are very costly to collect.
In this paper, we make the first attempt to explore TBPS without parallel image-text data.
arXiv Detail & Related papers (2023-05-22T12:13:08Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Document Image Binarization in JPEG Compressed Domain using Dual Discriminator Generative Adversarial Networks [0.0]
The proposed model has been thoroughly tested on different versions of the DIBCO dataset, which pose challenges such as holes, erased or smudged ink, dust, and misplaced fibres.
The model proved highly robust, efficient in both time and space complexity, and achieved state-of-the-art performance in the JPEG compressed domain.
arXiv Detail & Related papers (2022-09-13T12:07:32Z)
- OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model [0.0]
We propose a novel OCR system for CCITT (International Telegraph and Telephone Consultative Committee) compressed, machine-printed TIFF document images that operates directly in the compressed domain.
After segmenting text regions into lines and words, an HMM is applied for recognition using the three CCITT coding modes: horizontal, vertical, and pass.
arXiv Detail & Related papers (2022-09-13T06:34:26Z)
- Text to Image Synthesis using Stacked Conditional Variational Autoencoders and Conditional Generative Adversarial Networks [0.0]
Current text-to-image synthesis approaches fall short of producing high-resolution images that faithfully represent the text descriptor.
This study uses Conditional VAEs as an initial generator to produce a high-level sketch of the text descriptor.
The proposed architecture benefits from a conditioning augmentation and a residual block on the Conditional GAN network to achieve the results.
arXiv Detail & Related papers (2022-07-06T13:43:56Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
- Deep Learning Based Image Retrieval in the JPEG Compressed Domain [0.0]
We propose a unified model for image retrieval which takes DCT coefficients as input and efficiently extracts global and local features directly in the JPEG compressed domain for accurate image retrieval.
In terms of mean average precision, our proposed model performs comparably to the current DELG model, which takes RGB features as input.
arXiv Detail & Related papers (2021-07-08T07:30:03Z)
- Discernible Image Compression [124.08063151879173]
This paper aims to produce compressed images by pursuing both appearance and perceptual consistency.
Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images (a hedged sketch of such a perceptual-consistency loss appears after this list).
Experiments on benchmarks demonstrate that images compressed by the proposed method can also be well recognized by subsequent visual recognition and detection models.
arXiv Detail & Related papers (2020-02-17T07:35:08Z)
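Several of the compressed-domain papers above (CompTLL-UNet, the JPEG-domain retrieval model) consume blockwise DCT coefficients rather than pixels. The following minimal NumPy/SciPy sketch approximates that input by applying an 8x8 block DCT to a decoded grayscale image; note that real compressed-domain pipelines instead read the quantized coefficients directly from the JPEG bitstream, which this sketch only emulates.

```python
# Minimal sketch: approximate the blockwise-DCT input used by
# compressed-domain models via an 8x8 block DCT, as in JPEG's transform step.
import numpy as np
from scipy.fftpack import dct

def blockwise_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Return an (H//block, W//block, block*block) array of per-block DCT coefficients."""
    h, w = image.shape
    h, w = h - h % block, w - w % block           # crop to a multiple of the block size
    x = image[:h, :w].astype(np.float32) - 128.0  # level shift, as in JPEG
    # Split into non-overlapping blocks: axes become (hb, wb, bi, bj).
    blocks = x.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # 2-D type-II DCT applied separably along the two intra-block axes.
    coeffs = dct(dct(blocks, type=2, norm='ortho', axis=-1),
                 type=2, norm='ortho', axis=-2)
    return coeffs.reshape(h // block, w // block, block * block)

img = np.random.randint(0, 256, (64, 64)).astype(np.float32)
print(blockwise_dct(img).shape)  # (8, 8, 64)
```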
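The perceptual-consistency idea in Discernible Image Compression can likewise be illustrated with a short, hypothetical loss: penalize the distance between pre-trained CNN features of the original and the compressed/reconstructed image. The network choice (VGG-16 truncated at a mid-level layer) and the plain L2 feature distance below are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of a perceptual-consistency loss using a frozen,
# pre-trained VGG-16 feature extractor (downloads weights on first run).
import torch
import torch.nn.functional as F
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # keep the feature extractor fixed

def perceptual_consistency(original: torch.Tensor, compressed: torch.Tensor) -> torch.Tensor:
    # L2 distance between mid-level VGG features of the two images.
    return F.mse_loss(vgg(original), vgg(compressed))

x = torch.rand(1, 3, 224, 224)
print(perceptual_consistency(x, x.clone()).item())  # 0.0 for identical inputs
```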
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.