Capsule Enhanced Variational AutoEncoder for Underwater Image Reconstruction
- URL: http://arxiv.org/abs/2406.01294v1
- Date: Mon, 3 Jun 2024 13:04:42 GMT
- Title: Capsule Enhanced Variational AutoEncoder for Underwater Image Reconstruction
- Authors: Rita Pucci, Niki Martinel,
- Abstract summary: We introduce a novel architecture that jointly tackles both issues by drawing inspiration from the discrete features quantization approach of the Vector Quantized Variational Autoencoder (VQ-VAE).
Our model combines an encoding network, that compresses the input into a latent representation, with two independent decoding networks, that enhance/reconstruct images using only the latent representation.
With the usage of capsule layers, we also overcome the differentiability issues of VQ-VAE, making our solution trainable end to end without the need for particular optimization tricks.
- Score: 8.16306466526838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Underwater image analysis is crucial for marine monitoring. However, it presents two major challenges: (i) the visual quality of the images is often degraded due to wavelength-dependent light attenuation, scattering, and water types; (ii) capturing and storing high-resolution images is limited by hardware, which hinders long-term environmental analyses. Recently, deep neural networks have been introduced for underwater enhancement, yet they neglect the challenge posed by the limitations of autonomous underwater image acquisition systems. We introduce a novel architecture that jointly tackles both issues by drawing inspiration from the discrete features quantization approach of the Vector Quantized Variational Autoencoder (VQ-VAE). Our model combines an encoding network, which compresses the input into a latent representation, with two independent decoding networks, which enhance/reconstruct images using only the latent representation. One decoder focuses on the spatial information while the other captures information about the entities in the image by leveraging the concept of capsules. With the usage of capsule layers, we also overcome the differentiability issues of VQ-VAE, making our solution trainable in an end-to-end fashion without the need for particular optimization tricks. Capsules perform feature quantization in a fully differentiable manner. We conducted thorough quantitative and qualitative evaluations on 6 benchmark datasets to assess the effectiveness of our contributions. Results demonstrate that we perform better than existing methods (e.g., about $+1.4dB$ gain on the challenging LSUI Test-L400 dataset), while significantly reducing the amount of space needed for data storage (i.e., $3\times$ more efficient).
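The abstract's key claim is that capsule-style feature quantization is fully differentiable, whereas VQ-VAE's nearest-codebook lookup is not. As a minimal sketch (not the paper's actual method), the snippet below contrasts a hard argmin quantizer, which blocks gradients and forces VQ-VAE to use the straight-through estimator, with a softmax-weighted soft assignment in which every operation is smooth; the `temperature` parameter and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def hard_quantize(z, codebook):
    """VQ-VAE style: snap each latent vector to its nearest codebook entry.

    The argmin is piecewise constant, so gradients cannot flow through it.
    z: (n, d) latents; codebook: (k, d) embeddings.
    """
    dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    idx = np.argmin(dists, axis=1)            # non-differentiable step
    return codebook[idx], idx

def soft_quantize(z, codebook, temperature=1.0):
    """Illustrative soft assignment: a convex mixture of codebook entries.

    Every operation here is smooth, so gradients reach both the encoder
    and the codebook without optimization tricks -- the property the
    paper attributes to its capsule-based feature quantization.
    """
    dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    logits = -dists / temperature
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)         # softmax over codebook entries
    return w @ codebook, w

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
codebook = rng.normal(size=(16, 8))
zq_hard, idx = hard_quantize(z, codebook)
zq_soft, w = soft_quantize(z, codebook, temperature=0.1)
```

Because the softmax is monotone in negative distance, the soft assignment's heaviest weight always lands on the same codebook entry the hard quantizer picks; lowering the temperature makes the two outputs converge.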
Related papers
- DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning [42.22785629783251]
Autoencoders empower state-of-the-art image and video generative models by compressing pixels into a latent space through visual tokenization. Recent advances have alleviated the performance degradation of autoencoders under high compression ratios, but training instability caused by GAN remains an open challenge. We propose DGAE, which employs a diffusion model to guide the decoder in recovering informative signals that are not fully decoded from the latent representation.
arXiv Detail & Related papers (2025-06-11T12:01:03Z) - Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression [61.500904231491596]
Most approaches for image and video compression perform transform coding in the pixel space to reduce redundancy. We propose Generative Latent Coding (GLC) models for image and video compression, GLC-image and GLC-Video.
arXiv Detail & Related papers (2025-05-22T03:31:33Z) - H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models [76.1519545010611]
Autoencoder (AE) is the key to the success of latent diffusion models for image and video generation.
In this work, we examine the architecture design choices and optimize the computation distribution to obtain efficient and high-compression video AEs.
Our AE achieves an ultra-high compression ratio and real-time decoding speed on mobile while outperforming prior art in terms of reconstruction metrics.
arXiv Detail & Related papers (2025-04-14T17:59:06Z) - GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation [62.77721499671665]
We introduce GigaTok, the first approach to improve image reconstruction, generation, and representation learning when scaling visual tokenizers.
We identify the growing complexity of latent space as the key factor behind the reconstruction vs. generation dilemma.
By scaling to 3 billion parameters, GigaTok achieves state-of-the-art performance in reconstruction, downstream AR generation, and downstream AR representation quality.
arXiv Detail & Related papers (2025-04-11T17:59:58Z) - Prior-guided Hierarchical Harmonization Network for Efficient Image Dehazing [50.92820394852817]
We propose a Prior-guided Hierarchical Harmonization Network (PGH$^2$Net) for image dehazing.
PGH$2$Net is built upon the UNet-like architecture with an efficient encoder and decoder, consisting of two module types.
arXiv Detail & Related papers (2025-03-03T03:36:30Z) - Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression [7.643300240138419]
We introduce a scalable cross-modal compression framework that incorporates multiple human-comprehensible modalities.
Our framework encodes images into a layered bitstream consisting of a semantic layer that delivers high-level semantic information.
Our method proficiently restores both semantic and visual details, competing against baseline approaches at extremely low bitrates.
arXiv Detail & Related papers (2024-12-17T15:01:35Z) - Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation [52.82508784748278]
This paper proposes a Control Generative Image Compression framework, termed Control-GIC. Control-GIC is capable of fine-grained adaption across a broad spectrum while ensuring high-fidelity and generality compression. Our experiments show that Control-GIC allows highly flexible and controllable adaption, where the results demonstrate its superior performance over recent state-of-the-art methods.
arXiv Detail & Related papers (2024-06-02T14:22:09Z) - HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression [51.04820313355164]
HybridFlow combines the continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity at extremely low bitrates.
Experimental results demonstrate superior performance across several datasets at extremely low bitrates.
arXiv Detail & Related papers (2024-04-20T13:19:08Z) - MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image based on the above information.
It can achieve optimal consistency and perception results while saving roughly 50% in bitrate, which has strong potential applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z) - Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
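The idea of allocating variable-length codes by information density can be sketched as a toy bit-allocation rule; this is an illustration, not DQ-VAE's actual mechanism, and the patch size, budget tiers, and use of patch variance as a stand-in for information density are all assumptions.

```python
import numpy as np

def patch_code_lengths(image, patch=8, budgets=(2, 4, 8)):
    """Toy dynamic bit allocation: rank fixed-size patches by variance
    (a crude proxy for information density) and split the ranking evenly
    across the code-length budgets, so busier patches get longer codes."""
    h, w = image.shape
    scores = []
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            scores.append(image[y:y + patch, x:x + patch].var())
    scores = np.array(scores)
    order = scores.argsort()                  # flattest patch first
    lengths = np.empty(len(scores), dtype=int)
    for rank, i in enumerate(order):
        tier = min(rank * len(budgets) // len(scores), len(budgets) - 1)
        lengths[i] = budgets[tier]
    return lengths

# One noisy patch in an otherwise flat 16x16 image: the noisy patch
# (top-left) should receive the longest code.
img = np.zeros((16, 16))
img[:8, :8] = np.random.default_rng(1).normal(size=(8, 8))
lengths = patch_code_lengths(img)
```

A real system would measure information density with a learned model rather than raw variance, but the allocation principle, spending more bits where content is harder to represent, is the same.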
arXiv Detail & Related papers (2023-05-19T14:56:05Z) - Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots [78.23772771485635]
We introduce a method to segment the visual field into independently moving regions, trained with no ground truth or supervision.
It consists of an adversarial conditional encoder-decoder architecture based on Slot Attention.
arXiv Detail & Related papers (2023-04-04T00:26:13Z) - High Fidelity Image Synthesis With Deep VAEs In Latent Space [0.0]
We present fast, realistic image generation on high-resolution, multimodal datasets using hierarchical variational autoencoders (VAEs).
In this two-stage setup, the autoencoder compresses the image into its semantic features, which are then modeled with a deep VAE.
We demonstrate the effectiveness of our two-stage approach, achieving a FID of 9.34 on the ImageNet-256 dataset which is comparable to BigGAN.
arXiv Detail & Related papers (2023-03-23T23:45:19Z) - UW-CVGAN: UnderWater Image Enhancement with Capsules Vectors Quantization [25.23797117677732]
We introduce the Underwater Capsules Vectors GAN (UWCVGAN), based on the discrete features quantization paradigm of VQGAN, for this task.
The proposed UWCVGAN combines an encoding network, which compresses the image into its latent representation, with a decoding network able to reconstruct the enhanced image from the latent representation alone.
arXiv Detail & Related papers (2023-02-02T15:00:03Z) - Device Interoperability for Learned Image Compression with Weights and Activations Quantization [1.373801677008598]
We present a method to solve the device interoperability problem of a state-of-the-art image compression network.
We suggest a simple method which can ensure cross-platform encoding and decoding, and can be implemented quickly.
arXiv Detail & Related papers (2022-12-02T17:45:29Z) - Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition [124.80263629921498]
We propose Pixel Distillation that extends knowledge distillation into the input level while simultaneously breaking architecture constraints.
Such a scheme can achieve flexible cost control for deployment, as it allows the system to adjust both network architecture and image quality according to the overall requirement of resources.
arXiv Detail & Related papers (2021-12-17T14:31:40Z) - Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z) - A Unified End-to-End Framework for Efficient Deep Image Compression [35.156677716140635]
We propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies.
Specifically, we design an auto-encoder style network for learning based image compression.
Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance.
arXiv Detail & Related papers (2020-02-09T14:21:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.