CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement
- URL: http://arxiv.org/abs/2406.01294v2
- Date: Fri, 22 Nov 2024 10:25:03 GMT
- Title: CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement
- Authors: Rita Pucci, Niki Martinel
- Abstract summary: Unmanned underwater image analysis for marine monitoring faces two key challenges: degraded image quality and hardware storage constraints.
We introduce the Capsule Enhanced Variational AutoEncoder (CE-VAE), a novel architecture designed to efficiently compress and enhance degraded underwater images.
CE-VAE achieves state-of-the-art performance in underwater image enhancement on six benchmark datasets.
- Score: 8.16306466526838
- Abstract: Unmanned underwater image analysis for marine monitoring faces two key challenges: (i) degraded image quality due to light attenuation and (ii) hardware storage constraints limiting high-resolution image collection. Existing methods primarily address image enhancement with approaches that hinge on storing the full-size input. In contrast, we introduce the Capsule Enhanced Variational AutoEncoder (CE-VAE), a novel architecture designed to efficiently compress and enhance degraded underwater images. Our attention-aware image encoder can project the input image onto a latent space representation while being able to run online on a remote device. The only information that needs to be stored on the device or sent to a beacon is a compressed representation. There is a dual-decoder module that performs offline, full-size enhanced image generation. One branch reconstructs spatial details from the compressed latent space, while the second branch utilizes a capsule-clustering layer to capture entity-level structures and complex spatial relationships. This parallel decoding strategy enables the model to balance fine-detail preservation with context-aware enhancements. CE-VAE achieves state-of-the-art performance in underwater image enhancement on six benchmark datasets, providing up to 3x higher compression efficiency than existing approaches. Code available at \url{https://github.com/iN1k1/ce-vae-underwater-image-enhancement}.
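The encode-on-device, decode-offline flow described in the abstract can be sketched as a toy. All shapes, the linear encoder/decoders, and the averaging fusion rule below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Toy sketch of the CE-VAE pipeline: an on-device encoder maps the image to
# a small latent code; two offline decoder branches reconstruct it and their
# outputs are fused. Shapes and the fusion rule are assumptions.

H, W, C = 64, 64, 3        # assumed input resolution
LATENT = 256               # assumed latent dimensionality

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((H * W * C, LATENT)) / np.sqrt(H * W * C)
W_spatial = rng.standard_normal((LATENT, H * W * C)) / np.sqrt(LATENT)
W_capsule = rng.standard_normal((LATENT, H * W * C)) / np.sqrt(LATENT)

def encode(img):
    """On-device step: project the image onto the compact latent code."""
    return img.reshape(-1) @ W_enc

def decode_branch(z, W_dec):
    """One offline decoder branch: map the latent code back to full size."""
    return (z @ W_dec).reshape(H, W, C)

img = rng.random((H, W, C))
z = encode(img)                       # only z is stored or sent to a beacon

# Dual-decoder module: a spatial-detail branch plus a stand-in for the
# capsule-clustering branch, fused here by a simple average.
out = 0.5 * (decode_branch(z, W_spatial) + decode_branch(z, W_capsule))

ratio = img.size / z.size             # storage saving from keeping only z
print(out.shape, ratio)               # (64, 64, 3) 48.0
```

Here storing the 256-dimensional latent instead of the 64x64x3 image gives a 48x reduction in stored values; the paper's reported 3x compression-efficiency gain is measured against existing approaches, not against raw storage.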
Related papers
- HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression [51.04820313355164]
HybridFlow combines continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity at extremely low bitrates.
Experimental results demonstrate superior performance across several datasets at extremely low bitrates.
arXiv Detail & Related papers (2024-04-20T13:19:08Z) - MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the region corresponding to each semantic concept, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information.
It can achieve optimal consistency and perception results while saving 50% of the bitrate, which gives it strong potential applications in the next generation of storage and communication.
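The four MISC components can be walked through with a toy pipeline. The labels, region format, and the block-mean "codec" below are illustrative assumptions, not the paper's large-multimodal-model method:

```python
import numpy as np

# Toy stand-ins for the four MISC stages: semantic extraction, region
# localization, extreme compression, and reconstruction.

H = W = 32

def lmm_encoder(img):
    """Stand-in for the LMM: emit a coarse semantic label."""
    return "bright_scene" if img.mean() >= 0.5 else "dark_scene"

def map_encoder(img, label):
    """Toy region locator: map the label to a full-frame bounding box."""
    return {label: (0, 0, img.shape[0], img.shape[1])}

def image_encoder(img, factor=8):
    """Extremely compressed 'bitstream': block means at 1/factor resolution."""
    return img.reshape(H // factor, factor, W // factor, factor).mean(axis=(1, 3))

def decoder(code, label, regions, factor=8):
    """Reconstruct by upsampling; semantics and regions could guide refinement."""
    return np.repeat(np.repeat(code, factor, axis=0), factor, axis=1)

img = np.full((H, W), 0.8)            # flat test image
label = lmm_encoder(img)
regions = map_encoder(img, label)
code = image_encoder(img)
recon = decoder(code, label, regions)
print(label, code.size / img.size, recon.shape)   # bright_scene 0.015625 (32, 32)
```

The point of the design is that the bitstream alone can be tiny (here 4x4 block means, about 1.6% of the pixels) because the semantic label and region map carry the information a generative decoder needs to refine the result.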
arXiv Detail & Related papers (2024-02-26T17:11:11Z) - Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
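The variable-length idea (more code for denser regions) can be sketched with a simple entropy proxy. The entropy measure, threshold, and code lengths below are illustrative assumptions, not DQ-VAE's learned scheme:

```python
import numpy as np

# Toy dynamic quantization: regions with higher information density
# (approximated by histogram entropy) get the longer code.

def region_entropy(patch, bins=16):
    """Shannon entropy (bits) of the patch's intensity histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def code_length(patch, short=2, long=8, threshold=2.0):
    """Assign the long code to detailed (high-entropy) regions."""
    return long if region_entropy(patch) > threshold else short

rng = np.random.default_rng(1)
flat = np.full((8, 8), 0.5)       # uniform patch: zero entropy
textured = rng.random((8, 8))     # noisy patch: near-maximal entropy
print(code_length(flat), code_length(textured))   # 2 8
```

Spending bits where the entropy is lets a fixed code budget represent flat regions cheaply and detailed regions accurately.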
arXiv Detail & Related papers (2023-05-19T14:56:05Z) - Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots [78.23772771485635]
We introduce a method to segment the visual field into independently moving regions, trained with no ground truth or supervision.
It consists of an adversarial conditional encoder-decoder architecture based on Slot Attention.
arXiv Detail & Related papers (2023-04-04T00:26:13Z) - High Fidelity Image Synthesis With Deep VAEs In Latent Space [0.0]
We present fast, realistic image generation on high-resolution, multimodal datasets using hierarchical variational autoencoders (VAEs).
In this two-stage setup, the autoencoder compresses the image into its semantic features, which are then modeled with a deep VAE.
We demonstrate the effectiveness of our two-stage approach, achieving a FID of 9.34 on the ImageNet-256 dataset which is comparable to BigGAN.
arXiv Detail & Related papers (2023-03-23T23:45:19Z) - UW-CVGAN: UnderWater Image Enhancement with Capsules Vectors Quantization [25.23797117677732]
We introduce the Underwater Capsules Vectors GAN (UWCVGAN) for this task, based on the discrete feature quantization paradigm from VQGAN.
The proposed UWCVGAN combines an encoding network, which compresses the image into its latent representation, with a decoding network able to reconstruct the enhanced image from the latent representation alone.
arXiv Detail & Related papers (2023-02-02T15:00:03Z) - Device Interoperability for Learned Image Compression with Weights and Activations Quantization [1.373801677008598]
We present a method to solve the device interoperability problem of a state-of-the-art image compression network.
We suggest a simple method that ensures cross-platform encoding and decoding and can be implemented quickly.
arXiv Detail & Related papers (2022-12-02T17:45:29Z) - Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition [124.80263629921498]
We propose Pixel Distillation that extends knowledge distillation into the input level while simultaneously breaking architecture constraints.
Such a scheme can achieve flexible cost control for deployment, as it allows the system to adjust both network architecture and image quality according to the overall requirement of resources.
arXiv Detail & Related papers (2021-12-17T14:31:40Z) - Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z) - A Unified End-to-End Framework for Efficient Deep Image Compression [35.156677716140635]
We propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies.
Specifically, we design an auto-encoder style network for learning based image compression.
Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance.
arXiv Detail & Related papers (2020-02-09T14:21:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.