P-Hologen: An End-to-End Generative Framework for Phase-Only Holograms
- URL: http://arxiv.org/abs/2404.01330v2
- Date: Fri, 02 May 2025 04:51:15 GMT
- Title: P-Hologen: An End-to-End Generative Framework for Phase-Only Holograms
- Authors: JooHyun Park, YuJin Jeon, HuiYong Kim, SeungHwan Baek, HyeongYeop Kang
- Abstract summary: We introduce P-Hologen, the first end-to-end generative framework designed for phase-only holograms (POHs). P-Hologen achieves superior quality and computational efficiency compared to existing methods. Our model generates high-quality, diverse, unseen holographic content without requiring pre-existing images.
- Score: 10.667658956529504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Holography stands at the forefront of visual technology, offering immersive, three-dimensional visualizations through the manipulation of light wave amplitude and phase. Although generative models have been extensively explored in the image domain, their application to holograms remains relatively underexplored due to the inherent complexity of phase learning. Exploiting generative models for holograms offers exciting opportunities for advancing innovation and creativity, such as semantic-aware hologram generation and editing. Currently, the most viable approach for utilizing generative models in the hologram domain involves integrating an image-based generative model with an image-to-hologram conversion model, which comes at the cost of increased computational complexity and inefficiency. To tackle this problem, we introduce P-Hologen, the first end-to-end generative framework designed for phase-only holograms (POHs). P-Hologen employs vector quantized variational autoencoders to capture the complex distributions of POHs. It also integrates the angular spectrum method into the training process, constructing latent spaces for complex phase data using strategies from the image processing domain. Extensive experiments demonstrate that P-Hologen achieves superior quality and computational efficiency compared to existing methods. Furthermore, our model generates high-quality, diverse, unseen holographic content from its learned latent space without requiring pre-existing images. Our work paves the way for new applications and methodologies in holographic content creation, opening a new era in the exploration of generative holographic content. The code for our paper is publicly available at https://github.com/james0223/P-Hologen.
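The angular spectrum method named in the abstract has a standard FFT-based form. The sketch below is an illustrative NumPy implementation of free-space propagation for a phase-only hologram, not the paper's actual training code; the function name, grid size, and optical parameters are assumptions chosen for the example.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, distance, pitch):
    """Propagate a complex field over `distance` using the angular spectrum method.

    field      : 2D complex array, e.g. exp(1j * phase) for a phase-only hologram
    wavelength : wavelength of light in meters
    distance   : propagation distance in meters
    pitch      : pixel pitch of the hologram plane in meters
    """
    ny, nx = field.shape
    # Spatial-frequency grids matching the FFT sample ordering
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Transfer function H = exp(i * 2*pi * z * sqrt(1/lambda^2 - fx^2 - fy^2))
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    evanescent = arg < 0          # frequencies beyond the propagating band
    kz = np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(2j * np.pi * distance * kz)
    H[evanescent] = 0.0           # suppress evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Example: propagate a phase-only hologram and take the image-plane amplitude
phase = np.zeros((64, 64))                       # placeholder phase map
recon = angular_spectrum_propagate(np.exp(1j * phase),
                                   wavelength=633e-9,  # red laser, illustrative
                                   distance=0.05,
                                   pitch=8e-6)
amplitude = np.abs(recon)
```

In a P-Hologen-style pipeline, a predicted phase map would presumably be wrapped as `exp(1j * phase)` and propagated this way so that a reconstruction loss can be computed in the image plane; because every step here is differentiable FFT arithmetic, the same operator can sit inside autodiff-based training.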
Related papers
- EndoGen: Conditional Autoregressive Endoscopic Video Generation [51.97720772069513]
We propose the first conditional endoscopic video generation framework, namely EndoGen. Specifically, we build an autoregressive model with a tailored Spatiotemporal Grid-Frame Patterning strategy. We demonstrate the effectiveness of our framework in generating high-quality, conditionally guided endoscopic content.
arXiv Detail & Related papers (2025-07-23T10:32:20Z) - A Watermark for Auto-Regressive Image Generation Models [50.599325258178254]
We propose C-reweight, a distortion-free watermarking method explicitly designed for image generation models. C-reweight mitigates retokenization mismatch while preserving image fidelity.
arXiv Detail & Related papers (2025-06-13T00:15:54Z) - Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation [54.588082888166504]
We present Mogao, a unified framework that enables interleaved multi-modal generation through a causal approach. Mogao integrates a set of key technical improvements in architecture design, including a deep-fusion design, dual vision encoders, interleaved rotary position embeddings, and multi-modal classifier-free guidance. Experiments show that Mogao not only achieves state-of-the-art performance in multi-modal understanding and text-to-image generation, but also excels in producing high-quality, coherent interleaved outputs.
arXiv Detail & Related papers (2025-05-08T17:58:57Z) - Harmonizing Visual Representations for Unified Multimodal Understanding and Generation [53.01486796503091]
We present Harmon, a unified autoregressive framework that harmonizes understanding and generation tasks with a shared MAR encoder.
Harmon achieves state-of-the-art image generation results on the GenEval, MJHQ30K and WISE benchmarks.
arXiv Detail & Related papers (2025-03-27T20:50:38Z) - T-SVG: Text-Driven Stereoscopic Video Generation [87.62286959918566]
This paper introduces the Text-driven Stereoscopic Video Generation (T-SVG) system.
It streamlines video generation by using text prompts to create reference videos.
These videos are transformed into 3D point cloud sequences, which are rendered from two perspectives with subtle parallax differences.
arXiv Detail & Related papers (2024-12-12T14:48:46Z) - DreamPolish: Domain Score Distillation With Progressive Geometry Generation [66.94803919328815]
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process.
In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward such a domain.
arXiv Detail & Related papers (2024-11-03T15:15:01Z) - OAH-Net: A Deep Neural Network for Hologram Reconstruction of Off-axis Digital Holographic Microscope [5.835347176172883]
We propose a novel reconstruction approach that integrates deep learning with the physical principles of off-axis holography.
Our off-axis hologram network (OAH-Net) retrieves phase and amplitude images with errors that fall within the measurement error range attributable to hardware.
This capability further expands off-axis holography's applications in both biological and medical studies.
arXiv Detail & Related papers (2024-10-17T14:25:18Z) - Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [52.778766190479374]
Latent-based image generative models have achieved notable success in image generation tasks.
Despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation.
We propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling.
arXiv Detail & Related papers (2024-10-16T12:13:17Z) - Quantum Generative Learning for High-Resolution Medical Image Generation [1.189046876525661]
Existing quantum generative adversarial networks (QGANs) fail to generate high-quality images due to their patch-based, pixel-wise learning approaches. We propose a quantum image generative learning (QIGL) approach for high-quality medical image generation.
arXiv Detail & Related papers (2024-06-19T04:04:32Z) - Latent Style-based Quantum GAN for high-quality Image Generation [28.3231031892146]
We introduce the Latent Style-based Quantum GAN (LaSt-QGAN), which employs a hybrid classical-quantum approach to training Generative Adversarial Networks (GANs).
Our LaSt-QGAN can be successfully trained on realistic computer vision datasets beyond the standard MNIST, namely Fashion MNIST (fashion products) and SAT4 (Earth Observation images) with 10 qubits.
arXiv Detail & Related papers (2024-06-04T18:00:00Z) - Re-Thinking Inverse Graphics With Large Language Models [51.333105116400205]
Inverse graphics -- inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics.
We propose the Inverse-Graphics Large Language Model (IG-LLM), an inverse-graphics framework centered around an LLM.
We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training.
arXiv Detail & Related papers (2024-04-23T16:59:02Z) - Configurable Learned Holography [33.45219677645646]
We introduce a learned model that interactively computes 3D holograms from RGB-only 2D images for a variety of holographic displays.
Our hologram computation relies on identifying the correlation between the depth estimation and 3D hologram synthesis tasks.
arXiv Detail & Related papers (2024-03-24T13:57:30Z) - Stochastic Light Field Holography [35.73147050231529]
The Visual Turing Test is the ultimate goal to evaluate the realism of holographic displays.
Previous studies have focused on addressing challenges such as limited étendue and image quality over a large focal volume.
We tackle this problem with a novel hologram generation algorithm motivated by matching the projection operators of incoherent Light Field.
arXiv Detail & Related papers (2023-07-12T16:20:08Z) - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z) - HORIZON: High-Resolution Semantically Controlled Panorama Synthesis [105.55531244750019]
Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds.
Recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, but a direct application of these methods to panorama synthesis yields distorted content.
We unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling.
arXiv Detail & Related papers (2022-10-10T09:43:26Z) - Time-multiplexed Neural Holography: A flexible framework for holographic near-eye displays with fast heavily-quantized spatial light modulators [44.73608798155336]
Holographic near-eye displays offer unprecedented capabilities for virtual and augmented reality systems.
We report advances in camera-calibrated wave propagation models for these types of holographic near-eye displays.
Our framework is flexible in supporting runtime supervision with different types of content, including 2D and 2.5D RGBD images, 3D focal stacks, and 4D light fields.
arXiv Detail & Related papers (2022-05-05T00:03:50Z) - Image quality enhancement of embedded holograms in holographic information hiding using deep neural networks [0.0]
The brightness of an embedded hologram is set to a fraction of that of the host hologram, resulting in a barely damaged reconstructed image of the host hologram.
It is difficult to perceive because the embedded hologram's reconstructed image is darker than the reconstructed host image.
In this study, we use deep neural networks to restore the darkened image.
arXiv Detail & Related papers (2021-12-20T01:21:28Z) - Neural Étendue Expander for Ultra-Wide-Angle High-Fidelity Holographic Display [51.399291206537384]
Modern holographic displays possess low étendue, which is the product of the display area and the maximum solid angle of diffracted light.
We present neural étendue expanders, which are learned from a natural image dataset.
With neural étendue expanders, we experimentally achieve 64× étendue expansion of natural images in full color, expanding the FOV by an order of magnitude horizontally and vertically.
arXiv Detail & Related papers (2021-09-16T17:21:52Z) - Learned holographic light transport [2.642698101441705]
Holography algorithms often fall short in matching simulations with results from a physical holographic display.
Our work addresses this mismatch by learning the holographic light transport in holographic displays.
Our method can dramatically improve simulation accuracy and image quality in holographic displays.
arXiv Detail & Related papers (2021-08-01T12:05:33Z) - Structure and Design of HoloGen [0.0]
Computer-generated holography (CGH) can fully represent a light field, including depth of focus, accommodation, and vergence.
HoloGen is an MIT licensed application that may be used to generate holograms using a wide array of algorithms without expert guidance.
arXiv Detail & Related papers (2020-06-18T13:29:46Z) - Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (or of any information shown) and is not responsible for any consequences of its use.