Syllables to Scenes: Literary-Guided Free-Viewpoint 3D Scene Synthesis from Japanese Haiku
- URL: http://arxiv.org/abs/2502.11586v1
- Date: Mon, 17 Feb 2025 09:18:06 GMT
- Title: Syllables to Scenes: Literary-Guided Free-Viewpoint 3D Scene Synthesis from Japanese Haiku
- Authors: Chunan Yu, Yidong Han, Chaotao Ding, Ying Zang, Lanyun Zhu, Xinhao Chen, Zejian Li, Renjun Xu, Tianrun Chen
- Abstract summary: This research introduces HaikuVerse, a novel framework for transforming poetic abstraction into spatial representation.
We present a literary-guided approach that synergizes traditional poetry analysis with advanced generative technologies.
Our framework centers on two key innovations: (1) Hierarchical Literary-Criticism Theory Grounded Parsing (H-LCTGP), which captures both explicit imagery and implicit emotional resonance through structured semantic decomposition, and (2) Progressive Dimensional Synthesis (PDS), a multi-stage pipeline that systematically transforms poetic elements into coherent 3D scenes.
- Score: 7.9900858134493
- License:
- Abstract: In the era of the metaverse, where immersive technologies redefine human experiences, translating abstract literary concepts into navigable 3D environments presents a fundamental challenge in preserving semantic and emotional fidelity. This research introduces HaikuVerse, a novel framework for transforming poetic abstraction into spatial representation, with Japanese Haiku serving as an ideal test case due to its sophisticated encapsulation of profound emotions and imagery within minimal text. While existing text-to-3D methods struggle with nuanced interpretations, we present a literary-guided approach that synergizes traditional poetry analysis with advanced generative technologies. Our framework centers on two key innovations: (1) Hierarchical Literary-Criticism Theory Grounded Parsing (H-LCTGP), which captures both explicit imagery and implicit emotional resonance through structured semantic decomposition, and (2) Progressive Dimensional Synthesis (PDS), a multi-stage pipeline that systematically transforms poetic elements into coherent 3D scenes through sequential diffusion processes, geometric optimization, and real-time enhancement. Extensive experiments demonstrate that HaikuVerse significantly outperforms conventional text-to-3D approaches in both literary fidelity and visual quality, establishing a new paradigm for preserving cultural heritage in immersive digital spaces. Project website at: https://syllables-to-scenes.github.io/
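Neither the abstract nor this page includes the authors' code, but the two-part design suggests a concrete shape: parse first, then synthesize in stages. Below is a minimal hypothetical skeleton of that flow; every name in it (`ParsedHaiku`, `parse_haiku`, `diffusion_generate`, `fit_geometry`, `enhance_for_realtime`) is an illustrative stand-in, not the paper's API.

```python
# Hypothetical skeleton of the H-LCTGP -> PDS flow described above; all
# functions are illustrative stand-ins, not the authors' released code.
from dataclasses import dataclass

@dataclass
class ParsedHaiku:
    imagery: list    # explicit images, e.g. ["old pond", "frog", "splash"]
    kigo: str        # season word, anchoring time of year and palette
    emotion: str     # implicit emotional resonance, e.g. "tranquil solitude"

def parse_haiku(text: str) -> ParsedHaiku:
    raise NotImplementedError  # H-LCTGP: structured semantic decomposition

def diffusion_generate(prompt: str):
    raise NotImplementedError  # stage 1: sequential diffusion (text -> image)

def fit_geometry(image):
    raise NotImplementedError  # stage 2: geometric optimization (image -> 3D)

def enhance_for_realtime(scene):
    raise NotImplementedError  # stage 3: real-time rendering enhancement

def synthesize_scene(haiku: str):
    p = parse_haiku(haiku)
    # Condition generation on the parsed structure, not the raw 17 syllables.
    prompt = f"{', '.join(p.imagery)}, {p.kigo}, mood: {p.emotion}"
    return enhance_for_realtime(fit_geometry(diffusion_generate(prompt)))
```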
Related papers
- Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models [18.293592213622183]
We propose a PoemToPixel framework designed to generate images that visually represent the inherent meanings of poems.
Our approach incorporates the concept of prompt tuning in our image generation framework to ensure that the resulting images closely align with the poetic content.
To expand the diversity of the poetry dataset, we introduce MiniPo, a novel multimodal dataset comprising 1001 children's poems and images.
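The entry does not spell out the prompt-tuning mechanics, but a standard recipe (CoOp-style soft prompts, assumed here rather than taken from the paper) prepends a few learnable vectors to the frozen text encoder's token embeddings and trains only those:

```python
# Generic soft-prompt tuning sketch (an assumption about the mechanism,
# not PoemToPixel's actual code): only the prompt vectors get gradients.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_tokens: int = 8, dim: int = 768):
        super().__init__()
        self.prompt = nn.Parameter(0.02 * torch.randn(n_tokens, dim))

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, dim) from a frozen text encoder
        p = self.prompt.unsqueeze(0).expand(token_embeds.shape[0], -1, -1)
        return torch.cat([p, token_embeds], dim=1)  # (batch, n+seq, dim)
```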
arXiv Detail & Related papers (2025-01-10T10:26:54Z)
- Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks [2.250406890348191]
We propose a semi-supervised approach using cycle-consistent adversarial networks to leverage the limited paired data.
We introduce novel evaluation metrics to assess the quality, diversity, and consistency of the generated poems and paintings.
The proposed model outperforms previous methods, showing promise in capturing the symbolic essence of artistic expression.
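For context, the cycle-consistency constraint is what lets the model learn from mostly unpaired poems and paintings: a round trip poem to painting and back should reconstruct the input. A minimal sketch of the loss (the generator callables are placeholders, not the paper's API):

```python
# CycleGAN-style cycle-consistency loss; the two generators are
# placeholders mapping poem features to painting features and back.
import torch.nn.functional as F

def cycle_consistency_loss(poem, painting, poem2paint, paint2poem):
    """A translation round trip should reconstruct the original input."""
    forward = F.l1_loss(paint2poem(poem2paint(poem)), poem)
    backward = F.l1_loss(poem2paint(paint2poem(painting)), painting)
    return forward + backward
```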
arXiv Detail & Related papers (2024-10-25T04:57:44Z)
- 3D Vision-Language Gaussian Splatting [29.047044145499036]
Multi-modal 3D scene understanding has vital applications in robotics, autonomous driving, and virtual/augmented reality.
We propose a solution that adequately handles the distinct visual and semantic modalities.
We also employ a camera-view blending technique to improve semantic consistency between existing views.
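The summary gives no detail on the blending rule; one plausible reading of camera-view blending, sketched purely as an assumption, is a convex combination of semantic feature maps rendered from nearby camera poses so that per-view noise cancels out:

```python
# Assumed form of camera-view blending: a weighted average of semantic
# feature maps from neighboring views (weights chosen by view proximity).
import torch

def blend_views(sem_maps: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """sem_maps: (n_views, C, H, W); weights: (n_views,), summing to 1."""
    return (weights.view(-1, 1, 1, 1) * sem_maps).sum(dim=0)  # (C, H, W)
```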
arXiv Detail & Related papers (2024-10-10T03:28:29Z)
- GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts [48.28000728061778]
We propose a method that integrates an open vocabulary scene encoder into the architecture, establishing a robust connection between text and scene.
Our methodology achieves up to a 30% reduction in the goal object distance metric compared to the prior state-of-the-art baseline model.
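The goal object distance metric is plausibly the Euclidean distance between the generated motion's final root position and the target object's centroid; the sketch below encodes that assumption (the paper's exact definition may differ):

```python
# Assumed definition of the goal object distance metric.
import numpy as np

def goal_object_distance(root_traj: np.ndarray, goal: np.ndarray) -> float:
    """root_traj: (T, 3) generated root trajectory; goal: (3,) centroid."""
    return float(np.linalg.norm(root_traj[-1] - goal))
```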
arXiv Detail & Related papers (2024-04-08T18:24:12Z)
- A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes [80.20670062509723]
3D dense captioning is an emerging vision-language bridging task that aims to generate detailed descriptions for 3D scenes.
It presents significant potential and challenges due to its closer representation of the real world compared to 2D visual captioning.
Despite the popularity and success of existing methods, there is a lack of comprehensive surveys summarizing the advancements in this field.
arXiv Detail & Related papers (2024-03-12T10:04:08Z)
- DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding [15.341857735842954]
Existing text-to-3D methods tend to suffer from mode collapse, and hence poor diversity in their results.
We propose a new method that considers the joint generation of different 3D models from the same text prompt.
We show that our method leads to improved diversity in text-to-3D synthesis qualitatively and quantitatively.
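One way to read "augmented text embedding" together with "joint generation" is that each of the k jointly optimized 3D models receives its own perturbed copy of the prompt embedding, so the batch cannot collapse onto a single mode. The perturbation scheme below is an assumption, not the paper's:

```python
# Assumed augmentation: one independently perturbed text embedding per
# jointly generated 3D model, discouraging mode collapse across the set.
import torch

def augment_text_embeddings(text_embed: torch.Tensor, k: int,
                            scale: float = 0.1) -> torch.Tensor:
    """text_embed: (seq_len, dim). Returns (k, seq_len, dim)."""
    return text_embed.unsqueeze(0) + scale * torch.randn(k, *text_embed.shape)
```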
arXiv Detail & Related papers (2023-12-02T08:21:20Z)
- RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture [80.0643976406225]
We propose "RoomDreamer", which leverages powerful natural language to synthesize a new room with a different style.
Our work addresses the challenge of synthesizing both geometry and texture aligned to the input scene structure and prompt simultaneously.
To validate the proposed method, real indoor scenes scanned with smartphones are used for extensive experiments.
arXiv Detail & Related papers (2023-05-18T22:57:57Z)
- AIwriting: Relations Between Image Generation and Digital Writing [0.0]
During 2022, AI text generation systems such as GPT-3 and AI text-to-image generation systems such as DALL-E 2 made exponential leaps forward.
In this panel, a group of electronic literature authors and theorists consider new opportunities for human creativity presented by these systems.
arXiv Detail & Related papers (2023-05-18T09:23:05Z)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed, built on hierarchical audio-vertex attention.
The proposed method can produce more realistic facial expressions and head posture movements.
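At its core, "audio-vertex attention" reads as cross-attention with per-vertex queries attending over audio frames; this single-level sketch omits the paper's hierarchy over facial regions:

```python
# Single-level sketch of audio-vertex cross-attention (the hierarchical
# structure described in the paper is omitted here).
import torch
import torch.nn as nn

class AudioVertexAttention(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vert_feats: torch.Tensor, audio_feats: torch.Tensor):
        # vert_feats: (B, n_verts, dim) queries; audio_feats: (B, T, dim)
        out, _ = self.attn(vert_feats, audio_feats, audio_feats)
        return out  # per-vertex features informed by the audio context
```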
arXiv Detail & Related papers (2023-02-24T09:36:31Z)
- DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation [71.87682778102236]
We propose a novel Dynamical Semantic Evolution GAN (DSE-GAN) to re-compose each stage's text features under a novel single adversarial multi-stage architecture.
DSE-GAN achieves 7.48% and 37.8% relative FID improvement on two widely used benchmarks.
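"Relative FID improvement" conventionally means (baseline − new) / baseline; a quick check of the arithmetic, using an invented baseline score since the actual baselines are not listed here:

```python
# Relative improvement as (baseline - new) / baseline; the 100.0 baseline
# below is invented purely to illustrate the arithmetic.
def relative_improvement(baseline: float, new: float) -> float:
    return (baseline - new) / baseline

assert abs(relative_improvement(100.0, 92.52) - 0.0748) < 1e-12
```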
arXiv Detail & Related papers (2022-09-03T06:13:26Z)
- Semantic View Synthesis [56.47999473206778]
We tackle a new problem of semantic view synthesis -- generating free-viewpoint rendering of a synthesized scene using a semantic label map as input.
First, we focus on synthesizing the color and depth of the visible surface of the 3D scene.
We then use the synthesized color and depth to impose explicit constraints on the multiple-plane image (MPI) representation prediction process.
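The MPI render step itself is standard: alpha-composite a stack of fronto-parallel planes front to back with the "over" operator. This is the textbook compositing formula, not code from the paper:

```python
# Standard front-to-back MPI compositing ("over" operator).
import numpy as np

def composite_mpi(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """colors: (D, H, W, 3); alphas: (D, H, W, 1); plane 0 is nearest."""
    out = np.zeros(colors.shape[1:])
    transmittance = np.ones(alphas.shape[1:])
    for d in range(colors.shape[0]):
        out += transmittance * alphas[d] * colors[d]
        transmittance *= 1.0 - alphas[d]
    return out
```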
arXiv Detail & Related papers (2020-08-24T17:59:46Z)