Syntax-Guided Diffusion Language Models with User-Integrated Personalization
- URL: http://arxiv.org/abs/2510.01028v1
- Date: Wed, 01 Oct 2025 15:33:12 GMT
- Title: Syntax-Guided Diffusion Language Models with User-Integrated Personalization
- Authors: Ruqian Zhang, Yijiao Zhang, Juan Shen, Zhongyi Zhu, Annie Qu
- Abstract summary: Large language models have made revolutionary progress in generating human-like text, yet their outputs often tend to be generic. Recent advances in diffusion models have opened new opportunities for improving language generation beyond the limitations of autoregressive paradigms. We propose a syntax-guided diffusion language model that integrates structural supervision and personalized conditioning to enhance text quality, diversity, and controllability.
- Score: 1.202131801903952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have made revolutionary progress in generating human-like text, yet their outputs often tend to be generic, exhibiting insufficient structural diversity, which limits personalized expression. Recent advances in diffusion models have opened new opportunities for improving language generation beyond the limitations of autoregressive paradigms. In this work, we propose a syntax-guided diffusion language model that integrates structural supervision and personalized conditioning to enhance text quality, diversity, and controllability. We introduce a cascaded framework that generates syntactic guidance before conditional text generation, and further generalize it to a novel noncascaded architecture for better alignment between structure and content. By incorporating syntactic information in the generating process, the proposed model better captures the lexical and structural characteristics of stylistic sentence construction. To enable fine-grained personalization, we develop a shared representation mechanism that facilitates information integration across users, supporting both faithful stylistic generation and generalizable zero-shot inference. Extensive experiments on multiple tasks demonstrate the superiority of our approach in fluency, diversity, and stylistic fidelity. Further qualitative analyses highlight its interpretability and flexibility in learning personalized patterns.
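The cascaded framework described in the abstract (syntactic guidance generated first, text generated conditionally on it) can be illustrated with a toy two-stage pipeline. Everything below, the templates, lexicon, and function names, is a hypothetical stand-in for the paper's actual diffusion models, shown only to make the cascade concrete:

```python
import random

# Toy cascade: stage 1 samples a syntactic plan (here, a POS-tag
# sequence); stage 2 fills in words conditioned on that plan.
# All components are hypothetical stand-ins for diffusion models.

POS_TEMPLATES = [
    ["DET", "ADJ", "NOUN", "VERB", "NOUN"],
    ["NOUN", "VERB", "DET", "NOUN"],
]

LEXICON = {
    "DET": ["the", "a"],
    "ADJ": ["quiet", "vivid"],
    "NOUN": ["writer", "style", "sentence"],
    "VERB": ["shapes", "echoes"],
}

def sample_syntax(rng):
    """Stage 1: sample a syntactic guidance sequence."""
    return rng.choice(POS_TEMPLATES)

def generate_text(plan, rng):
    """Stage 2: generate one token per syntactic slot in the plan."""
    return [rng.choice(LEXICON[tag]) for tag in plan]

def cascaded_generate(seed=0):
    rng = random.Random(seed)
    plan = sample_syntax(rng)
    return plan, generate_text(plan, rng)

plan, tokens = cascaded_generate()
print(plan, tokens)
```

The paper's noncascaded variant would instead denoise structure and content jointly, so that the two stages above collapse into one aligned process.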
Related papers
- Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation [54.588082888166504]
We present Mogao, a unified framework that enables interleaved multi-modal generation through a causal approach. Mogao integrates a set of key technical improvements in architecture design, including a deep-fusion design, dual vision encoders, interleaved rotary position embeddings, and multi-modal classifier-free guidance. Experiments show that Mogao not only achieves state-of-the-art performance in multi-modal understanding and text-to-image generation, but also excels in producing high-quality, coherent interleaved outputs.
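Classifier-free guidance, which the Mogao summary says is extended to the multi-modal setting, combines a conditional and an unconditional prediction by extrapolation. A minimal sketch of the standard single-condition rule (the guidance scale `w` and the toy inputs are assumptions, not Mogao's actual values):

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one, scaled by guidance weight w.
    w = 0 -> unconditional; w = 1 -> conditional; w > 1 -> amplified."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Exact float inputs, so the arithmetic below is exact.
print(cfg_combine([0.0, 1.0], [1.0, 1.0], 2.0))  # → [2.0, 1.0]
```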
arXiv Detail & Related papers (2025-05-08T17:58:57Z) - Align to Structure: Aligning Large Language Models with Structural Information [26.960069076925386]
We introduce Structural Alignment, a novel method that aligns large language models with human-like discourse structures to enhance long-form text generation. We employ a dense reward scheme within a Proximal Policy Optimization framework, assigning fine-grained, token-level rewards based on the discourse distinctiveness relative to human writing.
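A dense, token-level reward scheme of this kind can be sketched as follows; the per-token distinctiveness scores and the terminal bonus are hypothetical placeholders for the paper's actual reward signal:

```python
def token_rewards(distinctiveness, terminal_bonus):
    """Dense reward scheme for PPO: every token receives a fine-grained
    reward (here, a per-token distinctiveness score), and the final
    token additionally receives a sequence-level bonus."""
    rewards = list(distinctiveness)
    if rewards:
        rewards[-1] += terminal_bonus
    return rewards

print(token_rewards([0.1, 0.2, 0.3], 1.0))
```

Compared with a single sequence-level reward, spreading credit across tokens gives the policy-gradient update a far less sparse learning signal.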
arXiv Detail & Related papers (2025-04-04T17:40:04Z) - Hierarchical Lexical Manifold Projection in Large Language Models: A Novel Mechanism for Multi-Scale Semantic Representation [0.0]
The integration of structured hierarchical embeddings into transformer-based architectures introduces a refined approach to lexical representation. A projection mechanism that maps tokens onto a structured manifold provides improved lexical alignment. The refined hierarchical organization of embeddings provides greater interpretability in lexical modeling.
arXiv Detail & Related papers (2025-02-08T00:49:32Z) - Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce the novel task of multimodal style translation, along with MuST-Bench, a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems.
In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
arXiv Detail & Related papers (2024-10-24T15:15:01Z) - ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models [52.23899502520261]
We introduce ARTIST, a novel framework that incorporates a dedicated textual diffusion model to focus specifically on learning text structures. We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model. This disentangled architecture design and training strategy significantly enhance the text rendering ability of diffusion models for text-rich image generation.
arXiv Detail & Related papers (2024-06-17T19:31:24Z) - Annotating FrameNet via Structure-Conditioned Language Generation [15.877232416259805]
We propose a framework to produce novel frame-semantically annotated sentences following an overgenerate-and-filter approach.
Our results show that conditioning on rich, explicit semantic information tends to produce generations with high human acceptance.
arXiv Detail & Related papers (2024-06-07T11:01:15Z) - Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
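The overgenerate-and-filter approach mentioned above is simple to sketch: sample many candidate generations, then keep only those that pass an acceptance filter. Both callables below are hypothetical placeholders for a generator and a frame-semantic validity check:

```python
def overgenerate_and_filter(generate, accept, n=10):
    """Overgenerate-and-filter: draw n candidates from `generate`,
    then keep only those that the `accept` predicate passes."""
    candidates = [generate(i) for i in range(n)]
    return [c for c in candidates if accept(c)]

# Toy usage: "generate" multiples of 3, accept only the even ones.
print(overgenerate_and_filter(lambda i: i * 3, lambda c: c % 2 == 0, n=5))  # → [0, 6, 12]
```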
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model VP-Score over VisionPrefer to guide the training of text-to-image generative models and the preference prediction accuracy of VP-Score is comparable to human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z) - Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure [66.33623392497599]
We show that a structure called template-content structure (T-C structure) can reduce the possible space from exponential level to linear level.
We demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic.
arXiv Detail & Related papers (2023-10-09T06:57:45Z) - PatternGPT: A Pattern-Driven Framework for Large Language Model Text Generation [1.7259824817932292]
This paper proposes PatternGPT, a pattern-driven text generation framework for Large Language Models.
The framework utilizes the extraction capability of Large Language Models to generate rich and diversified structured and formalized patterns.
External knowledge, such as judgment criteria and optimization algorithms, is used to search for high-quality patterns.
arXiv Detail & Related papers (2023-07-02T04:32:41Z) - UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z) - Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z) - Incorporating Stylistic Lexical Preferences in Generative Language Models [10.62343151429147]
We present an approach to induce certain target-author attributes by incorporating continuous multi-dimensional lexical preferences of an author into generative language models.
Our experiments demonstrate that the proposed approach can generate text that distinctively aligns with a given target author's lexical style.
arXiv Detail & Related papers (2020-10-22T09:24:05Z)
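The continuous lexical-preference conditioning in the last paper can be approximated, in spirit, by biasing next-token logits with a per-token preference score and renormalizing. This is a toy sketch with entirely hypothetical inputs, not the paper's actual mechanism:

```python
import math

def biased_logprobs(logits, prefs, alpha):
    """Bias token logits by a continuous author-preference score, scaled
    by alpha, then renormalize into log-probabilities. A toy version of
    steering generation toward a target author's lexical style."""
    biased = {t: v + alpha * prefs.get(t, 0.0) for t, v in logits.items()}
    log_z = math.log(sum(math.exp(v) for v in biased.values()))
    return {t: v - log_z for t, v in biased.items()}

logits = {"utilize": 0.0, "use": 0.0}     # model is indifferent
prefs = {"use": 1.0}                      # hypothetical: author favors "use"
out = biased_logprobs(logits, prefs, alpha=2.0)
print(out)
```

With `alpha = 0` the distribution is unchanged; increasing `alpha` shifts probability mass toward the author's preferred tokens.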
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.