XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs
- URL: http://arxiv.org/abs/2601.04426v1
- Date: Wed, 07 Jan 2026 22:18:51 GMT
- Title: XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs
- Authors: Linzhang Li, Yixin Dong, Guanjie Wang, Ziyi Xu, Alexander Jiang, Tianqi Chen,
- Abstract summary: XGrammar 2 is a highly optimized structured generation engine for agentic LLMs.<n>XGrammar 2 accelerates the mask generation for dynamic structured generation tasks through a new dynamic dispatching semantics: TagDispatch.<n>XGrammar 2 can achieve more than 6x speedup over the existing structured generation engines.
- Score: 43.019637484576755
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Modern LLM agents are required to handle increasingly complex structured generation tasks, such as tool calling and conditional structured generation. These tasks are significantly more dynamic than predefined structures, posing new challenges to the current structured generation engines. In this paper, we propose XGrammar 2, a highly optimized structured generation engine for agentic LLMs. XGrammar 2 accelerates the mask generation for these dynamic structured generation tasks through a new dynamic dispatching semantics: TagDispatch. We further introduce a just-in-time (JIT) compilation method to reduce compilation time and a cross-grammar caching mechanism to leverage the common sub-structures across different grammars. Additionally, we extend the previous PDA-based mask generation algorithm to the Earley-parser-based one and design a repetition compression algorithm to handle repetition structures in grammars. Evaluation results show that XGrammar 2 can achieve more than 6x speedup over the existing structured generation engines. Integrated with an LLM inference engine, XGrammar 2 can handle dynamic structured generation tasks with near-zero overhead.
Related papers
- Evolutionary Generation of Multi-Agent Systems [49.47969796873096]
Large language model (LLM)-based multi-agent systems (MAS) show strong promise for complex reasoning, planning, and tool-augmented tasks.<n>EvoMAS formulates MAS generation as structured configuration generation.<n>EvoMAS consistently improves task performance over both human-designed MAS and prior automatic MAS generation methods.
arXiv Detail & Related papers (2026-02-06T09:01:35Z) - ThinkGen: Generalized Thinking for Visual Generation [97.19923474851987]
ThinkGen is a think-driven visual generation framework that explicitly leverages Chain-of-Thought (CoT) reasoning in various generation scenarios.<n>We propose a separable GRPO-based training paradigm, alternating reinforcement learning between the MLLM and DiT modules.<n>Experiments demonstrate that ThinkGen achieves robust, state-of-the-art performance across multiple generation benchmarks.
arXiv Detail & Related papers (2025-12-29T16:08:50Z) - AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents [27.864519204078004]
Large language models (LLMs) have shown impressive performance in general programming tasks.<n>We introduce AutoMLGen, an LLM-based coding agent that integrates a domain knowledge base for high-quality prior guidance.<n>We show that AutoMLGen achieves state-of-the-art performance in numerous dimensions, such as the average medal rate and the valid submission rate.
arXiv Detail & Related papers (2025-10-09T17:45:05Z) - ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction [84.90394416593624]
Agentic task-solving with Large Language Models (LLMs) requires multi-turn, multi-step interactions.<n>Existing simulation-based data generation methods rely heavily on costly autoregressive interactions between multiple agents.<n>We propose a novel Non-Autoregressive Iterative Generation framework, called ToolACE-MT, for constructing high-quality multi-turn agentic dialogues.
arXiv Detail & Related papers (2025-08-18T07:38:23Z) - DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System [0.276240219662896]
DynaSwarm is a dynamic framework that enhances multi-agent systems.<n>It uses an actor-critic reinforcement learning mechanism to optimize graph structures.<n>It also has a dynamic graph selector that adaptively chooses the optimal graph structure for each input sample.
arXiv Detail & Related papers (2025-07-31T05:52:30Z) - WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding [58.1177179119881]
We introduce wgrammar, a lightweight decoding engine that integrates domain-aware simplification, constraint decomposition, and mask caching.<n> wgrammar achieves up to 250x speedup over existing systems.
arXiv Detail & Related papers (2025-07-22T17:13:47Z) - HiVeGen -- Hierarchical LLM-based Verilog Generation for Scalable Chip Design [24.46771930751068]
HiVeGen is a hierarchical Verilog generation framework that decomposes generation tasks into hierarchical submodules.<n> automatic Design Space Exploration (DSE) into hierarchy-aware prompt generation, introducing weight-based retrieval to enhance code reuse.<n>Real-time human-computer interaction to lower error-correction cost, significantly improving the quality of generated designs.
arXiv Detail & Related papers (2024-12-06T19:37:53Z) - XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models [3.9417976759908573]
Context-free grammar is a flexible approach to enable structured generation via constrained decoding.<n>XGrammar is a flexible and efficient structure generation engine for large language models.<n>XGrammar can achieve up to 100x speedup over existing solutions.
arXiv Detail & Related papers (2024-11-22T18:01:37Z) - A Simple but Effective Approach to Improve Structured Language Model
Output for Information Extraction [11.165093163378152]
Large language models (LLMs) have demonstrated impressive abilities in generating unstructured natural language according to instructions.
This paper introduces an efficient method, G&O, to enhance their structured text generation capabilities.
arXiv Detail & Related papers (2024-02-20T20:42:02Z) - Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.