Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
- URL: http://arxiv.org/abs/2602.10583v1
- Date: Wed, 11 Feb 2026 07:17:41 GMT
- Title: Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
- Authors: Bo Xue, Yunchong Song, Fanghao Shao, Xuekai Zhu, Lin Chen, Luoyi Fu, Xinbing Wang, Zhouhan Lin
- Abstract summary: Flow of SpanS (FoSS) is a principled GFlowNets framework for span generation. FoSS constructs a dynamic span vocabulary by segmenting the retrieved text flexibly. With specialized reward models, FoSS generates diverse, high-quality text.
- Score: 54.06320619464273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard autoregressive language models generate text token-by-token from a fixed vocabulary, inducing a tree-structured state space when token sampling is viewed as an action, which limits flexibility and expressiveness. Recent work introduces a dynamic vocabulary by sampling retrieved text spans, but overlooks that the same sentence can be composed of spans of varying lengths, lacking explicit modeling of the directed acyclic graph (DAG) state space. This restricts exploration of compositional paths and biases learning toward the chosen path. Generative Flow Networks (GFlowNets) are powerful for efficiently exploring and generalizing over state spaces, particularly those with a DAG structure. However, prior GFlowNets-based language models operate at the token level and remain confined to tree-structured spaces, limiting their potential. In this work, we propose Flow of SpanS (FoSS), a principled GFlowNets framework for span generation. FoSS constructs a dynamic span vocabulary by segmenting the retrieved text flexibly, ensuring a DAG-structured state space, which allows GFlowNets to explore diverse compositional paths and improve generalization. With specialized reward models, FoSS generates diverse, high-quality text. Empirically, FoSS improves MAUVE scores by up to 12.5% over a Transformer baseline on text generation and achieves 3.5% gains on knowledge-intensive tasks, consistently outperforming state-of-the-art methods. Scaling experiments further show that FoSS benefits from larger models, more data, and richer retrieval corpora, retaining its advantage over strong baselines.
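To make the DAG claim concrete, here is a toy sketch (our illustration, not the paper's code): with span-level actions, the same sentence is reachable along several segmentation paths, whereas token-level sampling gives each string a single path in a tree. The span vocabulary below is hypothetical; a GFlowNet such as FoSS would learn a forward policy over exactly this kind of graph so that complete texts are sampled in proportion to a reward.

```python
from functools import lru_cache

# Toy illustration (not the paper's code) of why span-level generation yields
# a DAG state space: the same sentence can be assembled along several
# segmentation paths. The span vocabulary here is hypothetical.
SPAN_VOCAB = {"the", "the cat", "cat", "sat", "cat sat"}
TARGET = "the cat sat"

@lru_cache(maxsize=None)
def num_paths(prefix: str) -> int:
    """Count distinct compositional paths from `prefix` to the full target."""
    if prefix == TARGET:
        return 1
    rest = TARGET[len(prefix):].lstrip()
    total = 0
    for span in SPAN_VOCAB:
        if rest == span or rest.startswith(span + " "):
            total += num_paths((prefix + " " + span).strip())
    return total

print(num_paths(""))  # 3 distinct paths reach the same final state
```

Running this prints 3: the segmentations "the|cat|sat", "the|cat sat", and "the cat|sat" all converge on one terminal state, the multi-path structure a token-level tree cannot express.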
Related papers
- Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression [55.51959317490934]
Large language models (LLMs) have demonstrated promising capabilities in Text-Attributed Graph (TAG) understanding. We argue that graphs inherently contain rich structural and semantic information, and that effectively exploiting it can unlock gains in LLM reasoning performance. We propose Homophily-aware Structural and Semantic Compression for LLMs (HS2C), a framework centered on exploiting graph homophily.
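As a reference point, edge homophily is commonly measured as the fraction of edges whose endpoints share a label; the snippet below is a minimal illustration of that standard definition, not code from HS2C.

```python
# Minimal sketch (ours, not HS2C's code) of edge homophily: the fraction of
# edges whose endpoints share a label, the structural signal that
# homophily-aware methods exploit.
def edge_homophily(edges, labels):
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)

edges = [(0, 1), (1, 2), (2, 3)]           # toy graph
labels = {0: "a", 1: "a", 2: "b", 3: "b"}  # toy node labels
print(edge_homophily(edges, labels))       # 0.666...: 2 of 3 edges homophilous
```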
arXiv Detail & Related papers (2026-01-13T03:35:18Z)
- HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition [4.5311655360445515]
Existing approaches, though they partially address these issues, often struggle to generalize without massive synthetic data. We propose HTR-ConvText, a model designed to capture fine-grained, stroke-level local features while preserving global contextual dependencies. We then introduce the ConvText encoder, a hybrid architecture combining global context and local features within a hierarchical structure.
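A minimal sketch of the generic convolution-plus-attention hybrid pattern the abstract describes; the layer choices and sizes are our assumptions, not the HTR-ConvText architecture.

```python
import torch
import torch.nn as nn

# Assumed sketch of a convolution + attention hybrid block; all layer choices
# are ours, not HTR-ConvText's.
class HybridBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1)       # stroke-level features
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # global context
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (batch, seq, dim)
        x = x + self.local(x.transpose(1, 2)).transpose(1, 2)  # local branch
        a, _ = self.attn(x, x, x)                               # global branch
        return self.norm(x + a)

cols = torch.randn(2, 128, 64)      # e.g., column features from a text-line image
print(HybridBlock(64)(cols).shape)  # torch.Size([2, 128, 64])
```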
arXiv Detail & Related papers (2025-12-04T17:35:05Z)
- GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning [50.40400074353263]
Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs. We introduce the Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture.
arXiv Detail & Related papers (2025-10-06T08:09:15Z)
- PICASO: Permutation-Invariant Context Composition with State Space Models [98.91198288025117]
State Space Models (SSMs) offer a promising solution by allowing a database of contexts to be mapped onto fixed-dimensional states. We propose a simple mathematical relation derived from SSM dynamics that composes multiple states into one, efficiently approximating the effect of concatenating raw context tokens. We evaluate the resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that it matches the strongest-performing baseline while enjoying an average 5.4x speedup.
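For a time-invariant linear SSM, such a composition rule can be written in closed form: running contexts c1 then c2 from the zero state satisfies s(c1 ++ c2) = A^|c2| s(c1) + s(c2). The sketch below verifies this identity numerically; it is our simplified illustration of the kind of relation PICASO builds on, not the paper's exact operator.

```python
import numpy as np

# Numerical check of the closed-form state composition for a toy linear SSM
# s' = A s + B u (our simplified illustration, not PICASO's operator).
rng = np.random.default_rng(0)
d = 4
A = 0.9 * np.eye(d)      # assumed stable transition matrix
B = rng.normal(size=(d, 1))

def run(ctx, s=None):
    """Process a context (list of scalar tokens) from state s (default zero)."""
    s = np.zeros(d) if s is None else s
    for u in ctx:
        s = A @ s + (B * u).ravel()
    return s

c1, c2 = [1.0, -0.5], [0.3, 0.7, 0.2]
direct = run(c1 + c2)                                          # concatenated contexts
composed = np.linalg.matrix_power(A, len(c2)) @ run(c1) + run(c2)
print(np.allclose(direct, composed))                           # True
```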
arXiv Detail & Related papers (2025-02-24T19:48:00Z)
- STAGE: Simplified Text-Attributed Graph Embeddings Using Pre-trained LLMs [1.4624458429745086]
We present a method for enhancing node features in Graph Neural Network (GNN) models that encode Text-Attributed Graphs (TAGs).
Our approach leverages Large-Language Models (LLMs) to generate embeddings for textual attributes.
We show that utilizing pre-trained LLMs as embedding generators provides robust features for ensemble GNN training.
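A minimal sketch of this general recipe (our code, not STAGE's): embed each node's text with a pre-trained model and use the embeddings as GNN input features. The sentence-transformers library and checkpoint name are example choices.

```python
import numpy as np
# `sentence_transformers` and the checkpoint name are example choices, not
# necessarily what STAGE uses.
from sentence_transformers import SentenceTransformer

texts = ["Paper about graph neural networks", "Paper about language models"]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
x = np.asarray(encoder.encode(texts))  # node feature matrix for a GNN
print(x.shape)                         # (2, 384)
```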
arXiv Detail & Related papers (2024-07-10T08:50:25Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
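A toy illustration of the random-walk context sampling the abstract names; walk length and tie-breaking are our assumptions, not GSPT's settings.

```python
import random

# Toy random-walk context sampler (our sketch of the idea): a node's context
# is the node sequence visited by a short walk, which a Transformer can then
# consume like a token sequence.
def random_walk(adj, start, length, seed=0):
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # toy adjacency lists
print(random_walk(adj, start=0, length=5))
```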
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models [31.443478448031886]
RoSE (Relation-oriented Semantic Edge-decomposition) is a novel framework that decomposes the graph structure by analyzing raw text attributes.
Our framework significantly enhances node classification performance across various datasets, with improvements of up to 16% on the Wisconsin dataset.
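A heavily stubbed sketch of relation-oriented edge decomposition as we read the abstract: label each edge from its endpoint texts, then split the graph per relation. A real system would query an LLM for the relation; the keyword rule here is only a placeholder, not RoSE's method.

```python
# Stubbed sketch of relation-oriented edge decomposition (ours, not RoSE's
# code). An LLM would normally assign the relation; this keyword rule is a
# placeholder.
def relation_of(text_u: str, text_v: str) -> str:
    shared = set(text_u.lower().split()) & set(text_v.lower().split())
    return "same-topic" if shared else "cross-topic"

texts = {0: "graph neural networks", 1: "neural networks", 2: "span generation"}
edges = [(0, 1), (0, 2)]
decomposed = {}
for u, v in edges:
    decomposed.setdefault(relation_of(texts[u], texts[v]), []).append((u, v))
print(decomposed)  # {'same-topic': [(0, 1)], 'cross-topic': [(0, 2)]}
```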
arXiv Detail & Related papers (2024-05-28T20:54:47Z)
- Pretraining Language Models with Text-Attributed Heterogeneous Graphs [28.579509154284448]
We present a new pretraining framework for Language Models (LMs) that explicitly considers the topological and heterogeneous information in Text-Attributed Heterogeneous Graphs (TAHGs).
We propose a topology-aware pretraining task to predict nodes involved in the context graph by jointly optimizing an LM and an auxiliary heterogeneous graph neural network.
We conduct link prediction and node classification experiments on three datasets from various domains.
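One plausible instantiation of such a joint objective (our assumption, not taken from the paper): score (anchor, candidate) pairs with the dot product of an LM text embedding and a GNN node embedding, trained with binary cross-entropy on context-graph membership.

```python
import torch
import torch.nn.functional as F

# Assumed instantiation (not from the paper): dot-product scoring between LM
# and GNN embeddings, trained with BCE on whether the candidate node lies in
# the anchor's context graph.
lm_emb = torch.randn(8, 64, requires_grad=True)   # stand-in LM embeddings
gnn_emb = torch.randn(8, 64, requires_grad=True)  # stand-in GNN embeddings
pairs = torch.tensor([[0, 1], [2, 3], [4, 5]])    # (anchor, candidate) indices
labels = torch.tensor([1.0, 0.0, 1.0])            # context membership
scores = (lm_emb[pairs[:, 0]] * gnn_emb[pairs[:, 1]]).sum(-1)
loss = F.binary_cross_entropy_with_logits(scores, labels)
loss.backward()  # gradients flow to both the LM and GNN sides
print(float(loss))
```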
arXiv Detail & Related papers (2023-10-19T08:41:21Z)
- Leveraging Large Language Models for Node Generation in Few-Shot Learning on Text-Attributed Graphs [5.587264586806575]
We propose a plug-and-play approach to empower text-attributed graphs through node generation using Large Language Models (LLMs). LLMs extract semantic information from labels and generate samples that belong to those categories as exemplars. We employ an edge predictor to capture structural information inherent in the raw dataset and integrate the newly generated samples into the original graph.
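A toy sketch of the integration step: connect an LLM-generated exemplar node to its most plausible neighbors via a scoring function. Cosine similarity stands in here for the trained edge predictor the abstract mentions.

```python
import numpy as np

# Toy integration step (ours): attach an LLM-generated exemplar node to its
# most plausible neighbors. Cosine similarity is a stand-in for a trained
# edge predictor.
def attach(node_vec, node_feats, k=2):
    sims = node_feats @ node_vec / (
        np.linalg.norm(node_feats, axis=1) * np.linalg.norm(node_vec) + 1e-9)
    return np.argsort(-sims)[:k]  # indices of the k most likely neighbors

node_feats = np.random.default_rng(1).normal(size=(5, 16))  # existing nodes
new_node = node_feats[3] + 0.1                              # synthetic exemplar
print(attach(new_node, node_feats))                         # includes node 3
```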
arXiv Detail & Related papers (2023-10-15T16:04:28Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88x improvement over the closest baseline on ogbn-arxiv.
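The "explanations as features" idea reduces to augmenting node features with an embedding of LLM-written explanations before the GNN. The sketch below uses a hash-based stand-in encoder so it runs offline; the explanation texts and dimensions are made up.

```python
import numpy as np

# Sketch of "explanations as features" (our reading): embed an LLM-written
# explanation per node and concatenate it with the raw features. The
# hash-based encoder is an offline stand-in, not a real embedder.
def embed(text: str, dim: int = 8) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.normal(size=dim)

raw = np.ones((2, 4))                                 # original node features
explanations = ["cites GNN work", "focuses on LLMs"]  # hypothetical LLM output
x = np.hstack([raw, np.stack([embed(e) for e in explanations])])
print(x.shape)  # (2, 12): richer inputs for the downstream GNN
```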
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.