Simple Hardware-Efficient PCFGs with Independent Left and Right Productions
- URL: http://arxiv.org/abs/2310.14997v1
- Date: Mon, 23 Oct 2023 14:48:51 GMT
- Title: Simple Hardware-Efficient PCFGs with Independent Left and Right Productions
- Authors: Wei Liu, Songlin Yang, Yoon Kim, Kewei Tu
- Abstract summary: This work introduces \emph{SimplePCFG}, a simple PCFG formalism with independent left and right productions.
As an unsupervised parser, our simple PCFG obtains an average F1 of 65.1 on the English PTB, and as a language model, it obtains a perplexity of 119.0, outperforming similarly-sized low-rank PCFGs.
- Score: 77.12660133995362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scaling dense PCFGs to thousands of nonterminals via a low-rank
parameterization of the rule probability tensor has been shown to be beneficial
for unsupervised parsing. However, PCFGs scaled this way still perform poorly
as a language model, and even underperform similarly-sized HMMs. This work
introduces \emph{SimplePCFG}, a simple PCFG formalism with independent left and
right productions. Despite imposing a stronger independence assumption than the
low-rank approach, we find that this formalism scales more effectively both as
a language model and as an unsupervised parser. As an unsupervised parser, our
simple PCFG obtains an average F1 of 65.1 on the English PTB, and as a language
model, it obtains a perplexity of 119.0, outperforming similarly-sized low-rank
PCFGs. We further introduce \emph{FlashInside}, a hardware IO-aware
implementation of the inside algorithm for efficiently scaling simple PCFGs.
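
To make the factorization concrete, below is a minimal NumPy sketch of the inside algorithm under independent left and right productions, reading the abstract's formalism as $p(A \to B\,C) = p(B \mid A)\,p(C \mid A)$. All names (`inside_simple_pcfg`, `p_left`, `p_right`, `emit`, `root`) are illustrative, and the code works in real space for readability; a practical implementation (including FlashInside, per the abstract) would need log-space or rescaled arithmetic and careful memory layout.

```python
import numpy as np

def inside_simple_pcfg(p_left, p_right, emit, root, sentence):
    """Inside algorithm for a PCFG whose binary rules factorize as
    p(A -> B C) = p_left[A, B] * p_right[A, C]  (a sketch, not FlashInside).

    p_left, p_right: (m, m) row-stochastic matrices over nonterminals
    emit:            (m, V) preterminal emission probabilities
    root:            (m,)   start-symbol distribution
    sentence:        list of word ids (length n)
    """
    n, m = len(sentence), root.shape[0]
    beta = np.zeros((n, n + 1, m))   # beta[i, j, A]: inside prob of A over span (i, j)
    L = np.zeros_like(beta)          # L[i, j, A] = sum_B p_left[A, B]  * beta[i, j, B]
    R = np.zeros_like(beta)          # R[i, j, A] = sum_C p_right[A, C] * beta[i, j, C]

    def close(i, j):
        # Precompute the left/right half-rule sums once per span; this is
        # what makes the factored form cheaper than a dense rule tensor.
        L[i, j] = p_left @ beta[i, j]
        R[i, j] = p_right @ beta[i, j]

    for i, w in enumerate(sentence):
        beta[i, i + 1] = emit[:, w]
        close(i, i + 1)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            # Elementwise product couples the two halves through the parent A.
            beta[i, j] = sum(L[i, k] * R[k, j] for k in range(i + 1, j))
            close(i, j)
    return root @ beta[0, n]         # p(sentence)
```

Because the half-rule sums are computed once per span, each split point contributes only an elementwise product over the $m$ nonterminals, rather than a contraction against a full $m \times m \times m$ rule tensor.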
Related papers
- Learning Regularized Monotone Graphon Mean-Field Games [155.38727464526923]
We study fundamental problems in regularized Graphon Mean-Field Games (GMFGs).
We establish the existence of a Nash Equilibrium (NE) of any $\lambda$-regularized GMFG.
We propose provably efficient algorithms to learn the NE in weakly monotone GMFGs.
arXiv Detail & Related papers (2023-10-12T07:34:13Z)
- Do Transformers Parse while Predicting the Masked Word? [48.65553369481289]
Some doubts have been raised about whether pre-trained language models are actually doing parsing.
This paper takes a step toward answering these questions in the context of generative modeling with PCFGs.
arXiv Detail & Related papers (2023-03-14T17:49:50Z)
- Efficient $\Phi$-Regret Minimization in Extensive-Form Games via Online Mirror Descent [49.93548413166884]
$\Phi$-Hedge is a generic algorithm capable of learning a large class of equilibria for Normal-Form Games (NFGs).
We show that $\Phi$-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs.
We prove that, in those settings, the \emph{$\Phi$-Hedge} algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms.
arXiv Detail & Related papers (2022-05-30T17:58:06Z)
- Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs [35.31888995651471]
Recent research found it beneficial to use large state spaces for HMMs and PCFGs.
Inference with large state spaces is computationally demanding, especially for PCFGs.
We leverage tensor rank decomposition to reduce the computational complexity of inference; a sketch of this decomposition is given after this list.
arXiv Detail & Related papers (2022-05-01T14:58:25Z)
- PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition [78.67749936030219]
Prune-Adjust-Re-Prune (PARP) discovers and finetunes subnetworks for much better ASR performance.
Experiments on low-resource English and multi-lingual ASR show sparse subnetworks exist in pre-trained speech SSL.
arXiv Detail & Related papers (2021-06-10T17:32:25Z)
- Neural Bi-Lexicalized PCFG Induction [22.728124473130876]
We propose an approach to parameterize L-PCFGs without making implausible independence assumptions.
Our approach directly models bilexical dependencies while reducing both the learning and representation complexities of L-PCFGs.
arXiv Detail & Related papers (2021-05-31T15:00:03Z)
- PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols [22.728124473130876]
We present a new parameterization form of PCFGs based on tensor decomposition.
We use neural parameterization for the new form to improve unsupervised parsing performance.
We evaluate our model across ten languages and empirically demonstrate the effectiveness of using more symbols.
arXiv Detail & Related papers (2021-04-28T12:25:27Z)
- The Return of Lexical Dependencies: Neural Lexicalized PCFGs [103.41187595153652]
We present novel neural models of lexicalized PCFGs which allow us to overcome sparsity problems.
Experiments demonstrate that this unified framework achieves stronger performance on both representations than either formalism alone.
arXiv Detail & Related papers (2020-07-29T22:12:49Z)
- Efficient Constituency Parsing by Pointing [21.395573911155495]
We propose a novel constituency parsing model that casts the parsing problem into a series of pointing tasks.
Our model supports efficient top-down decoding and our learning objective is able to enforce structural consistency without resorting to the expensive CKY inference.
arXiv Detail & Related papers (2020-06-24T08:29:09Z)
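
For contrast with the simple factorization sketched above, the tensor rank decomposition referenced in the rank-space entry approximates the binary-rule tensor with a CP (rank) decomposition. This is a sketch under assumed notation; the factor matrices $U, V, W$ and rank $R$ are illustrative:
\[
p(A \to B\,C) \;\approx\; \sum_{r=1}^{R} U_{Ar}\,V_{Br}\,W_{Cr},
\qquad\text{so}\qquad
\sum_{B,C} p(A \to B\,C)\,\beta_B\,\beta_C \;\approx\; \sum_{r=1}^{R} U_{Ar}\Big(\sum_{B} V_{Br}\,\beta_B\Big)\Big(\sum_{C} W_{Cr}\,\beta_C\Big).
\]
The inside sum thus factorizes through the rank dimension $r$; roughly, the independent left/right form corresponds to tying the rank index to the parent nonterminal (taking $U$ to be the identity with $R = m$).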