Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar Induction
- URL: http://arxiv.org/abs/2509.20734v1
- Date: Thu, 25 Sep 2025 04:31:14 GMT
- Title: Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar Induction
- Authors: Jinwook Park, Kangil Kim
- Abstract summary: Unsupervised neural grammar induction aims to learn interpretable hierarchical structures from language data. Existing models face a bottleneck, often resulting in unnecessarily large yet underperforming grammars. We analyze when and how the collapse emerges across key components of neural parameterization and introduce a targeted solution.
- Score: 13.836565669337057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised neural grammar induction aims to learn interpretable hierarchical structures from language data. However, existing models face an expressiveness bottleneck, often resulting in unnecessarily large yet underperforming grammars. We identify a core issue, $\textit{probability distribution collapse}$, as the underlying cause of this limitation. We analyze when and how the collapse emerges across key components of neural parameterization and introduce a targeted solution, $\textit{collapse-relaxing neural parameterization}$, to mitigate it. Our approach substantially improves parsing performance while enabling the use of significantly more compact grammars across a wide range of languages, as demonstrated through extensive empirical analysis.
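The abstract locates the collapse in the rule distributions produced by neural parameterization. As a minimal sketch of what such a diagnostic could look like (not the paper's code; all sizes and names are illustrative), the snippet below parameterizes per-nonterminal rule distributions with an MLP + softmax and flags collapse via low entropy and high cross-nonterminal similarity:

```python
# Illustrative sketch: in a neural PCFG, rule probabilities come from
# nonterminal embeddings through an MLP + softmax. "Probability
# distribution collapse" can be diagnosed when these distributions
# become low-entropy and near-identical across nonterminals.
import torch
import torch.nn.functional as F

num_nt, dim, num_rules = 30, 64, 900  # hypothetical grammar sizes

nt_emb = torch.randn(num_nt, dim)     # nonterminal embeddings
mlp = torch.nn.Sequential(
    torch.nn.Linear(dim, dim), torch.nn.ReLU(),
    torch.nn.Linear(dim, num_rules),
)

log_probs = F.log_softmax(mlp(nt_emb), dim=-1)  # one distribution per NT

# Diagnostic 1: entropy per nonterminal (very low => peaked/collapsed).
entropy = -(log_probs.exp() * log_probs).sum(-1)

# Diagnostic 2: pairwise cosine similarity across nonterminals
# (high everywhere => redundant nonterminals, wasted grammar size).
probs = log_probs.exp()
sim = probs @ probs.t() / (
    probs.norm(dim=-1, keepdim=True) * probs.norm(dim=-1).unsqueeze(0)
)

print(f"mean entropy: {entropy.mean():.3f}")
print(f"mean off-diagonal similarity: {(sim - torch.eye(num_nt)).mean():.3f}")
```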
Related papers
- Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [56.61984030508691]
We present the first mechanistic interpretability study of language confusion. We show that confusion points (CPs) are central to this phenomenon. We show that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion.
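A hedged sketch of the neuron-editing recipe this summary describes, assuming activations have already been collected from both models on shared prompts; the layer shapes and the top-k criterion are illustrative, not the paper's:

```python
# Hedged sketch (not the paper's released code): rank hidden units by how
# much their mean activations differ between an English-centric model and
# a multilingual-tuned counterpart, then zero the corresponding input
# columns of one MLP layer in the English-centric model.
import torch

hidden = 1024  # hypothetical MLP width

# Assume these were collected by running both models on the same prompts.
acts_base = torch.randn(5000, hidden).abs()   # English-centric model
acts_multi = torch.randn(5000, hidden).abs()  # multilingual-tuned model

divergence = (acts_base.mean(0) - acts_multi.mean(0)).abs()
critical = divergence.topk(k=16).indices      # small critical neuron set

mlp_out = torch.nn.Linear(hidden, 512)
with torch.no_grad():
    mlp_out.weight[:, critical] = 0.0         # ablate the critical neurons
```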
arXiv Detail & Related papers (2025-05-22T11:29:17Z)
- Structural Optimization Ambiguity and Simplicity Bias in Unsupervised Neural Grammar Induction [2.3020018305241337]
Our research promotes learning more compact, accurate, and consistent explicit grammars, facilitating better interpretability.
In unsupervised parsing benchmark tests, our method significantly improves performance while reducing bias toward overly simplistic parses.
arXiv Detail & Related papers (2024-07-23T04:57:03Z)
- Structural generalization in COGS: Supertagging is (almost) all you need [12.991247861348048]
Several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required.
We extend a neural graph-based semantic parsing framework in several ways to alleviate this issue.
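As a minimal illustration of supertagging as token-level classification (a sketch, not the paper's graph-based framework), one rich tag is predicted per token, and a downstream parser would assemble the tags into a structure:

```python
# Minimal supertagger sketch with made-up sizes: a standard token
# classifier predicts one "supertag" per input token.
import torch
import torch.nn as nn

vocab, tags, dim = 1000, 50, 128  # hypothetical sizes

class Supertagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * dim, tags)  # one tag distribution per token

    def forward(self, token_ids):
        h, _ = self.enc(self.emb(token_ids))
        return self.out(h)                   # (batch, seq, tags) logits

tokens = torch.randint(0, vocab, (2, 7))
logits = Supertagger()(tokens)
print(logits.shape)  # torch.Size([2, 7, 50])
```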
arXiv Detail & Related papers (2023-10-21T21:51:25Z)
- On the Expressiveness and Generalization of Hypergraph Neural Networks [77.65788763444877]
This extended abstract describes a framework for analyzing the expressiveness, learning, and (structural) generalization of hypergraph neural networks (HyperGNNs).
Specifically, we focus on how HyperGNNs can learn from finite datasets and generalize structurally to graph reasoning problems of arbitrary input sizes.
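A minimal sketch of the node-to-hyperedge-to-node aggregation pattern such analyses study, written with an explicit incidence matrix; the graph and feature sizes are made up for illustration:

```python
# One round of hypergraph message passing: nodes aggregate into
# hyperedges, hyperedges broadcast back to their member nodes.
import torch

n_nodes, n_edges, dim = 6, 3, 8
H = torch.zeros(n_nodes, n_edges)  # incidence matrix
H[[0, 1, 2], 0] = 1   # hyperedge 0 connects nodes {0,1,2}
H[[2, 3], 1] = 1      # hyperedge 1 connects nodes {2,3}
H[[3, 4, 5], 2] = 1   # hyperedge 2 connects nodes {3,4,5}

x = torch.randn(n_nodes, dim)                      # node features
edge_msg = H.t() @ x / H.sum(0, keepdim=True).t()  # mean over member nodes
node_upd = H @ edge_msg / H.sum(1, keepdim=True)   # mean over incident edges
print(node_upd.shape)  # torch.Size([6, 8])
```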
arXiv Detail & Related papers (2023-03-09T18:42:18Z)
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations [62.65877150123775]
Causal abstraction is a promising theoretical framework for explainable artificial intelligence.
Existing causal abstraction methods require a brute-force search over alignments between the high-level model and the low-level one.
We present distributed alignment search (DAS), which overcomes these limitations.
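A hedged sketch of the core DAS move, using a frozen random rotation where the method would learn one: swap a fixed subspace of rotated activations between a base run and a source run, i.e., an interchange intervention:

```python
# Sketch of a distributed interchange intervention (sizes illustrative):
# rotate hidden states with an orthogonal matrix R, swap the first k
# coordinates between two runs, rotate back. DAS learns R; here it is a
# fixed random rotation for demonstration only.
import torch

dim, k = 16, 4                                 # hidden size, subspace size
R = torch.linalg.qr(torch.randn(dim, dim)).Q   # orthogonal rotation

base = torch.randn(dim)    # hidden state from the base input
source = torch.randn(dim)  # hidden state from the source input

rb, rs = R @ base, R @ source
rb[:k] = rs[:k]            # swap the aligned subspace
intervened = R.t() @ rb    # rotate back into model space
print(intervened.shape)    # torch.Size([16])
```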
arXiv Detail & Related papers (2023-03-05T00:57:49Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce sample complexity and improve convergence without compromising test accuracy.
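An illustrative sketch combining the two ingredients named above, under simple stand-in criteria (degree for node importance, weight norm for neuron magnitude); the real method's criteria and guarantees are in the paper:

```python
# Toy joint sparsification: keep only high-degree nodes on the graph side,
# zero the smallest-norm hidden units on the model side.
import torch

# Edge side: sample important neighbors by degree.
adj = (torch.rand(8, 8) > 0.5).float()
degree = adj.sum(0)
keep = degree.topk(k=4).indices               # 4 "important" nodes
adj_sparse = torch.zeros_like(adj)
adj_sparse[:, keep] = adj[:, keep]

# Model side: prune the lowest-magnitude neurons.
w = torch.randn(16, 32)                       # hypothetical GNN layer weight
norms = w.norm(dim=1)
prune = norms.topk(k=4, largest=False).indices
w[prune] = 0.0
```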
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- Less is More: A Lightweight and Robust Neural Architecture for Discourse Parsing [27.28989421841165]
We propose an alternative lightweight neural architecture that removes multiple complex feature extractors and only utilizes learnable self-attention modules.
Experiments on three common discourse parsing tasks show that powered by recent pretrained language models, the lightweight architecture obtains much better generalizability and robustness.
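A minimal sketch of the architectural point: a single learnable self-attention block over pooled pretrained-LM vectors, with no handcrafted feature extractors. Sizes are illustrative:

```python
# Lightweight discourse encoder sketch: contextualize EDU (elementary
# discourse unit) vectors with one self-attention module only.
import torch
import torch.nn as nn

dim = 256
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

edus = torch.randn(1, 10, dim)   # e.g. pooled PLM vectors, one per EDU
ctx, _ = attn(edus, edus, edus)  # contextualized EDU representations
print(ctx.shape)                 # torch.Size([1, 10, 256])
```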
arXiv Detail & Related papers (2022-10-18T02:07:09Z)
- Improving Topic Segmentation by Injecting Discourse Dependencies [29.353285741379334]
We present a discourse-aware neural topic segmentation model with the injection of above-sentence discourse dependency structures.
Our empirical study on English evaluation datasets shows that injecting above-sentence discourse structures into a neural topic segmenter can substantially improve its performance.
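One plausible way to inject such structure, sketched under the assumption that discourse links are available as a sentence-pair matrix: add a dependency-derived bias to sentence-level attention scores. The bias weight here is arbitrary:

```python
# Sketch: bias sentence-level attention toward discourse-linked sentences.
import torch

n, dim = 5, 64
q = torch.randn(n, dim)                      # sentence representations
k = torch.randn(n, dim)
dep = torch.zeros(n, n)                      # discourse dependency links
dep[0, 2] = dep[2, 0] = 1.0                  # sentences 0 and 2 are linked

scores = q @ k.t() / dim ** 0.5 + 2.0 * dep  # additive structural bias
weights = scores.softmax(dim=-1)
```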
arXiv Detail & Related papers (2022-09-18T18:22:25Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
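A sketch of what a "sound" perturbation can mean for a SAT-deciding solver, assuming a satisfying assignment is known: any added clause containing a literal the assignment makes true preserves satisfiability, so the label stays known without re-solving. This is one example of such a perturbation, not necessarily the paper's construction:

```python
# Sound SAT perturbation: appending a clause that contains at least one
# literal satisfied by a known assignment keeps the instance satisfiable,
# so the ground-truth label survives the perturbation by construction.
import random

assignment = {1: True, 2: False, 3: True}  # known satisfying assignment
formula = [[1, -2], [3]]                   # CNF, satisfied by assignment

def add_sound_clause(formula, assignment, width=3):
    var = random.choice(list(assignment))
    true_lit = var if assignment[var] else -var  # literal made true
    noise = [random.choice([v, -v])
             for v in random.sample(list(assignment), width - 1)]
    formula.append([true_lit] + noise)           # still satisfiable

add_sound_clause(formula, assignment)
print(formula)
```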
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- Demystifying Neural Language Models' Insensitivity to Word-Order [7.72780997900827]
We investigate the insensitivity of neural language models to word order by quantifying the effect of perturbations.
We find that neural language models rely on the local ordering of tokens more than on their global ordering.
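A minimal sketch of the local-versus-global contrast such a study can use: shuffle tokens within small windows (destroying local order) versus across the whole sentence, then compare how much each perturbation degrades model scores:

```python
# Two order perturbations: windowed shuffles destroy local order while
# keeping global layout; a full shuffle destroys global order too.
import random

def shuffle_local(tokens, window=3):
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        random.shuffle(chunk)
        out.extend(chunk)
    return out

def shuffle_global(tokens):
    tokens = tokens[:]
    random.shuffle(tokens)
    return tokens

sent = "the cat sat on the warm mat".split()
print(shuffle_local(sent))   # local order perturbed
print(shuffle_global(sent))  # global order perturbed
```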
arXiv Detail & Related papers (2021-07-29T13:34:20Z)
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
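The summary does not give the Anti-Focal formula, so the sketch below only shows the family it belongs to: a focal-style reweighting of cross-entropy where swapping (1 - p) for (1 + p) flips the emphasis between hard and easy tokens. The exact anti-focal form is an assumption here, not the paper's definition:

```python
# Focal-style reweighted cross-entropy. With anti=False this is the
# standard focal loss, (1 - p)^gamma, emphasizing hard tokens; anti=True
# uses an assumed (1 + p)^gamma weighting that shifts emphasis the other
# way. The true Anti-Focal definition may differ from this sketch.
import torch
import torch.nn.functional as F

def reweighted_ce(logits, target, gamma=1.0, anti=False):
    logp = F.log_softmax(logits, dim=-1)
    logp_t = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    p_t = logp_t.exp()
    weight = (1 + p_t) ** gamma if anti else (1 - p_t) ** gamma
    return -(weight * logp_t).mean()

logits = torch.randn(4, 10)              # 4 tokens, 10-way vocabulary
target = torch.randint(0, 10, (4,))
print(reweighted_ce(logits, target, anti=True))
```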
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
- Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence [48.765579605145454]
We propose to explicitly segment target text into fragment units and align them with their data correspondences.
The resulting architecture maintains the same expressive power as neural attention models.
On both E2E and WebNLG benchmarks, we show the proposed model consistently outperforms its neural attention counterparts.
arXiv Detail & Related papers (2020-05-03T14:28:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.