Cascading and Direct Approaches to Unsupervised Constituency Parsing on
Spoken Sentences
- URL: http://arxiv.org/abs/2303.08809v2
- Date: Tue, 9 May 2023 10:36:55 GMT
- Authors: Yuan Tseng, Cheng-I Lai, Hung-yi Lee
- Abstract summary: We present the first study on unsupervised spoken constituency parsing.
The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees.
We show that accurate segmentation alone may be sufficient to parse spoken sentences accurately.
- Score: 67.37544997614646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Past work on unsupervised parsing is constrained to written form. In this
paper, we present the first study on unsupervised spoken constituency parsing
given unlabeled spoken sentences and unpaired textual data. The goal is to
determine the spoken sentences' hierarchical syntactic structure in the form of
constituency parse trees, such that each node is a span of audio that
corresponds to a constituent. We compare two approaches: (1) cascading an
unsupervised automatic speech recognition (ASR) model and an unsupervised
parser to obtain parse trees on ASR transcripts, and (2) directly training an
unsupervised parser on continuous word-level speech representations. This is
done by first splitting utterances into sequences of word-level segments, and
aggregating self-supervised speech representations within segments to obtain
segment embeddings. We find that separately training a parser on the unpaired
text and directly applying it on ASR transcripts for inference produces better
results for unsupervised parsing. Additionally, our results suggest that
accurate segmentation alone may be sufficient to parse spoken sentences
accurately. Finally, we show the direct approach may learn head-directionality
correctly for both head-initial and head-final languages without any explicit
inductive bias.
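The direct approach's segment-embedding step (splitting an utterance into word-level segments and aggregating frame-level self-supervised features within each segment) can be sketched as follows. This is a minimal illustration assuming mean pooling as the aggregation function and pre-computed segment boundaries; the function and variable names are hypothetical, not from the paper's code.

```python
import numpy as np

def segment_embeddings(frame_features, boundaries):
    """Aggregate frame-level speech features into one embedding per
    word-level segment by mean pooling.

    frame_features: (T, D) array of self-supervised representations
                    (e.g. one row per 20 ms frame).
    boundaries: list of (start, end) frame indices, end exclusive,
                one pair per hypothesized word segment.
    Returns an (N, D) array with one embedding per segment.
    """
    return np.stack([frame_features[s:e].mean(axis=0) for s, e in boundaries])

# Example: 10 frames of 4-dim features, split into three word segments.
feats = np.arange(40, dtype=float).reshape(10, 4)
segs = segment_embeddings(feats, [(0, 3), (3, 7), (7, 10)])
```

The resulting segment embeddings play the role that word embeddings play for a text-based unsupervised parser.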
Related papers
- Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech [8.550564152063522]
We report on a set of experiments aiming at assessing the performance of two parsing paradigms on speech parsing.
We perform this evaluation on a large treebank of spoken French, featuring realistic spontaneous conversations.
Our findings show that (i) the graph based approach obtains better results across the board (ii) parsing directly from speech outperforms a pipeline approach, despite having 30% fewer parameters.
arXiv Detail & Related papers (2024-06-18T13:46:10Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing [43.758912958903494]
We present the structured average intersection-over-union ratio (STRUCT-IOU), a similarity metric between constituency parse trees motivated by the problem of evaluating speech parsers.
To compute the metric, we project the ground-truth parse tree to the speech domain by forced alignment, align the projected ground-truth constituents with the predicted ones under certain structured constraints, and calculate the average IOU score across all aligned constituent pairs.
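The per-pair score at the heart of STRUCT-IOU reduces to the IOU of two time intervals, which can be sketched as below. This is an illustrative simplification: the structured constituent alignment step described above is omitted, and the pairs are assumed to be aligned already.

```python
def interval_iou(a, b):
    """Intersection-over-union of two time intervals (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def struct_iou_score(aligned_pairs):
    """Average IOU over already-aligned (gold, predicted) constituent spans."""
    return sum(interval_iou(g, p) for g, p in aligned_pairs) / len(aligned_pairs)

# Two spans overlapping by 1 s out of a 3 s union give IOU = 1/3.
score = interval_iou((0.0, 2.0), (1.0, 3.0))
```

In the full metric, the ground-truth tree is first projected onto the audio timeline via forced alignment, so both trees' nodes are time spans before any IOU is computed.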
arXiv Detail & Related papers (2024-02-21T00:01:17Z)
- REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR [54.64158282822995]
We propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR.
REBORN alternates between training a segmentation model that predicts the boundaries of segmental structures in speech signals and training a phoneme prediction model, which takes the speech features segmented by the segmentation model as input and predicts a phoneme transcription.
We conduct extensive experiments and find that under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech.
arXiv Detail & Related papers (2024-02-06T13:26:19Z)
- Audio-Visual Neural Syntax Acquisition [91.14892278795892]
We study phrase structure induction from visually-grounded speech.
We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without ever being exposed to text.
arXiv Detail & Related papers (2023-10-11T16:54:57Z)
- Unsupervised Full Constituency Parsing with Neighboring Distribution Divergence [48.69930912510414]
We propose an unsupervised and training-free labeling procedure that exploits properties of the recently introduced Neighboring Distribution Divergence (NDD) metric.
For implementation, we develop NDD into Dual POS-NDD and build "molds" to detect constituents and their labels in sentences.
We show that DP-NDD not only labels constituents precisely but also induces more accurate unlabeled constituency trees than all previous unsupervised methods, using simpler rules.
arXiv Detail & Related papers (2021-10-29T17:27:34Z)
- RST Parsing from Scratch [14.548146390081778]
We introduce a novel end-to-end formulation of document-level discourse parsing in the Rhetorical Structure Theory (RST) framework.
Our framework facilitates discourse parsing from scratch without requiring discourse segmentation as a prerequisite.
Our unified parsing model adopts a beam search to decode the best tree structure by searching through a space of high-scoring trees.
arXiv Detail & Related papers (2021-05-23T06:19:38Z)
- A Simple Global Neural Discourse Parser [61.728994693410954]
We propose a simple chart-based neural discourse parser that does not require any manually-crafted features and is based on learned span representations only.
We empirically demonstrate that our model achieves the best performance among global parsers, and performance comparable to state-of-the-art greedy parsers.
arXiv Detail & Related papers (2020-09-02T19:28:40Z)
- Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing [41.345662724584884]
We propose a two-stage semantic parsing framework to reduce nontrivial human labor.
The first stage utilizes an unsupervised paraphrase model to convert an unlabeled natural language utterance into a canonical utterance.
The downstream naive semantic parser accepts the intermediate output and returns the target logical form.
arXiv Detail & Related papers (2020-05-27T16:47:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.