Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency
- URL: http://arxiv.org/abs/2305.20018v1
- Date: Wed, 31 May 2023 16:47:20 GMT
- Title: Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency
- Authors: Maxwell Crouse, Ramon Astudillo, Tahira Naseem, Subhajit Chaudhury,
Pavan Kapanipathi, Salim Roukos, Alexander Gray
- Abstract summary: Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
- Score: 71.42261918225773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Logical Offline Cycle Consistency Optimization (LOCCO), a
scalable, semi-supervised method for training a neural semantic parser.
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic
parser being trained is used to generate annotations for unlabeled text that
are then used as new supervision. To increase the quality of annotations, our
method utilizes a count-based prior over valid formal meaning representations
and a cycle-consistency score produced by a neural text generation model as
additional signals. Both the prior and semantic parser are updated in an
alternating fashion from full passes over the training data, which can be seen as
approximating the marginalization of latent structures through stochastic
variational inference. The use of a count-based prior, frozen text generation
model, and offline annotation process yields an approach with negligible
complexity and latency increases as compared to conventional self-learning. As
an added bonus, the annotations produced by LOCCO can be trivially repurposed
to train a neural text generation model. We demonstrate the utility of LOCCO on
the well-known WebNLG benchmark where we obtain an improvement of 2 points
against a self-learning parser under equivalent conditions, an improvement of
1.3 points against the previous state-of-the-art parser, and competitive text
generation performance in terms of BLEU score.
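To make the training recipe above concrete, the following is a minimal sketch of one LOCCO-style offline annotation pass, written only from the description in this abstract. The callables parser_sample, generator_logprob, and canonical, and the smoothing constant alpha, are hypothetical stand-ins rather than the paper's actual implementation.

```python
import math
from collections import Counter

def locco_offline_pass(unlabeled_texts, parser_sample, generator_logprob,
                       canonical, prior_counts, num_samples=5, alpha=1.0):
    """One offline pass: score parser samples for each unlabeled sentence and
    keep the best-scoring parse as a pseudo-annotation.

    parser_sample(text, k)         -> list of (parse, parse_logprob) candidates
    generator_logprob(parse, text) -> log p(text | parse) under a frozen generator
    canonical(parse)               -> hashable form used by the count-based prior
    prior_counts                   -> Counter over canonical forms from the last pass
    """
    # Add-alpha smoothed normalizer for the count-based prior.
    total = sum(prior_counts.values()) + alpha * max(len(prior_counts), 1)
    annotations, new_counts = [], Counter()

    for text in unlabeled_texts:
        best_parse, best_score = None, -math.inf
        for parse, parse_lp in parser_sample(text, num_samples):
            # Count-based prior over valid formal meaning representations.
            prior_lp = math.log((prior_counts[canonical(parse)] + alpha) / total)
            # Cycle-consistency: how well the frozen generator reconstructs the text.
            cycle_lp = generator_logprob(parse, text)
            score = parse_lp + prior_lp + cycle_lp
            if score > best_score:
                best_parse, best_score = parse, score
        if best_parse is not None:
            annotations.append((text, best_parse))
            new_counts[canonical(best_parse)] += 1

    return annotations, new_counts
```

In a full run, the selected annotations would be used as supervision to retrain the parser, the returned counts would become the prior for the next pass, and the same (text, parse) pairs could be reused in the reverse direction to train a neural text generation model, as the abstract notes.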
Related papers
- On Eliciting Syntax from Language Models via Hashing [19.872554909401316]
Unsupervised parsing aims to infer syntactic structure from raw text.
In this paper, we explore the possibility of leveraging pre-trained language models to deduce parsing trees from raw text.
We show that our method is effective and efficient enough to acquire high-quality parsing trees from pre-trained language models at a low cost.
arXiv Detail & Related papers (2024-10-05T08:06:19Z)
- Reconsidering Degeneration of Token Embeddings with Definitions for Encoder-based Pre-trained Language Models [20.107727903240065]
We propose DefinitionEMB to re-construct isotropically distributed and semantics-related token embeddings for encoder-based language models.
Our experiments demonstrate the effectiveness of leveraging definitions from Wiktionary to re-construct such embeddings.
arXiv Detail & Related papers (2024-08-02T15:00:05Z)
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data (a generic sketch of this weighting idea appears after this list).
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding [1.07288078404291]
We propose a natural language understanding approach based on Automatic Speech Recognition (ASR), improving a noisy-channel model to handle transcription inconsistencies caused by ASR errors.
Experiments on four benchmark datasets show that Contrastive and Consistency Learning (CCL) outperforms existing methods.
arXiv Detail & Related papers (2024-05-23T23:10:23Z)
- Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models [4.807347156077897]
Bit-cipher is a word representation system that eliminates the need for backpropagation and hyper-efficient dimensionality reduction techniques.
We perform probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) to assess bit-cipher's competitiveness with classic embeddings.
By replacing embedding layers with cipher embeddings, our experiments illustrate the notable efficiency of cipher in accelerating the training process and attaining better optima.
arXiv Detail & Related papers (2023-11-18T08:47:35Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition [41.92991390542083]
We present a simple, novel and competitive approach for phoneme-based neural transducer modeling.
A phonetic context size of one is shown to be sufficient for the best performance.
The overall performance of our best model is comparable to state-of-the-art (SOTA) results for the TED-LIUM Release 2 and Switchboard corpora.
arXiv Detail & Related papers (2020-10-30T16:53:29Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
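The co-training entry above contrasts soft importance weighting with hard confidence-threshold filtering of distantly supervised labels. Below is a generic sketch of that contrast; it is not the cited paper's exact formulation, and the weight used here (an example's mean predicted confidence over recent epochs) is only a crude stand-in for the classifiers' training dynamics.

```python
# A generic sketch contrasting hard confidence filtering with soft importance
# weighting of distantly supervised (pseudo-labeled) examples. NOT the cited
# paper's exact method; the weight is just mean recent confidence per example.

def threshold_filter(examples, confidence_history, threshold=0.9):
    """Baseline: drop every example whose latest confidence misses a fixed threshold."""
    return [ex for ex, hist in zip(examples, confidence_history) if hist[-1] >= threshold]

def dynamics_weights(confidence_history):
    """Soft alternative: a weight in [0, 1] per example from its confidence trajectory."""
    return [sum(hist) / len(hist) for hist in confidence_history]

def weighted_loss(per_example_losses, weights):
    """Importance-weighted objective: no example is discarded, noisy labels just count less."""
    total = sum(weights) or 1.0
    return sum(w * l for w, l in zip(weights, per_example_losses)) / total
```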