On the N-gram Approximation of Pre-trained Language Models
- URL: http://arxiv.org/abs/2306.06892v1
- Date: Mon, 12 Jun 2023 06:42:08 GMT
- Title: On the N-gram Approximation of Pre-trained Language Models
- Authors: Aravind Krishnan, Jesujoba Alabi, Dietrich Klakow
- Abstract summary: Large pre-trained language models (PLMs) have shown remarkable performance across various natural language understanding (NLU) tasks.
This study investigates the potential usage of PLMs for language modelling in Automatic Speech Recognition (ASR).
We compare the application of large-scale text sampling and probability conversion for approximating GPT-2 into an n-gram model.
- Score: 17.764803904135903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained language models (PLMs) have shown remarkable performance
across various natural language understanding (NLU) tasks, particularly in
low-resource settings. Nevertheless, their potential in Automatic Speech
Recognition (ASR) remains largely unexplored. This study investigates the
potential usage of PLMs for language modelling in ASR. We compare the
application of large-scale text sampling and probability conversion for
approximating GPT-2 into an n-gram model. Furthermore, we introduce a
vocabulary-restricted decoding method for random sampling, and evaluate the
effects of domain difficulty and data size on the usability of generated text.
Our findings across eight domain-specific corpora support the use of
sampling-based approximation and show that interpolating with a large sampled
corpus improves test perplexity over a baseline trigram by 15%. Our
vocabulary-restricted decoding method pushes this improvement further by 5% in
domain-specific settings.
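As a concrete illustration of the sampling-based approximation and the vocabulary-restricted decoding described above, the sketch below draws text from GPT-2 while masking the logits of every token outside a fixed domain word list. This is a reconstruction under assumptions, not the authors' code: the `allowed_words` list, the prompt-free start from the EOS token, and all hyper-parameters are illustrative.

```python
# Minimal sketch: vocabulary-restricted sampling from GPT-2 (illustrative only).
# Requires: pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical domain vocabulary: in practice this would be the word list of the
# target ASR domain, mapped onto the subword tokens GPT-2 can emit.
allowed_words = ["the", "patient", "was", "admitted", "with", "acute", "pain", "."]
allowed_ids = set()
for w in allowed_words:
    for tok_id in tokenizer.encode(" " + w) + tokenizer.encode(w):
        allowed_ids.add(tok_id)
allowed_ids.add(tokenizer.eos_token_id)

# Additive mask: 0 for allowed tokens, -inf for everything else.
mask = torch.full((model.config.vocab_size,), float("-inf"))
mask[list(allowed_ids)] = 0.0

@torch.no_grad()
def sample_sentence(max_len=30, temperature=1.0):
    ids = torch.tensor([[tokenizer.eos_token_id]])  # start from the EOS/BOS token
    for _ in range(max_len):
        logits = model(ids).logits[0, -1] / temperature
        probs = torch.softmax(logits + mask, dim=-1)  # restrict to the domain vocabulary
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0, 1:], skip_special_tokens=True)

# Sample a small toy corpus of domain-restricted sentences.
corpus = [sample_sentence() for _ in range(5)]
print("\n".join(corpus))
```

The single masking step before the softmax is what restricts decoding to the domain vocabulary; the rest is plain ancestral sampling. In the paper's setting, a large corpus sampled this way would then be counted into an n-gram model and interpolated with the baseline trigram.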
Related papers
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- Nearest Neighbor Machine Translation [113.96357168879548]
We introduce $k$-nearest-neighbor machine translation ($k$NN-MT).
It predicts tokens with a nearest neighbor classifier over a large datastore of cached examples; a minimal sketch of this retrieve-and-interpolate idea is given after this list.
It consistently improves performance across many settings.
arXiv Detail & Related papers (2020-10-01T22:24:46Z)
- Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech [17.602098162338137]
We explore a multimodal semi-supervised learning approach for punctuation prediction.
We learn representations from large amounts of unlabelled audio and text data.
When trained on 1 hour of speech and text data, the proposed model achieved a 9-18% absolute improvement over the baseline model.
arXiv Detail & Related papers (2020-08-03T08:13:09Z)
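The nearest-neighbour methods above ($k$NN-MT and the $k$NN language models it builds on) share a retrieve-and-interpolate mechanism. The sketch below is a toy NumPy reconstruction of that idea under assumptions of my own (random stand-in embeddings, brute-force search, an illustrative interpolation weight `lam`); it is not the implementation of either paper.

```python
# Toy retrieve-and-interpolate sketch in the spirit of kNN-MT / kNN-LM.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<eos>", "hello", "world", "speech", "model"]
V, D, N = len(vocab), 8, 100

# Datastore: one (context representation -> observed next token) pair per entry.
keys = rng.normal(size=(N, D)).astype(np.float32)
values = rng.integers(0, V, size=N)

def knn_distribution(query, k=8, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over the vocabulary."""
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    p = np.zeros(V)
    for idx, w in zip(nearest, weights):
        p[values[idx]] += w
    return p

def interpolate(p_model, query, lam=0.25):
    """p = lam * p_kNN + (1 - lam) * p_model, the usual kNN interpolation."""
    return lam * knn_distribution(query) + (1.0 - lam) * p_model

# Usage: combine a (made-up) base-model distribution with the retrieved one.
p_model = np.full(V, 1.0 / V)                   # stand-in for the NMT/LM softmax output
query = rng.normal(size=D).astype(np.float32)   # stand-in for the current hidden state
print(interpolate(p_model, query))
```

Real systems use decoder hidden states as keys and an approximate-nearest-neighbour index over millions of entries; the "Efficient Nearest Neighbor Language Models" entry above is concerned with making that retrieval step cheaper.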