Spiral Language Modeling
- URL: http://arxiv.org/abs/2112.10543v1
- Date: Mon, 20 Dec 2021 14:08:38 GMT
- Title: Spiral Language Modeling
- Authors: Yong Cao, Yukun Feng, Shaohui Kuang, Gu Xu
- Abstract summary: Spiral Language Modeling (SLM) is a general approach that enables one to construct natural language sentences beyond the L2R and R2L order.
SLM allows one to form natural language text by starting from an arbitrary token inside the resulting text.
Experiments on 8 widely studied Neural Machine Translation (NMT) tasks show that SLM is consistently effective, with up to a 4.7 BLEU increase.
- Score: 5.816641790933646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In almost all text generation applications, word sequences are constructed in
a left-to-right (L2R) or right-to-left (R2L) manner, as natural language
sentences are written either L2R or R2L. However, we find that the written
order of natural language is not essential for text generation. In this paper, we
propose Spiral Language Modeling (SLM), a general approach that enables one to
construct natural language sentences beyond the L2R and R2L order. SLM allows
one to form natural language text by starting from an arbitrary token inside
the resulting text and expanding the remaining tokens around the selected one. It
makes the decoding order a new optimization objective besides the language
model perplexity, which further improves the diversity and quality of the
generated text. Furthermore, SLM makes it possible to manipulate the text
construction process by selecting a proper starting token. SLM also introduces
generation orderings as additional regularization to improve model robustness
in low-resource scenarios. Experiments on 8 widely studied Neural Machine
Translation (NMT) tasks show that SLM is consistently effective, with up to a
4.7 BLEU increase compared to the conventional L2R decoding approach.
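To make the decoding procedure concrete, here is a minimal, self-contained sketch of out-of-order generation: it starts from an arbitrary token and greedily grows the sentence to the left or right of the current span until boundary markers are produced on both sides. The `score_step` scorer, the toy vocabulary, and the boundary handling are illustrative assumptions standing in for a trained SLM model, not the paper's implementation.

```python
# Minimal sketch of out-of-order ("spiral") generation: start from an
# arbitrary token and grow the sentence to its left and right.
# `score_step` is a dummy stand-in for a trained model's scores; it is
# NOT the SLM model from the paper, and the resulting word order only
# demonstrates the mechanics.
BOS, EOS = "<s>", "</s>"

def score_step(partial, candidate, remaining):
    """Dummy scorer: prefer remaining vocabulary tokens, then close the text."""
    if candidate in (BOS, EOS):
        return 1.0 if not remaining else -1.0
    return 0.5 - 0.01 * ord(candidate[0])

def spiral_decode(start_token, vocab, max_len=10):
    tokens = [start_token]          # partial sentence, grown on both sides
    remaining = list(vocab)
    left_done = right_done = False
    while not (left_done and right_done) and len(tokens) < max_len:
        best = None                 # (score, side, candidate)
        for side in ("left", "right"):
            if (side == "left" and left_done) or (side == "right" and right_done):
                continue
            boundary = BOS if side == "left" else EOS
            for cand in remaining + [boundary]:
                s = score_step(tokens, cand, remaining)
                if best is None or s > best[0]:
                    best = (s, side, cand)
        _, side, cand = best
        if cand == BOS:
            left_done = True
        elif cand == EOS:
            right_done = True
        elif side == "left":
            tokens.insert(0, cand)
            remaining.remove(cand)
        else:
            tokens.append(cand)
            remaining.remove(cand)
    return tokens

print(spiral_decode("language", vocab=["spiral", "modeling"]))
```

Because the starting token and the side chosen at each step are free, the decoding order itself becomes part of the search space, which is what the abstract refers to as an additional optimization objective.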
Related papers
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of a source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
However, recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts [50.40191599304911]
We investigate whether transliteration is also effective in improving LLMs' performance for low-resource languages written in non-Latin scripts.
We propose three prompt templates, where the target-language text is represented in (1) its original script, (2) Latin script, or (3) both.
Our findings show that the effectiveness of transliteration varies by task type and model size.
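As a concrete illustration of the three templates, the hypothetical sketch below builds one prompt per representation; the wording and the placeholder `transliterate_to_latin` helper are assumptions, not the paper's exact prompts.

```python
# Hypothetical prompt templates contrasting the three target-text
# representations described above: original script, Latin script, or both.
# `transliterate_to_latin` is a placeholder; in practice a real romanizer
# or language-specific transliterator would be used.

def transliterate_to_latin(text: str) -> str:
    """Placeholder romanizer (identity here); swap in a real transliterator."""
    return text

def build_prompts(target_text: str, task_instruction: str) -> dict:
    romanized = transliterate_to_latin(target_text)
    return {
        "original": f"{task_instruction}\nText: {target_text}\nAnswer:",
        "latin": f"{task_instruction}\nText (romanized): {romanized}\nAnswer:",
        "both": (f"{task_instruction}\nText: {target_text}\n"
                 f"Text (romanized): {romanized}\nAnswer:"),
    }

prompts = build_prompts("नमस्ते दुनिया", "Label the sentiment of the text.")
for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```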
arXiv Detail & Related papers (2024-07-02T14:51:20Z) - ReMoDetect: Reward Models Recognize Aligned LLM's Generations [55.06804460642062]
Aligned large language models (LLMs) are trained to generate texts that humans prefer.
In this paper, we identify common characteristics shared by these aligned models' generations and show that a reward model can recognize them.
We propose two training schemes to further improve the detection ability of the reward model.
arXiv Detail & Related papers (2024-05-27T17:38:33Z) - Select and Reorder: A Novel Approach for Neural Sign Language Production [35.35777909051466]
Sign languages, often categorised as low-resource languages, face significant challenges in achieving accurate translation.
This paper introduces Select and Reorder (S&R), a novel approach that addresses data scarcity by breaking down the translation process into two distinct steps: Gloss Selection (GS) and Gloss Reordering (GR).
We achieve state-of-the-art BLEU and ROUGE scores on the Meine DGS Annotated (mDGS) dataset, demonstrating a substantial BLEU-1 improvement of 37.88% in Text to Gloss (T2G) translation.
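A toy sketch of such a two-step pipeline is shown below; the gloss lexicon and the ordering heuristic are invented placeholders, whereas S&R learns both steps from data.

```python
# Toy illustration of a two-stage gloss pipeline: Gloss Selection (map each
# spoken-language word to a gloss, dropping words with no gloss) followed by
# Gloss Reordering (permute the glosses into target order). The lexicon and
# the priority table are invented placeholders, not learned models.
SPOKEN_TO_GLOSS = {          # hypothetical spoken-word -> gloss lexicon
    "i": "IX-1", "buy": "BUY", "a": None, "new": "NEW", "car": "CAR",
    "tomorrow": "TOMORROW",
}
GLOSS_PRIORITY = {"TOMORROW": 0, "IX-1": 1, "CAR": 2, "NEW": 3, "BUY": 4}

def gloss_selection(sentence: str) -> list[str]:
    words = sentence.lower().split()
    return [g for w in words if (g := SPOKEN_TO_GLOSS.get(w)) is not None]

def gloss_reordering(glosses: list[str]) -> list[str]:
    # stand-in for a learned reordering model: sort by a fixed priority table
    return sorted(glosses, key=lambda g: GLOSS_PRIORITY.get(g, 99))

selected = gloss_selection("I buy a new car tomorrow")
print(gloss_reordering(selected))   # ['TOMORROW', 'IX-1', 'CAR', 'NEW', 'BUY']
```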
arXiv Detail & Related papers (2024-04-17T16:25:19Z) - IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
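As a rough illustration of how such source/IR pairs could be assembled, the sketch below compiles self-contained C files to textual LLVM IR with Clang's standard `-S -emit-llvm` flags and concatenates source and IR into one training sequence; the paths, tags, and pairing format are assumptions rather than the actual SLTrans recipe.

```python
# Minimal sketch of building (source, IR) training pairs for continued
# causal-LM training, in the spirit of SLTrans. Paths, tags, and the pairing
# format are assumptions; the real dataset construction is more involved.
import subprocess
from pathlib import Path

def source_to_llvm_ir(c_file: Path) -> str:
    """Compile a self-contained C file to textual LLVM IR with clang."""
    out = c_file.with_suffix(".ll")
    subprocess.run(
        ["clang", "-S", "-emit-llvm", "-O1", str(c_file), "-o", str(out)],
        check=True,
    )
    return out.read_text()

def make_training_example(c_file: Path) -> str:
    """Concatenate source and IR into one sequence for causal LM training."""
    source = c_file.read_text()
    ir = source_to_llvm_ir(c_file)
    return f"<source>\n{source}\n<ir>\n{ir}"

if __name__ == "__main__":
    for path in Path("corpus").glob("*.c"):       # hypothetical corpus dir
        print(make_training_example(path)[:200])  # preview each pair
```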
arXiv Detail & Related papers (2024-03-06T17:52:08Z) - A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction [11.165093163378152]
Large language models (LLMs) have demonstrated impressive abilities in generating unstructured natural language according to instructions.
This paper introduces an efficient method, G&O, to enhance their structured text generation capabilities.
arXiv Detail & Related papers (2024-02-20T20:42:02Z) - Improving Natural Language Capability of Code Large Language Model [13.639938216171185]
We propose a novel framework, comprising two modules: AttentionExtractor and AttentionCoder.
AttentionExtractor is responsible for extracting key phrases from the user's natural language requirements, and AttentionCoder leverages these extracted phrases to generate target code.
To validate the effectiveness of the framework, we craft a new code generation benchmark, called MultiNL-H, covering five natural languages.
arXiv Detail & Related papers (2024-01-25T15:33:20Z) - Language Models of Code are Few-Shot Commonsense Learners [106.1531522893209]
Given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph.
Existing approaches serialize the output graph as a flat list of nodes and edges.
We show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language.
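The reframing can be illustrated by serializing the same small graph two ways: as a flat edge list and as code that a code LM could be prompted to complete. The `Node` schema and the example graph below are invented for illustration and are not the paper's exact prompt format.

```python
# Illustration of framing a structured prediction task (a small script graph
# for "bake a cake") as code instead of a flat node/edge list. The Node class
# and the graph contents are an invented schema for illustration only.
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

# Flat serialization, which the paper argues is harder for natural-language LMs:
flat_graph = [
    ("bake a cake", "gather ingredients"),
    ("bake a cake", "mix batter"),
    ("bake a cake", "pour into pan"),
    ("bake a cake", "bake in oven"),
]

# Code-style serialization that a code LM can be prompted to complete:
bake_a_cake = Node(
    "bake a cake",
    children=[
        Node("gather ingredients"),
        Node("mix batter"),
        Node("pour into pan"),
        Node("bake in oven"),
    ],
)

def edges(node):
    """Recover (parent, child) edges from the code-style representation."""
    out = []
    for child in node.children:
        out.append((node.name, child.name))
        out.extend(edges(child))
    return out

assert edges(bake_a_cake) == flat_graph   # both forms encode the same graph
print(edges(bake_a_cake))
```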
arXiv Detail & Related papers (2022-10-13T16:09:36Z) - Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
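A toy, single-hypothesis illustration of the underlying idea (scoring completed words with a word-level N-gram LM as subword units are decoded) is sketched below; the paper instead builds a full word lattice on the fly inside beam search, so this is a deliberate simplification with an invented bigram table.

```python
# Toy illustration of applying a word-level bigram LM while decoding subword
# units: whenever a word boundary is reached, the completed word is scored
# against the word history. This is a single-hypothesis simplification; the
# paper constructs a word lattice on the fly during beam search.
import math

WORD_BIGRAMS = {           # hypothetical bigram probabilities
    ("<s>", "中国"): 0.4, ("中国", "北京"): 0.5, ("北京", "</s>"): 0.6,
}

def bigram_logp(prev: str, word: str, floor: float = 1e-4) -> float:
    return math.log(WORD_BIGRAMS.get((prev, word), floor))

def score_subword_sequence(subwords: list[str]) -> float:
    """Accumulate word-level LM scores as subwords complete whole words."""
    history, current, total = "<s>", "", 0.0
    for piece in subwords:
        current += piece.rstrip("@")            # '@@' marks a word continuation
        if not piece.endswith("@@"):            # word boundary reached
            total += bigram_logp(history, current)
            history, current = current, ""
    total += bigram_logp(history, "</s>")       # close the sentence
    return total

print(score_subword_sequence(["中@@", "国", "北@@", "京"]))
```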
arXiv Detail & Related papers (2022-01-06T10:04:56Z) - Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study [14.34516262614775]
We argue that relatedness among languages in a language family may be exploited to overcome some of the corpora limitations of LRLs.
We focus on Indian languages, and exploit relatedness along two dimensions: (1) script (since many Indic scripts originated from the Brahmi script) and (2) sentence structure.
arXiv Detail & Related papers (2021-06-07T20:43:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.