Spiral Language Modeling
- URL: http://arxiv.org/abs/2112.10543v1
- Date: Mon, 20 Dec 2021 14:08:38 GMT
- Title: Spiral Language Modeling
- Authors: Yong Cao, Yukun Feng, Shaohui Kuang, Gu Xu
- Abstract summary: Spiral Language Modeling (SLM) is a general approach that enables one to construct natural language sentences beyond the L2R and R2L order.
SLM allows one to form natural language text by starting from an arbitrary token inside the resulting text.
Experiments on 8 widely studied Neural Machine Translation (NMT) tasks show that SLM is consistently effective, with up to a 4.7 BLEU increase.
- Score: 5.816641790933646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In almost all text generation applications, word sequences are constructed in
a left-to-right (L2R) or right-to-left (R2L) manner, as natural language
sentences are written either L2R or R2L. However, we find that the written
order of natural language is not essential for text generation. In this paper, we
propose Spiral Language Modeling (SLM), a general approach that enables one to
construct natural language sentences beyond the L2R and R2L order. SLM allows
one to form natural language text by starting from an arbitrary token inside
the resulting text and expanding the remaining tokens around the selected one. It
makes the decoding order a new optimization objective besides the language
model perplexity, which further improves the diversity and quality of the
generated text. Furthermore, SLM makes it possible to manipulate the text
construction process by selecting a proper starting token. SLM also introduces
generation orderings as additional regularization to improve model robustness
in low-resource scenarios. Experiments on 8 widely studied Neural Machine
Translation (NMT) tasks show that SLM is consistently effective, with up to a
4.7 BLEU increase compared to the conventional L2R decoding approach.
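To make the decoding procedure concrete, here is a minimal, self-contained sketch of out-of-order generation: it starts from an arbitrary token and greedily grows the sentence to the left or right of the current span until boundary markers are produced on both sides. The `score_step` scorer, the toy vocabulary, and the boundary handling are illustrative assumptions standing in for a trained SLM model, not the paper's implementation.

```python
# Minimal sketch of out-of-order ("spiral") generation: start from an
# arbitrary token and grow the sentence to its left and right.
# `score_step` is a dummy stand-in for a trained model's scores; it is
# NOT the SLM model from the paper, and the resulting word order only
# demonstrates the mechanics.
BOS, EOS = "<s>", "</s>"

def score_step(partial, candidate, remaining):
    """Dummy scorer: prefer remaining vocabulary tokens, then close the text."""
    if candidate in (BOS, EOS):
        return 1.0 if not remaining else -1.0
    return 0.5 - 0.01 * ord(candidate[0])

def spiral_decode(start_token, vocab, max_len=10):
    tokens = [start_token]          # partial sentence, grown on both sides
    remaining = list(vocab)
    left_done = right_done = False
    while not (left_done and right_done) and len(tokens) < max_len:
        best = None                 # (score, side, candidate)
        for side in ("left", "right"):
            if (side == "left" and left_done) or (side == "right" and right_done):
                continue
            boundary = BOS if side == "left" else EOS
            for cand in remaining + [boundary]:
                s = score_step(tokens, cand, remaining)
                if best is None or s > best[0]:
                    best = (s, side, cand)
        _, side, cand = best
        if cand == BOS:
            left_done = True
        elif cand == EOS:
            right_done = True
        elif side == "left":
            tokens.insert(0, cand)
            remaining.remove(cand)
        else:
            tokens.append(cand)
            remaining.remove(cand)
    return tokens

print(spiral_decode("language", vocab=["spiral", "modeling"]))
```

Because the starting token and the side chosen at each step are free, the decoding order itself becomes part of the search space, which is what the abstract refers to as an additional optimization objective.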
Related papers
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of a source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
However, recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts [50.40191599304911]
We investigate whether transliteration is also effective in improving LLMs' performance for low-resource languages written in non-Latin scripts.
We propose three prompt templates, where the target-language text is represented in (1) its original script, (2) Latin script, or (3) both.
Our findings show that the effectiveness of transliteration varies by task type and model size.
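As a concrete illustration of the three templates, the hypothetical sketch below builds one prompt per representation; the wording and the placeholder `transliterate_to_latin` helper are assumptions, not the paper's exact prompts.

```python
# Hypothetical prompt templates contrasting the three target-text
# representations described above: original script, Latin script, or both.
# `transliterate_to_latin` is a placeholder; in practice a real romanizer
# or language-specific transliterator would be used.

def transliterate_to_latin(text: str) -> str:
    """Placeholder romanizer (identity here); swap in a real transliterator."""
    return text

def build_prompts(target_text: str, task_instruction: str) -> dict:
    romanized = transliterate_to_latin(target_text)
    return {
        "original": f"{task_instruction}\nText: {target_text}\nAnswer:",
        "latin": f"{task_instruction}\nText (romanized): {romanized}\nAnswer:",
        "both": (f"{task_instruction}\nText: {target_text}\n"
                 f"Text (romanized): {romanized}\nAnswer:"),
    }

prompts = build_prompts("नमस्ते दुनिया", "Label the sentiment of the text.")
for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```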
arXiv Detail & Related papers (2024-07-02T14:51:20Z) - ReMoDetect: Reward Models Recognize Aligned LLM's Generations [55.06804460642062]
Aligned large language models (LLMs) are trained to generate texts that humans prefer.
In this paper, we identify common characteristics shared by these aligned models' generations and show that a reward model can recognize them.
We propose two training schemes to further improve the detection ability of the reward model.
arXiv Detail & Related papers (2024-05-27T17:38:33Z) - Select and Reorder: A Novel Approach for Neural Sign Language Production [35.35777909051466]
Sign languages, often categorised as low-resource languages, face significant challenges in achieving accurate translation.
This paper introduces Select and Reorder (S&R), a novel approach that addresses data scarcity by breaking down the translation process into two distinct steps: Gloss Selection (GS) and Gloss Reordering (GR).
We achieve state-of-the-art BLEU and ROUGE scores on the Meine DGS Annotated (mDGS) dataset, demonstrating a substantial BLEU-1 improvement of 37.88% in Text to Gloss (T2G) translation.
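A toy sketch of such a two-step pipeline is shown below; the gloss lexicon and the ordering heuristic are invented placeholders, whereas S&R learns both steps from data.

```python
# Toy illustration of a two-stage gloss pipeline: Gloss Selection (map each
# spoken-language word to a gloss, dropping words with no gloss) followed by
# Gloss Reordering (permute the glosses into target order). The lexicon and
# the priority table are invented placeholders, not learned models.
SPOKEN_TO_GLOSS = {          # hypothetical spoken-word -> gloss lexicon
    "i": "IX-1", "buy": "BUY", "a": None, "new": "NEW", "car": "CAR",
    "tomorrow": "TOMORROW",
}
GLOSS_PRIORITY = {"TOMORROW": 0, "IX-1": 1, "CAR": 2, "NEW": 3, "BUY": 4}

def gloss_selection(sentence: str) -> list[str]:
    words = sentence.lower().split()
    return [g for w in words if (g := SPOKEN_TO_GLOSS.get(w)) is not None]

def gloss_reordering(glosses: list[str]) -> list[str]:
    # stand-in for a learned reordering model: sort by a fixed priority table
    return sorted(glosses, key=lambda g: GLOSS_PRIORITY.get(g, 99))

selected = gloss_selection("I buy a new car tomorrow")
print(gloss_reordering(selected))   # ['TOMORROW', 'IX-1', 'CAR', 'NEW', 'BUY']
```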
arXiv Detail & Related papers (2024-04-17T16:25:19Z) - IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
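As a rough illustration of how such source/IR pairs could be assembled, the sketch below compiles self-contained C files to textual LLVM IR with Clang's standard `-S -emit-llvm` flags and concatenates source and IR into one training sequence; the paths, tags, and pairing format are assumptions rather than the actual SLTrans recipe.

```python
# Minimal sketch of building (source, IR) training pairs for continued
# causal-LM training, in the spirit of SLTrans. Paths, tags, and the pairing
# format are assumptions; the real dataset construction is more involved.
import subprocess
from pathlib import Path

def source_to_llvm_ir(c_file: Path) -> str:
    """Compile a self-contained C file to textual LLVM IR with clang."""
    out = c_file.with_suffix(".ll")
    subprocess.run(
        ["clang", "-S", "-emit-llvm", "-O1", str(c_file), "-o", str(out)],
        check=True,
    )
    return out.read_text()

def make_training_example(c_file: Path) -> str:
    """Concatenate source and IR into one sequence for causal LM training."""
    source = c_file.read_text()
    ir = source_to_llvm_ir(c_file)
    return f"<source>\n{source}\n<ir>\n{ir}"

if __name__ == "__main__":
    for path in Path("corpus").glob("*.c"):       # hypothetical corpus dir
        print(make_training_example(path)[:200])  # preview each pair
```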
arXiv Detail & Related papers (2024-03-06T17:52:08Z) - A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction [11.165093163378152]
Large language models (LLMs) have demonstrated impressive abilities in generating unstructured natural language according to instructions.
This paper introduces an efficient method, G&O, to enhance their structured text generation capabilities.
arXiv Detail & Related papers (2024-02-20T20:42:02Z) - Improving Natural Language Capability of Code Large Language Model [13.639938216171185]
We propose a novel framework, comprising two modules: AttentionExtractor and AttentionCoder.
AttentionExtractor is responsible for extracting key phrases from the user's natural language requirements, and AttentionCoder leverages these extracted phrases to generate target code.
To validate the effectiveness of the framework, we craft a new code generation benchmark, called MultiNL-H, covering five natural languages.
arXiv Detail & Related papers (2024-01-25T15:33:20Z) - Language Models of Code are Few-Shot Commonsense Learners [106.1531522893209]
Given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph.
Existing approaches serialize the output graph as a flat list of nodes and edges.
We show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language.
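The reframing can be illustrated by serializing the same small graph two ways: as a flat edge list and as code that a code LM could be prompted to complete. The `Node` schema and the example graph below are invented for illustration and are not the paper's exact prompt format.

```python
# Illustration of framing a structured prediction task (a small script graph
# for "bake a cake") as code instead of a flat node/edge list. The Node class
# and the graph contents are an invented schema for illustration only.
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

# Flat serialization, which the paper argues is harder for natural-language LMs:
flat_graph = [
    ("bake a cake", "gather ingredients"),
    ("bake a cake", "mix batter"),
    ("bake a cake", "pour into pan"),
    ("bake a cake", "bake in oven"),
]

# Code-style serialization that a code LM can be prompted to complete:
bake_a_cake = Node(
    "bake a cake",
    children=[
        Node("gather ingredients"),
        Node("mix batter"),
        Node("pour into pan"),
        Node("bake in oven"),
    ],
)

def edges(node):
    """Recover (parent, child) edges from the code-style representation."""
    out = []
    for child in node.children:
        out.append((node.name, child.name))
        out.extend(edges(child))
    return out

assert edges(bake_a_cake) == flat_graph   # both forms encode the same graph
print(edges(bake_a_cake))
```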
arXiv Detail & Related papers (2022-10-13T16:09:36Z) - Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
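A toy, single-hypothesis illustration of the underlying idea (scoring completed words with a word-level N-gram LM as subword units are decoded) is sketched below; the paper instead builds a full word lattice on the fly inside beam search, so this is a deliberate simplification with an invented bigram table.

```python
# Toy illustration of applying a word-level bigram LM while decoding subword
# units: whenever a word boundary is reached, the completed word is scored
# against the word history. This is a single-hypothesis simplification; the
# paper constructs a word lattice on the fly during beam search.
import math

WORD_BIGRAMS = {           # hypothetical bigram probabilities
    ("<s>", "中国"): 0.4, ("中国", "北京"): 0.5, ("北京", "</s>"): 0.6,
}

def bigram_logp(prev: str, word: str, floor: float = 1e-4) -> float:
    return math.log(WORD_BIGRAMS.get((prev, word), floor))

def score_subword_sequence(subwords: list[str]) -> float:
    """Accumulate word-level LM scores as subwords complete whole words."""
    history, current, total = "<s>", "", 0.0
    for piece in subwords:
        current += piece.rstrip("@")            # '@@' marks a word continuation
        if not piece.endswith("@@"):            # word boundary reached
            total += bigram_logp(history, current)
            history, current = current, ""
    total += bigram_logp(history, "</s>")       # close the sentence
    return total

print(score_subword_sequence(["中@@", "国", "北@@", "京"]))
```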
arXiv Detail & Related papers (2022-01-06T10:04:56Z) - Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study [14.34516262614775]
We argue that relatedness among languages in a language family may be exploited to overcome some of the corpora limitations of LRLs.
We focus on Indian languages, and exploit relatedness along two dimensions: (1) script (since many Indic scripts originated from the Brahmi script) and (2) sentence structure.
arXiv Detail & Related papers (2021-06-07T20:43:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.