Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence
- URL: http://arxiv.org/abs/2405.15750v2
- Date: Tue, 6 Aug 2024 22:29:11 GMT
- Title: Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence
- Authors: Abhinav Patil, Jaap Jumelet, Yu Ying Chiu, Andy Lapastora, Peter Shen, Lexie Wang, Clevis Willrich, Shane Steinert-Threlkeld
- Abstract summary: This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out.
We apply the method to both LSTM and Transformer LMs, developing filtered corpora that target a wide range of linguistic phenomena.
Our results show that while transformers are better qua LMs (as measured by perplexity), both models perform equally and surprisingly well on linguistic generalization measures.
- Score: 5.597365985417171
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpora that target a wide range of linguistic phenomena. Our results show that while transformers are better qua LMs (as measured by perplexity), both models perform equally and surprisingly well on linguistic generalization measures, suggesting that they are capable of generalizing from indirect evidence.
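The core data-preparation step of FiCT can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual pipeline: the regex detector for a target construction (here, a crude pattern for relative clauses) and the function names are assumptions; the paper uses linguistically informed filters targeting specific phenomena.

```python
import re

# Hypothetical sketch of the FiCT corpus-filtering step: drop every
# sentence containing a target construction before training, so the model
# can only learn about it from indirect evidence. The detector below is a
# toy regex for relative clauses headed by "that"/"who"; the paper's
# filters are far more careful.
TARGET_CONSTRUCTION = re.compile(r"\b(that|who)\s+\w+ed\b", re.IGNORECASE)

def filter_corpus(sentences):
    """Return only the sentences that lack the target construction."""
    return [s for s in sentences if not TARGET_CONSTRUCTION.search(s)]

corpus = [
    "The cat sat on the mat.",
    "The senator who resigned gave a speech.",
    "Dogs bark loudly at night.",
]
filtered = filter_corpus(corpus)
print(filtered)  # the relative-clause sentence is removed
```

A model trained on `filtered` would then be evaluated on minimal pairs involving the held-out construction to measure generalization from indirect evidence.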
Related papers
- From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition [6.617999710257379]
We propose a three-stage framework to assess the abilities of LMs.
We evaluate the generative capacities of LMs using methods from linguistic research.
arXiv Detail & Related papers (2024-10-17T06:31:49Z)
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- FiLM: Fill-in Language Models for Any-Order Generation [71.42044325886194]
Fill-in Language Model (FiLM) is a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation order.
During inference, FiLM can seamlessly insert missing phrases, sentences, or paragraphs.
FiLM outperforms existing infilling methods that rely on left-to-right language models trained on rearranged text segments.
arXiv Detail & Related papers (2023-10-15T19:37:39Z)
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models [57.225289079198454]
We propose mPLM-Sim, a language similarity measure that induces similarities across languages from mPLMs using multi-parallel corpora.
Our study shows that mPLM-Sim exhibits moderately high correlations with linguistic similarity measures, such as lexicostatistics, genealogical language family, and geographical sprachbund.
We further investigate whether mPLM-Sim is effective for zero-shot cross-lingual transfer by conducting experiments on both low-level syntactic tasks and high-level semantic tasks.
arXiv Detail & Related papers (2023-05-23T04:44:26Z)
- Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)
- LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z)
- Cross-lingual Adaption Model-Agnostic Meta-Learning for Natural Language Understanding [24.66203356497508]
We propose XLA-MAML, which performs direct cross-lingual adaption in the meta-learning stage.
We conduct zero-shot and few-shot experiments on Natural Language Inference and Question Answering.
arXiv Detail & Related papers (2021-11-10T16:53:50Z)
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation on two major reasons for performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
- Language Models Use Monotonicity to Assess NPI Licensing [8.856422030608188]
We investigate the semantic knowledge of language models (LMs).
We focus on whether these LMs create categories of linguistic environments based on their semantic monotonicity properties, and whether these categories play a similar role in LMs as in human language understanding.
arXiv Detail & Related papers (2021-05-28T13:32:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.