Frequency effects in Linear Discriminative Learning
- URL: http://arxiv.org/abs/2306.11044v2
- Date: Mon, 18 Mar 2024 10:36:05 GMT
- Title: Frequency effects in Linear Discriminative Learning
- Authors: Maria Heitmeier, Yu-Ying Chuang, Seth D. Axen, R. Harald Baayen
- Abstract summary: We show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (Frequency-informed learning; FIL).
FIL shows relatively low type accuracy but high token accuracy, demonstrating that the model correctly processes most of the word tokens speakers encounter in daily life.
Our results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how to best account for low-frequency words in cognitive models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM; Baayen et al., 2018a, 2019) models lexical processing with linear mappings between words' forms and their meanings. So far, the mappings can either be obtained incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or in an efficient but frequency-agnostic solution modelling the theoretical endstate of learning (EL), where all words are learned optimally. In this study we show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (Frequency-informed learning; FIL). We find that FIL approximates an incremental solution well while being computationally much cheaper. FIL shows relatively low type accuracy but high token accuracy, demonstrating that the model correctly processes most of the word tokens speakers encounter in daily life. We use FIL to model reaction times in the Dutch Lexicon Project (Keuleers et al., 2010) and find that FIL predicts the S-shaped relationship between frequency and mean reaction time well, but underestimates the variance of reaction times for low-frequency words. FIL also accounts better for priming effects in an auditory lexical decision task in Mandarin Chinese (Lee, 2007) than EL. Finally, we use ordered data from CHILDES (Brown, 1973; Demuth et al., 2006) to compare mappings obtained with FIL and with incremental learning. The mappings are highly correlated, but with FIL some nuances based on word-ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how best to account for low-frequency words in cognitive models.
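To make the contrast between the endstate-of-learning (EL) solution, the frequency-informed solution (FIL), and incremental error-driven learning concrete, here is a minimal numerical sketch. It is not the authors' implementation: it assumes the usual DLM setup of a cue matrix C and a semantic matrix S linked by a linear mapping F, and it treats FIL as a frequency-weighted least-squares solution, i.e. the closed-form counterpart of training on a frequency-proportional token stream; the matrix sizes, Zipf-like frequencies, and learning rate are illustrative toy values.

```python
# Sketch only (assumed setup, not the paper's code): EL vs. FIL vs. incremental
# Widrow-Hoff learning for the comprehension mapping of a DLM-style model.
import numpy as np

rng = np.random.default_rng(1)

n_words, n_cues, n_dims = 100, 10, 5
C = (rng.random((n_words, n_cues)) < 0.5).astype(float)  # word types x form cues
S = rng.normal(size=(n_words, n_dims))                    # word types x semantic dimensions
freq = 1.0 / np.arange(1, n_words + 1)                    # Zipf-like token frequencies (toy)
p = freq / freq.sum()                                     # token probabilities

# Endstate of learning (EL): frequency-agnostic least squares over word types.
F_el, *_ = np.linalg.lstsq(C, S, rcond=None)

# Frequency-informed learning (FIL), sketched as weighted least squares with
# each word type weighted by its token probability.
sw = np.sqrt(p)[:, None]
F_fil, *_ = np.linalg.lstsq(sw * C, sw * S, rcond=None)

# Incremental error-driven learning (Widrow-Hoff / delta rule) on a token
# stream sampled in proportion to word frequency.
F_inc = np.zeros((n_cues, n_dims))
eta = 0.005
for i in rng.choice(n_words, size=100_000, p=p):
    c, s = C[i:i + 1], S[i:i + 1]
    F_inc += eta * c.T @ (s - c @ F_inc)

# In expectation, the fixed point of the incremental updates satisfies the
# frequency-weighted normal equations, so F_inc should typically end up much
# closer to F_fil than to the frequency-agnostic F_el.
print("||F_inc - F_fil||:", np.linalg.norm(F_inc - F_fil))
print("||F_inc - F_el|| :", np.linalg.norm(F_inc - F_el))
```

The point of the sketch is only that a closed-form, frequency-weighted solution can be computed at the cost of one regression, whereas the incremental mapping requires looping over every token; the exact distances printed depend on the learning rate and the number of sampled tokens.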
Related papers
- What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length [61.71625297655583]
We show that MORCELA outperforms a commonly used linking theory for acceptability.
Larger models require a lower relative degree of adjustment for unigram frequency.
Our analysis shows that larger LMs' lower susceptibility to frequency effects can be explained by an ability to better predict rarer words in context.
arXiv Detail & Related papers (2024-11-04T19:05:49Z) - Zipfian Whitening [7.927385005964994]
Most approaches for modeling, correcting, and measuring the symmetry of an embedding space implicitly assume that the word frequencies are uniform.
In reality, word frequencies follow a highly non-uniform distribution, known as Zipf's law.
We show that simply performing PCA whitening weighted by the empirical word frequency, which follows Zipf's law, significantly improves task performance (a minimal sketch of frequency-weighted whitening follows after this list).
arXiv Detail & Related papers (2024-11-01T15:40:19Z) - Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing [6.726629754291751]
We introduce a method for quantifying the frequency bias of a language model.
We then present a method for reducing the frequency bias of a language model by inducing a syntactic prior over token representations during pre-training.
This approach results in better performance on infrequent English tokens and a decrease in anisotropy.
arXiv Detail & Related papers (2024-10-15T10:09:57Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, the generative capability of LLMs can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - What do neural networks learn in image classification? A frequency shortcut perspective [3.9858496473361402]
This study empirically investigates the learning dynamics of frequency shortcuts in neural networks (NNs).
We show that NNs tend to find simple solutions for classification, and what they learn first during training depends on the most distinctive frequency characteristics.
We propose a metric to measure class-wise frequency characteristics and a method to identify frequency shortcuts.
arXiv Detail & Related papers (2023-07-19T08:34:25Z) - Unsupervised Sentence Representation Learning with Frequency-induced Adversarial Tuning and Incomplete Sentence Filtering [14.085826003974187]
We propose Sentence Representation Learning with Frequency-induced Adversarial tuning and Incomplete sentence filtering (SLT-FAI).
PLMs are sensitive to the frequency information of words in their pre-training corpora, resulting in an anisotropic embedding space.
We incorporate an information discriminator to distinguish the embeddings of original sentences and incomplete sentences by randomly masking several low-frequency words.
arXiv Detail & Related papers (2023-05-15T13:59:23Z) - Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z) - Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z) - On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
arXiv Detail & Related papers (2021-04-21T12:55:52Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z) - Robust Learning with Frequency Domain Regularization [1.370633147306388]
We introduce a new regularization method that constrains the frequency spectra of the model's filters.
We demonstrate the effectiveness of our regularization by (1) defending against adversarial perturbations, (2) reducing the generalization gap across different architectures, and (3) improving generalization in transfer learning scenarios without fine-tuning.
arXiv Detail & Related papers (2020-07-07T07:29:20Z)
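Returning to the Zipfian Whitening entry above, the following is a minimal sketch of what frequency-weighted PCA whitening could look like, assuming the frequency weighting enters through a weighted mean and covariance; the function name, toy embeddings, and Zipf-like frequencies are illustrative and not the paper's implementation.

```python
import numpy as np

def zipf_whiten(E, freq):
    """PCA-whiten word embeddings, weighting each word by its empirical
    (Zipf-distributed) token frequency instead of uniformly (sketch)."""
    p = freq / freq.sum()
    mu = p @ E                               # frequency-weighted mean
    X = E - mu
    cov = (p[:, None] * X).T @ X             # frequency-weighted covariance
    eigval, eigvec = np.linalg.eigh(cov)
    eigval = np.clip(eigval, 1e-12, None)    # guard against numerically tiny eigenvalues
    return (X @ eigvec) / np.sqrt(eigval)    # whitened coordinates

# Toy usage: 1000 word vectors of dimension 8 with Zipf-like frequencies.
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 8))
freq = 1.0 / np.arange(1, 1001)
E_white = zipf_whiten(E, freq)
```

Under this weighting, the whitened dimensions have unit variance with respect to the empirical unigram distribution rather than with respect to a uniform distribution over word types.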
This list is automatically generated from the titles and abstracts of the papers on this site.