Assessing Keyness using Permutation Tests
- URL: http://arxiv.org/abs/2308.13383v1
- Date: Fri, 25 Aug 2023 13:52:57 GMT
- Title: Assessing Keyness using Permutation Tests
- Authors: Thoralf Mildenberger
- Abstract summary: We replace the token-by-token sampling model by a model where corpora are samples of documents rather than tokens.
We do not need any assumption on how the tokens are organized within or across documents, and the approach works with basically *any* keyness score.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a resampling-based approach for assessing keyness in corpus
linguistics based on suggestions by Gries (2006, 2022). Traditional approaches
based on hypothesis tests (e.g. the Likelihood Ratio test) model the corpora as
independent identically distributed samples of tokens. This model does not
account for the often observed uneven distribution of occurrences of a word
across a corpus. When occurrences of a word are concentrated in a few documents,
large values of LLR and similar scores are in fact much more likely than
accounted for by the token-by-token sampling model, leading to false positives.
We replace the token-by-token sampling model by a model where corpora are
samples of documents rather than tokens, which is much closer to the way
corpora are actually assembled. We then use a permutation approach to
approximate the distribution of a given keyness score under the null hypothesis
of equal frequencies and obtain p-values for assessing significance. We do not
need any assumption on how the tokens are organized within or across documents,
and the approach works with basically *any* keyness score. Hence, apart from
obtaining more accurate p-values for scores like LLR, we can also assess
significance for scores such as the logratio, which has been proposed as a
measure of effect size.
An efficient implementation of the proposed approach is provided in the `R`
package `keyperm`, available from GitHub.
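To make the idea concrete, below is a bare-bones sketch of a document-level permutation test in `R`. It is only an illustration under simplifying assumptions, not the `keyperm` API: all function and variable names are hypothetical, and it handles a single word and two corpora.

```r
## Illustrative sketch (NOT the keyperm API): document-level permutation
## test for the log-likelihood ratio (LLR, Dunning's G^2) of one word.

llr <- function(k1, n1, k2, n2) {
  ## G^2 for the 2x2 table (word vs. other tokens) x (corpus A vs. corpus B)
  N <- n1 + n2                       # total tokens
  K <- k1 + k2                       # total occurrences of the word
  O <- c(k1, k2, n1 - k1, n2 - k2)   # observed cell counts
  E <- c(n1 * K, n2 * K, n1 * (N - K), n2 * (N - K)) / N  # expected counts
  2 * sum(ifelse(O > 0, O * log(O / E), 0))  # 0 * log(0) taken as 0
}

perm_pvalue <- function(counts, sizes, in_a, B = 9999) {
  ## counts: per-document occurrences of the word
  ## sizes:  per-document token counts
  ## in_a:   logical, TRUE if the document belongs to corpus A
  stat <- function(grp) llr(sum(counts[grp]),  sum(sizes[grp]),
                            sum(counts[!grp]), sum(sizes[!grp]))
  observed <- stat(in_a)
  null <- replicate(B, stat(sample(in_a)))   # permute whole-document labels
  (1 + sum(null >= observed)) / (B + 1)      # upper-tail permutation p-value
}

## Toy example: the word is concentrated in two documents of corpus A
set.seed(1)
counts <- c(30, 25, 0, 1, 0, 0,  2, 0, 1, 0, 0, 0)
sizes  <- rep(1000, 12)
in_a   <- rep(c(TRUE, FALSE), each = 6)
perm_pvalue(counts, sizes, in_a)
```

Because whole documents are shuffled rather than individual tokens, a word whose occurrences are concentrated in a few documents produces a wide null distribution, so the same LLR value comes out less significant than under token-by-token sampling.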
Related papers
- Correlation and Navigation in the Vocabulary Key Representation Space of Language Models [33.747872934103334]
We study the effect of the key distribution on the NTP distribution.
We show that in the NTP distribution, the few top-ranked tokens are typically accurate.
We extend our method to open-ended and chain-of-thought (for reasoning) generation.
arXiv Detail & Related papers (2024-10-03T08:07:55Z) - Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction.
The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z) - Prototype-based Aleatoric Uncertainty Quantification for Cross-modal
Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z) - You should evaluate your language model on marginal likelihood
overtokenisations [5.824498637088864]
We argue that language models should be evaluated on their marginal likelihood over tokenisations.
We evaluate pretrained English and German language models on both the one-best-tokenisation and marginal perplexities.
arXiv Detail & Related papers (2021-09-06T15:37:02Z) - Keyphrase Generation with Fine-Grained Evaluation-Guided Reinforcement
Learning [30.09715149060206]
Keyphrase Generation (KG) is a classical task for capturing the central idea from a given document.
In this paper, we propose a new fine-grained evaluation metric that considers different granularity.
For learning more recessive linguistic patterns, we use a pre-trained model (e.g., BERT) to compute the continuous similarity score between predicted keyphrases and target keyphrases.
arXiv Detail & Related papers (2021-04-18T10:13:46Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions when there is not enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z) - Probabilistic Anchor Assignment with IoU Prediction for Object Detection [9.703212439661097]
In object detection, determining which anchors to assign as positive or negative samples, known as anchor assignment, has been revealed as a core procedure that can significantly affect a model's performance.
We propose a novel anchor assignment strategy that adaptively separates anchors into positive and negative samples for a ground truth bounding box according to the model's learning status.
arXiv Detail & Related papers (2020-07-16T04:26:57Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense
Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z) - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.