Pragmatic Constraint on Distributional Semantics
- URL: http://arxiv.org/abs/2211.11041v1
- Date: Sun, 20 Nov 2022 17:51:06 GMT
- Title: Pragmatic Constraint on Distributional Semantics
- Authors: Elizaveta Zhemchuzhina and Nikolai Filippov and Ivan P. Yamshchikov
- Abstract summary: We show that Zipf-law token distribution emerges irrespective of the chosen tokenization.
We show that Zipf distribution is characterized by two distinct groups of tokens that differ both in terms of their frequency and their semantics.
- Score: 6.091096843566857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the limits of language models' statistical learning in the
context of Zipf's law. First, we demonstrate that Zipf-law token distribution
emerges irrespective of the chosen tokenization. Second, we show that Zipf
distribution is characterized by two distinct groups of tokens that differ both
in terms of their frequency and their semantics. Namely, the tokens that have a
one-to-one correspondence with one semantic concept have different statistical
properties than those with semantic ambiguity. Finally, we demonstrate how
these properties interfere with statistical learning procedures motivated by
distributional semantics.
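The paper's first claim, that a Zipf-type rank-frequency curve appears regardless of the chosen tokenization, can be illustrated with a minimal sketch. The inline placeholder corpus, the variable names, and the simple least-squares fit below are illustrative assumptions, not the paper's actual procedure; a real check would use a sizeable corpus.

```python
from collections import Counter
import math

def rank_frequency(tokens):
    """Token frequencies sorted in decreasing order (rank 1 = most frequent)."""
    return sorted(Counter(tokens).values(), reverse=True)

def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).
    Under Zipf's law the slope is close to -1."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Any plain-text corpus works here; the larger, the cleaner the power law.
text = "the quick brown fox jumps over the lazy dog " * 50  # placeholder corpus

word_tokens = text.split()                                     # whitespace tokenization
bigram_tokens = [text[i:i + 2] for i in range(len(text) - 1)]  # character-bigram tokenization

for name, toks in [("words", word_tokens), ("char bigrams", bigram_tokens)]:
    print(f"{name}: log-log slope = {zipf_slope(rank_frequency(toks)):.2f}")
```

On a large natural-language corpus both tokenizations yield a roughly linear log-log rank-frequency plot with slope near -1, which is the pattern the abstract refers to.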
Related papers
- Distributional Semantics, Holism, and the Instability of Meaning [0.0]
A standard objection to meaning holism is the charge of instability.
In this article we examine whether the instability objection poses a problem for distributional models of meaning.
arXiv Detail & Related papers (2024-05-20T14:53:25Z)
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for generalizable and data-efficient representation learning.
We establish a theoretical connection between logical definitions of disentanglement and quantitative metrics using topos theory and enriched category theory.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- Beyond Demographic Parity: Redefining Equal Treatment [23.28973277699437]
We show the theoretical properties of our notion of equal treatment and devise a two-sample test based on the AUC of an equal treatment inspector.
We release explanationspace, an open-source Python package with methods and tutorials.
arXiv Detail & Related papers (2023-03-14T16:19:44Z)
- Learning versus Refutation in Noninteractive Local Differential Privacy [133.80204506727526]
We study two basic statistical tasks in non-interactive local differential privacy (LDP): learning and refutation.
Our main result is a complete characterization of the sample complexity of PAC learning for non-interactive LDP protocols.
arXiv Detail & Related papers (2022-10-26T03:19:24Z)
- Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions [15.16865739526702]
We propose to model the label distribution using a Student's t-distribution.
We derive the corresponding Kullback-Leibler divergence based loss function and use it to train an estimator for the distribution of emotion labels.
Results reveal that our t-distribution based approach improves over the Gaussian approach, achieving state-of-the-art uncertainty modeling results.
arXiv Detail & Related papers (2022-07-25T12:38:20Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Label Distribution Amendment with Emotional Semantic Correlations for Facial Expression Recognition [69.18918567657757]
We propose a new method that amends the label distribution of each facial image by leveraging correlations among expressions in the semantic space.
By comparing semantic and task class-relation graphs of each image, the confidence of its label distribution is evaluated.
Experimental results demonstrate that the proposed method is more effective than the compared state-of-the-art methods.
arXiv Detail & Related papers (2021-07-23T07:46:14Z)
- Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning [80.05441565830726]
This paper addresses imbalanced semi-supervised learning, where heavily biased pseudo-labels can harm the model performance.
Motivated by this observation, we propose a general pseudo-labeling framework to address the bias.
We term the novel pseudo-labeling framework for imbalanced SSL as Distribution-Aware Semantics-Oriented (DASO) Pseudo-label.
arXiv Detail & Related papers (2021-06-10T11:58:25Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- The empirical structure of word frequency distributions [0.0]
I show that first names form natural communicative distributions in most languages.
I then show this pattern of findings replicates in communicative distributions of English nouns and verbs.
arXiv Detail & Related papers (2020-01-09T20:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.