Word frequency-rank relationship in tagged texts
- URL: http://arxiv.org/abs/2102.10992v1
- Date: Sun, 7 Feb 2021 15:17:51 GMT
- Title: Word frequency-rank relationship in tagged texts
- Authors: A. Chacoma, D. H. Zanette
- Abstract summary: We analyze the frequency-rank relationship in sub-vocabularies corresponding to three different grammatical classes.
These results point to the fact that frequency-rank relationships may reflect linguistic features associated with grammatical function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the frequency-rank relationship in sub-vocabularies corresponding
to three different grammatical classes (nouns, verbs, and others) in a
collection of literary works in English, whose words have been
automatically tagged according to their grammatical role. Comparing with a null
hypothesis which assumes that words belonging to each class are uniformly
distributed across the frequency-ranked vocabulary of the whole work, we
disclose statistically significant differences between the three classes. These
results point to the fact that frequency-rank relationships may reflect
linguistic features associated with grammatical function.
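
The null hypothesis described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the use of (word, tag) pairs as vocabulary entries, and the choice of mean rank as the test statistic are all assumptions made for illustration. The sketch ranks the whole vocabulary by frequency, collects the ranks occupied by one grammatical class, and compares their mean with the means obtained when the same number of ranks is drawn uniformly at random from the whole vocabulary.

```python
import random
from collections import Counter


def rank_vocabulary(tagged_tokens):
    """Sort distinct (word, tag) pairs by descending frequency.

    The rank of an entry is its 1-based position in the returned list.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(tagged_tokens)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def class_ranks(ordered, tag):
    """Ranks (1-based) occupied by entries carrying the given tag."""
    return [rank for rank, ((_, t), _) in enumerate(ordered, start=1) if t == tag]


def null_mean_ranks(n_total, n_class, n_samples=2000, seed=0):
    """Mean rank of n_class ranks drawn uniformly, without replacement,
    from 1..n_total. This is the assumed form of the null hypothesis."""
    rng = random.Random(seed)
    population = range(1, n_total + 1)
    return [sum(rng.sample(population, n_class)) / n_class for _ in range(n_samples)]


def z_score(observed, null_values):
    """Distance of the observed statistic from the null mean,
    in units of the null standard deviation."""
    mu = sum(null_values) / len(null_values)
    var = sum((x - mu) ** 2 for x in null_values) / (len(null_values) - 1)
    return (observed - mu) / var ** 0.5


# Hypothetical toy input; a real analysis would use a POS-tagged literary work.
toy = ([("the", "other")] * 5 + [("cat", "noun")] * 3
       + [("runs", "verb")] * 2 + [("dog", "noun")])
ordered = rank_vocabulary(toy)
noun_ranks = class_ranks(ordered, "noun")
null = null_mean_ranks(len(ordered), len(noun_ranks))
noun_z = z_score(sum(noun_ranks) / len(noun_ranks), null)
```

A z-score far from zero would suggest that the class is not spread uniformly over the ranked vocabulary. The toy input is far too small for that conclusion; it only shows how the pieces fit together.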
Related papers
- Complex systems approach to natural language [0.0]
This review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science.
Three main complexity-related research trends in quantitative linguistics are covered.
arXiv Detail & Related papers (2024-01-05T12:01:26Z)
- Unsupervised Mapping of Arguments of Deverbal Nouns to Their Corresponding Verbal Labels [52.940886615390106]
Deverbal nouns are nominal forms of verbs commonly used in written English texts to describe events or actions, as well as their arguments.
The solutions that do exist for handling arguments of nominalized constructions are based on semantic annotation.
We propose to adopt a more syntactic approach, which maps the arguments of deverbal nouns to the corresponding verbal construction.
arXiv Detail & Related papers (2023-06-24T10:07:01Z)
- Investigating the Frequency Distortion of Word Embeddings and Its Impact on Bias Metrics [2.1374208474242815]
We systematically study the association between frequency and semantic similarity in several static word embeddings.
We find that Skip-gram, GloVe and FastText embeddings tend to produce higher semantic similarity between high-frequency words than between other frequency combinations.
arXiv Detail & Related papers (2022-11-15T15:11:06Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the true value of the hypothesis follows the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Universality and diversity in word patterns [0.0]
We present an analysis of lexical statistical connections for eleven major languages.
We find that the diverse manners that languages utilize to express word relations give rise to unique pattern distributions.
arXiv Detail & Related papers (2022-08-23T20:03:27Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify semantic meaning at hybrid granularities in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- Decomposing lexical and compositional syntax and semantics with deep language models [82.81964713263483]
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, and encompass the bilateral temporal, parietal and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z)
- Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model [66.84264870118723]
We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model.
We provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
arXiv Detail & Related papers (2020-10-09T18:27:55Z)
- An exploration of the encoding of grammatical gender in word embeddings [0.6461556265872973]
The study of grammatical gender based on word embeddings can give insight into discussions on how grammatical genders are determined.
It is found that there is an overlap in how grammatical gender is encoded in Swedish, Danish, and Dutch embeddings.
arXiv Detail & Related papers (2020-08-05T06:01:46Z)
- On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs [57.015586483981885]
We use large-scale corpora in six different gendered languages.
We find statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, indirect objects, and as subjects.
arXiv Detail & Related papers (2020-05-03T22:49:44Z)
- Heaps' law and Heaps functions in tagged texts: Evidences of their linguistic relevance [0.0]
We study the relationship between vocabulary size and text length in a corpus of $75$ literary works in English.
We analyze the progressive appearance of new words of each tag along each individual text.
arXiv Detail & Related papers (2020-01-07T17:05:16Z)