Human Languages with Greater Information Density Increase Communication Speed, but Decrease Conversation Breadth
- URL: http://arxiv.org/abs/2112.08491v2
- Date: Fri, 29 Sep 2023 14:28:17 GMT
- Title: Human Languages with Greater Information Density Increase Communication Speed, but Decrease Conversation Breadth
- Authors: Pedro Aceves and James A. Evans
- Abstract summary: First, we show that there is broad variation in how densely languages encode information into their words.
Second, we show that this language information density is associated with a denser configuration of semantic information.
Finally, we trace the relationship between language information density and patterns of communication.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human languages vary widely in how they encode information within
circumscribed semantic domains (e.g., time, space, color, human body parts and
activities), but little is known about the global structure of semantic
information and nothing about its relation to human communication. We first
show that across a sample of ~1,000 languages, there is broad variation in how
densely languages encode information into their words. Second, we show that
this language information density is associated with a denser configuration of
semantic information. Finally, we trace the relationship between language
information density and patterns of communication, showing that informationally
denser languages tend toward (1) faster communication, but (2) conceptually
narrower conversations within which topics of conversation are discussed at
greater depth. These results highlight an important source of variation across
the human communicative channel, revealing that the structure of language
shapes the nature and texture of human engagement, with consequences for human
behavior across levels of society.
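As a concrete (if crude) illustration of information density, a unigram model gives the average number of bits each word token carries in a corpus. The sketch below is purely illustrative: the paper's measure is built from lexical semantics across ~1,000 languages, and the function and toy corpora here are hypothetical.

```python
import math
from collections import Counter

def bits_per_word(tokens):
    """Unigram entropy of a corpus: the average number of bits a
    single word token carries, H = -sum_w p(w) * log2 p(w)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy illustration: a language that packs more information into each
# word expresses the same message in fewer, higher-entropy words.
dense = "big dog chased small cat quickly yesterday".split()
sparse = "the dog it did chase the cat it did run".split()
print(bits_per_word(dense), bits_per_word(sparse))  # ~2.81 vs ~2.72 bits/word
```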
Related papers
- Analyzing The Language of Visual Tokens [48.62180485759458]
We take a natural-language-centric approach to analyzing discrete visual languages.
We show that higher token innovation drives greater entropy and lower compression, with tokens predominantly representing object parts.
We also show that visual languages lack cohesive grammatical structures, leading to higher perplexity and weaker hierarchical organization compared to natural languages.
arXiv Detail & Related papers (2024-11-07T18:59:28Z)
- Linguistic Structure from a Bottleneck on Sequential Information Processing [5.850665541267672]
We show that natural-language-like systematicity arises in codes that are constrained by predictive information.
We show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics.
arXiv Detail & Related papers (2024-05-20T15:25:18Z)
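For intuition about the constraint in the entry above: predictive information is the mutual information between a sequence's past and its future. Below is an illustrative first-order plug-in estimate, I(X_t; X_{t+1}) between adjacent symbols, which is a deliberate simplification of the unbounded-window quantity the paper analyzes.

```python
import math
from collections import Counter

def predictive_information(seq):
    """Plug-in estimate of I(X_t ; X_{t+1}) in bits: the mutual
    information between adjacent symbols, a first-order stand-in
    for predictive information over unbounded past/future windows."""
    pairs = list(zip(seq, seq[1:]))
    n = len(pairs)
    p_xy = Counter(pairs)
    p_x = Counter(x for x, _ in pairs)
    p_y = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
               for (x, y), c in p_xy.items())

# A strictly alternating sequence carries about one bit of
# information about its next symbol:
print(predictive_information("abababab"))  # ~0.99
```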
- Unveiling the pressures underlying language learning and use in neural networks, large language models, and humans: Lessons from emergent machine-to-machine communication [5.371337604556311]
We review three cases where mismatches between the emergent linguistic behavior of neural agents and humans were resolved.
We identify key pressures at play for language learning and emergence: communicative success, production effort, learnability, and other psycho-/sociolinguistic factors.
arXiv Detail & Related papers (2024-03-21T14:33:34Z)
- Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP.
We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba.
Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region.
Models perform substantially worse on all seven languages than on English, with variation in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
- Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media [14.646734380673648]
We quantify multilingual users' structural role and communication influence in cross-lingual information exchange.
Having a multilingual network neighbor increases monolinguals' odds of sharing domains and hashtags from another language 16-fold and 4-fold, respectively.
By highlighting information exchange across borders, this work sheds light on a crucial component of how information and ideas spread around the world.
arXiv Detail & Related papers (2023-04-07T18:01:25Z)
- Joint processing of linguistic properties in brains and language models [14.997785690790032]
We investigate the correspondence between the detailed processing of linguistic information in the human brain and in language models.
We find that elimination of specific linguistic properties results in a significant decrease in brain alignment.
These findings provide clear evidence for the role of specific linguistic information in the alignment between brain and language models.
arXiv Detail & Related papers (2022-12-15T19:13:42Z)
- Representing Affect Information in Word Embeddings [5.378735006566249]
We investigated whether and how the affect meaning of a word is encoded in word embeddings pre-trained in large neural networks.
The embeddings varied in whether they were static or contextualized, and in how much affect-specific information was prioritized during the pre-training and fine-tuning phases.
arXiv Detail & Related papers (2022-09-21T18:16:33Z)
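A common way to test whether affect is encoded in embeddings, in the spirit of the entry above, is a linear probe: regress human affect ratings on embedding vectors and check held-out fit. The sketch below is a generic probe, not the paper's exact method; `emb` and `valence` are hypothetical inputs the reader would supply (e.g., embeddings plus ratings from an affect lexicon).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def affect_probe_score(emb: np.ndarray, valence: np.ndarray) -> float:
    """Cross-validated R^2 of a linear probe predicting affect
    ratings from word embeddings; a high score suggests the affect
    dimension is linearly recoverable from the embedding space."""
    probe = Ridge(alpha=1.0)
    return cross_val_score(probe, emb, valence, cv=5, scoring="r2").mean()
```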
- Linking Emergent and Natural Languages via Corpus Transfer [98.98724497178247]
We propose corpus transfer as a novel way to establish a link between emergent languages and natural languages.
Our approach showcases non-trivial transfer benefits for two different tasks -- language modeling and image captioning.
We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images.
arXiv Detail & Related papers (2022-03-24T21:24:54Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis (SVCCA) to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
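For reference, the SVCCA mentioned in the last entry can be sketched in a few lines of NumPy: reduce each representation matrix with an SVD that keeps most of the variance, then read canonical correlations off the singular values of the product of the views' orthonormal bases. This is a generic sketch of the technique, not the authors' implementation.

```python
import numpy as np

def svcca(X, Y, keep=0.99):
    """Canonical correlations between two representation matrices
    (rows = examples/languages, columns = features), after an SVD
    step that drops low-variance directions."""
    def svd_reduce(A):
        A = A - A.mean(axis=0)                    # center features
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        var = np.cumsum(s ** 2) / np.sum(s ** 2)  # variance explained
        r = int(np.searchsorted(var, keep)) + 1   # components to keep
        return U[:, :r] * s[:r]                   # reduced view
    Xr, Yr = svd_reduce(X), svd_reduce(Y)
    Qx, _ = np.linalg.qr(Xr)                      # orthonormal bases
    Qy, _ = np.linalg.qr(Yr)
    # Singular values of Qx^T Qy are the canonical correlations
    # (1.0 means the reduced subspaces share a common direction).
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)
```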