Information Theory of Meaningful Communication
- URL: http://arxiv.org/abs/2411.12728v1
- Date: Tue, 19 Nov 2024 18:51:23 GMT
- Title: Information Theory of Meaningful Communication
- Authors: Doron Sivan, Misha Tsodyks
- Abstract summary: In Shannon's seminal paper, entropy of printed English, treated as a stationary process, was estimated to be roughly 1 bit per character.
In this study, we show that one can leverage recently developed large language models to quantify information communicated in meaningful narratives in terms of bits of meaning per clause.
- Abstract: In Shannon's seminal paper, entropy of printed English, treated as a stationary stochastic process, was estimated to be roughly 1 bit per character. However, considered as a means of communication, language differs considerably from its printed form: (i) the units of information are not characters or even words but clauses, i.e. shortest meaningful parts of speech; and (ii) what is transmitted is principally the meaning of what is being said or written, while the precise phrasing that was used to communicate the meaning is typically ignored. In this study, we show that one can leverage recently developed large language models to quantify information communicated in meaningful narratives in terms of bits of meaning per clause.
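To make the quantity concrete, here is a minimal sketch of the standard surprisal calculation that such LLM-based estimates build on, assuming the Hugging Face transformers library and GPT-2 (both assumptions; the paper does not specify this setup). It scores a clause by its total negative log-probability, in bits, given the preceding context. Note that this measures the surprisal of the exact wording, whereas the paper's quantity targets the meaning of a clause while discounting its particular phrasing.

    # A hedged sketch, not the authors' method: total surprisal of a clause
    # given its context, in bits, under an autoregressive language model.
    # Assumes the Hugging Face "transformers" library and the GPT-2 model.
    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def clause_surprisal_bits(context: str, clause: str) -> float:
        ctx_ids = tokenizer.encode(context)
        clause_ids = tokenizer.encode(clause)
        input_ids = torch.tensor([ctx_ids + clause_ids])
        with torch.no_grad():
            # log-probabilities over the vocabulary at every position
            log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
        bits = 0.0
        for i, tok in enumerate(clause_ids):
            pos = len(ctx_ids) + i - 1  # logits at pos predict the token at pos + 1
            bits -= log_probs[0, pos, tok].item() / math.log(2)  # nats -> bits
        return bits

    print(clause_surprisal_bits("The storm knocked out the power,", " so we lit candles."))

One way to move from phrasing to meaning, in the spirit of the abstract, would be to average such scores over paraphrases that express the same clause; the sketch above deliberately stops short of that.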
Related papers
- Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse [54.08750245737734]
We propose that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones.
arXiv Detail & Related papers (2024-10-21T14:42:37Z)
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
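As a toy gloss on the entry above, redundancy between prosody and text can be read as mutual information. The joint distribution below is invented purely for illustration; the paper estimates these quantities with large language models rather than from a closed-form table.

    # Hypothetical numbers, illustrative only: mutual information in bits
    # between a binary prosodic feature and a binary word-class feature.
    import math

    def mutual_information(joint):
        """I(X; Y) in bits for a joint distribution given as {(x, y): p}."""
        px, py = {}, {}
        for (x, y), p in joint.items():
            px[x] = px.get(x, 0.0) + p
            py[y] = py.get(y, 0.0) + p
        return sum(p * math.log2(p / (px[x] * py[y]))
                   for (x, y), p in joint.items() if p > 0)

    joint = {("stressed", "content"): 0.4, ("stressed", "function"): 0.1,
             ("unstressed", "content"): 0.1, ("unstressed", "function"): 0.4}
    print(mutual_information(joint))  # ~0.28 bits shared between the two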
- Machine Translation to Control Formality Features in the Target Language [0.9208007322096532]
This research explores how machine learning methods can translate from English into languages that mark formality.
This was done by training a bilingual model in a formality-controlled setting and comparing its performance with a pre-trained multilingual model.
We evaluate the official formality accuracy (ACC) by comparing the predicted masked tokens with the ground truth.
arXiv Detail & Related papers (2023-11-22T15:42:51Z)
- Evaluation of Automatically Constructed Word Meaning Explanations [0.0]
We present a new tool that derives explanations automatically based on collective information from very large corpora.
We show that the presented approach can create explanations containing information useful for understanding the word meaning in approximately 90% of cases.
arXiv Detail & Related papers (2023-02-27T09:47:55Z)
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Semantic Communications: Principles and Challenges [59.13318519076149]
This article provides an overview of semantic communications.
After a brief review of Shannon information theory, we discuss semantic communications in terms of theory, frameworks, and deep learning-enabled system design.
arXiv Detail & Related papers (2021-12-30T16:32:00Z)
- Do Language Embeddings Capture Scales? [54.1633257459927]
We show that pretrained language models capture a significant amount of information about the scalar magnitudes of objects.
We identify contextual information in pre-training and numeracy as two key factors affecting their performance.
arXiv Detail & Related papers (2020-10-11T21:11:09Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
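The ambiguity measure in the last entry can be made concrete with a small sketch: the entropy, in bits, of a word's meaning distribution. The meaning probabilities below are illustrative assumptions, not values from the paper.

    # Hedged sketch: lexical ambiguity as the entropy of p(meaning | word).
    # The meaning probabilities are made up for illustration.
    import math

    def meaning_entropy(p):
        """Entropy in bits of a word's meaning distribution p(m | w)."""
        return -sum(q * math.log2(q) for q in p if q > 0)

    # e.g. "bank": mostly the financial sense, occasionally the river sense
    print(meaning_entropy([0.8, 0.2]))  # ~0.72 bits of ambiguity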
This list is automatically generated from the titles and abstracts of the papers on this site.