Local Compositional Complexity: How to Detect a Human-readable Message
- URL: http://arxiv.org/abs/2501.03664v1
- Date: Tue, 07 Jan 2025 10:04:01 GMT
- Title: Local Compositional Complexity: How to Detect a Human-readable Message
- Authors: Louis Mahon
- Abstract summary: We focus on a particular sense of complexity that is high if the data is structured in a way that could serve to communicate a message.
We describe a general framework for measuring data complexity based on dividing the shortest description of the data into a structured and an unstructured portion.
We derive a more precise and computable definition geared towards human communication, by proposing local compositionality as an appropriate specific structure.
- Abstract: Data complexity is an important concept in the natural sciences and related areas, but lacks a rigorous and computable definition. In this paper, we focus on a particular sense of complexity that is high if the data is structured in a way that could serve to communicate a message. In this sense, human speech, written language, drawings, diagrams and photographs are high complexity, whereas data that is close to uniform throughout or populated by random values is low complexity. We describe a general framework for measuring data complexity based on dividing the shortest description of the data into a structured and an unstructured portion, and taking the size of the former as the complexity score. We outline an application of this framework in statistical mechanics that may allow a more objective characterisation of the macrostate and entropy of a physical system. Then, we derive a more precise and computable definition geared towards human communication, by proposing local compositionality as an appropriate specific structure. We demonstrate experimentally that this method can distinguish meaningful signals from noise or repetitive signals in auditory, visual and text domains, and could potentially help determine whether an extra-terrestrial signal contained a message.
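The abstract's motivation can be illustrated with a crude compression-based contrast. The sketch below is an illustrative assumption, not the paper's actual method (which uses local compositionality, not gzip): it shows why a plain one-part description length fails as a complexity score, since pure noise gets the highest compressed size even though it carries no message. This is the gap the structured/unstructured split is meant to close, by scoring only the structured portion of the description.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """One-part description-length proxy: zlib-compressed size over raw size."""
    return len(zlib.compress(data, 9)) / len(data)

random.seed(0)
n = 2000

# Three signals matching the abstract's taxonomy:
uniform = b"a" * n                                      # close to uniform throughout
noise = bytes(random.getrandbits(8) for _ in range(n))  # populated by random values
message = (b"Data complexity is an important concept in the natural sciences "
           b"and related areas, but lacks a rigorous and computable definition. "
           b"In this paper, we focus on a particular sense of complexity that is "
           b"high if the data is structured in a way that could serve to "
           b"communicate a message.")                   # written language

for name, data in [("uniform", uniform), ("noise", noise), ("message", message)]:
    print(f"{name:8s} ratio={compression_ratio(data):.3f}")
```

Plain compression ranks noise highest and near-uniform data lowest, with text in between; the paper's two-part scheme instead counts only the structured portion of the shortest description, so both extremes score low while message-like data scores high.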
Related papers
- How compositional generalization and creativity improve as diffusion models are trained [82.08869888944324]
How many samples do generative models need in order to learn the composition rules and produce novel data?
We consider diffusion models trained on simple context-free grammars - tree-like graphical models used to represent the structure of data such as language and images.
We demonstrate that diffusion models learn compositional rules with the sample complexity required for clustering features with statistically similar context, a process similar to word2vec.
arXiv Detail & Related papers (2025-02-17T18:06:33Z) - Unsupervised detection of semantic correlations in big data [47.201377047286215]
We present a method to detect semantic correlations in high-dimensional data represented as binary numbers.
We estimate the binary intrinsic dimension of a dataset, which quantifies the minimum number of independent coordinates needed to describe the data.
The proposed algorithm is largely insensitive to the so-called curse of dimensionality, and can therefore be used in big data analysis.
arXiv Detail & Related papers (2024-11-04T14:37:07Z) - Multi-scale structural complexity as a quantitative measure of visual complexity [1.3499500088995464]
We suggest adopting the multi-scale structural complexity (MSSC) measure, an approach that defines structural complexity of an object as the amount of dissimilarities between distinct scales in its hierarchical organization.
We demonstrate that MSSC correlates with subjective complexity on par with other computational complexity measures, while being more intuitive by definition, consistent across categories of images, and easier to compute.
arXiv Detail & Related papers (2024-08-07T20:26:35Z) - Linguistic Structure from a Bottleneck on Sequential Information Processing [5.850665541267672]
We show that natural-language-like systematicity arises in codes that are constrained by predictive information.
We show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics.
arXiv Detail & Related papers (2024-05-20T15:25:18Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Structured information extraction from complex scientific text with fine-tuned large language models [55.96705756327738]
We present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction.
The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts.
This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured knowledge extracted from unstructured text.
arXiv Detail & Related papers (2022-12-10T07:51:52Z) - Semantic-Native Communication: A Simplicial Complex Perspective [50.099494681671224]
We study semantic communication from a topological space perspective.
A transmitter first maps its data into a $k$-order simplicial complex and then learns its high-order correlations.
The receiver decodes the structure and infers the missing or distorted data.
arXiv Detail & Related papers (2022-10-30T22:33:44Z) - Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations.
We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed.
Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z) - Testing the Quantitative Spacetime Hypothesis using Artificial Narrative Comprehension (II): Establishing the Geometry of Invariant Concepts, Themes, and Namespaces [0.0]
This study contributes to an ongoing application of the Semantic Spacetime Hypothesis, and demonstrates the unsupervised analysis of narrative texts.
Data streams are parsed and fractionated into small constituents, by multiscale interferometry, in the manner of bioinformatic analysis.
Fragments of the input act as symbols in a hierarchy of alphabets that define new effective languages at each scale.
arXiv Detail & Related papers (2020-09-23T11:19:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.