Complex systems approach to natural language
- URL: http://arxiv.org/abs/2401.02772v1
- Date: Fri, 5 Jan 2024 12:01:26 GMT
- Title: Complex systems approach to natural language
- Authors: Tomasz Stanisz, Stanisław Drożdż, Jarosław Kwapień
- Abstract summary: The review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science.
Three main complexity-related research trends in quantitative linguistics are covered.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The review summarizes the main methodological concepts used in studying
natural language from the perspective of complexity science and documents their
applicability in identifying both universal and system-specific features of
language in its written representation. Three main complexity-related research
trends in quantitative linguistics are covered. The first part addresses the
issue of word frequencies in texts and demonstrates that taking punctuation
into consideration restores the scaling of Zipf's law, whose violation is
often observed for the most frequent words. The second part introduces methods
inspired by time series analysis, used in studying various kinds of
correlations in written texts. The related time series are generated on the
basis of text partition into sentences or into phrases between consecutive
punctuation marks. It turns out that these series develop features often found
in signals generated by complex systems, like long-range correlations or
(multi)fractal structures. Moreover, it appears that the distances between
punctuation marks comply with the discrete variant of the Weibull distribution.
In the third part, the application of the network formalism to natural language
is reviewed, particularly in the context of the so-called word-adjacency
networks. Parameters characterizing the topology of such networks can be used
to classify texts, for example from a stylometric perspective. The network
approach can also be applied to represent the organization of word
associations. The structure of word-association networks turns out to be
significantly different from that of random networks, revealing
genuine properties of language. Finally, punctuation seems to have a
significant impact not only on the language's information-carrying ability but
also on its key statistical properties, hence it is recommended to consider
punctuation marks on a par with words.
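The three trends above can be made concrete with a small sketch. This is not the authors' pipeline, only a minimal stdlib-only illustration under assumed inputs: the sample text, the tokenizer, and the toy network construction are all illustrative choices (real studies use full literary corpora and usually restrict adjacency links to within-sentence word pairs).

```python
import re
from collections import Counter

# Illustrative sample; the review works with large literary corpora.
text = ("The dog barked, and the cat ran. The dog slept; "
        "the cat, however, watched the dog.")

# Treat punctuation marks as tokens on a par with words.
tokens = re.findall(r"\w+|[.,;:!?]", text.lower())

# 1) Zipf-style rank-frequency list, punctuation included.
ranking = Counter(tokens).most_common()

# 2) Distances (number of words) between consecutive punctuation
#    marks -- the quantity the review finds to follow a discrete
#    Weibull distribution in real texts.
punct = [i for i, t in enumerate(tokens) if not t.isalpha()]
distances = [b - a - 1 for a, b in zip(punct, punct[1:])]

# 3) A toy word-adjacency network: undirected edges between
#    consecutive words, punctuation skipped.
words = [t for t in tokens if t.isalpha()]
edges = {tuple(sorted(p)) for p in zip(words, words[1:])}
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(ranking[:3])
print(distances)
print(degree.most_common(3))
```

On a real corpus, one would then inspect the rank-frequency list for power-law scaling, fit the distance histogram, and compare the network's degree distribution and clustering against randomized baselines.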
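The "discrete variant of the Weibull distribution" mentioned for inter-punctuation distances can be written down explicitly. The sketch below uses the standard type-I discrete Weibull pmf; the parameter values `q` and `beta` are illustrative assumptions, not fitted values from the review.

```python
# Discrete Weibull (type I) pmf: P(X = k) = q^(k^beta) - q^((k+1)^beta),
# defined for k = 0, 1, 2, ... with 0 < q < 1 and beta > 0.
def dweibull_pmf(k: int, q: float, beta: float) -> float:
    return q ** (k ** beta) - q ** ((k + 1) ** beta)

# Illustrative parameters only; a real analysis would fit q and beta
# to the empirical histogram of inter-punctuation distances.
q, beta = 0.8, 1.3
probs = [dweibull_pmf(k, q, beta) for k in range(200)]

# The sum telescopes to 1 - q^(200^beta), which is essentially 1.
print(sum(probs))
```

Because the pmf is a telescoping difference, checking that the probabilities sum to (nearly) one is a quick sanity test before fitting it to empirical distance counts.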
Related papers
- Acoustic characterization of speech rhythm: going beyond metrics with
recurrent neural networks [0.0]
We train a recurrent neural network on a language identification task over a large database of speech recordings in 21 languages.
The network identified the language of 10-second recordings in 40% of cases, and the correct language was among the top-3 guesses in two-thirds of cases.
arXiv Detail & Related papers (2024-01-22T09:49:44Z) - Lexical Complexity Prediction: An Overview [13.224233182417636]
The occurrence of unknown words in texts significantly hinders reading comprehension.
Computational modelling has been applied to identify complex words in texts and replace them with simpler alternatives.
We present an overview of computational approaches to lexical complexity prediction focusing on the work carried out on English data.
arXiv Detail & Related papers (2023-03-08T19:35:08Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Universal versus system-specific features of punctuation usage patterns
in major Western languages [0.0]
In written texts, punctuation can be considered one manifestation of such patterns.
This study is based on a large corpus of world-famous and representative literary texts in seven major Western languages.
arXiv Detail & Related papers (2022-12-21T16:52:10Z) - Universality and diversity in word patterns [0.0]
We present an analysis of lexical statistical connections for eleven major languages.
We find that the diverse manners that languages utilize to express word relations give rise to unique pattern distributions.
arXiv Detail & Related papers (2022-08-23T20:03:27Z) - Latent Topology Induction for Understanding Contextualized
Representations [84.7918739062235]
We study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models.
We show there exists a network of latent states that summarize linguistic properties of contextualized representations.
arXiv Detail & Related papers (2022-06-03T11:22:48Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - LadRa-Net: Locally-Aware Dynamic Re-read Attention Net for Sentence
Semantic Matching [66.65398852962177]
We develop a novel Dynamic Re-read Network (DRr-Net) for sentence semantic matching.
We extend DRr-Net to Locally-Aware Dynamic Re-read Attention Net (LadRa-Net)
Experiments on two popular sentence semantic matching tasks demonstrate that DRr-Net can significantly improve the performance of sentence semantic matching.
arXiv Detail & Related papers (2021-08-06T02:07:04Z) - Multilingual Irony Detection with Dependency Syntax and Neural Models [61.32653485523036]
The work focuses on the contribution of syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme.
The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
arXiv Detail & Related papers (2020-11-11T11:22:05Z) - Testing the Quantitative Spacetime Hypothesis using Artificial Narrative
Comprehension (II): Establishing the Geometry of Invariant Concepts, Themes,
and Namespaces [0.0]
This study contributes to an ongoing application of the Semantic Spacetime Hypothesis, and demonstrates the unsupervised analysis of narrative texts.
Data streams are parsed and fractionated into small constituents, by multiscale interferometry, in the manner of bioinformatic analysis.
Fragments of the input act as symbols in a hierarchy of alphabets that define new effective languages at each scale.
arXiv Detail & Related papers (2020-09-23T11:19:17Z) - Linguistic Typology Features from Text: Inferring the Sparse Features of
World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.