Accessibility and Trajectory-Based Text Characterization
- URL: http://arxiv.org/abs/2201.06665v1
- Date: Mon, 17 Jan 2022 23:33:11 GMT
- Title: Accessibility and Trajectory-Based Text Characterization
- Authors: Bárbara C. e Souza, Filipi N. Silva, Henrique F. de Arruda, Luciano da F. Costa, and Diego R. Amancio
- Abstract summary: Texts are characterized by a hierarchical structure that can be approached by using multi-scale concepts and methods.
We adopt an extension to the mesoscopic approach to represent text narratives, in which only the recurrent relationships among tagged parts of speech are considered.
The characterization of the texts was then achieved by considering scale-dependent complementary methods: accessibility, symmetry and recurrence signatures.
- Score: 0.6912244027050454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several complex systems are characterized by presenting intricate
characteristics extending along many scales. These characterizations are used
in various applications, including text classification, better understanding of
diseases, and comparison between cities, among others. In particular, texts are
also characterized by a hierarchical structure that can be approached by using
multi-scale concepts and methods. The present work aims at developing these
possibilities while focusing on mesoscopic representations of networks. More
specifically, we adopt an extension to the mesoscopic approach to represent
text narratives, in which only the recurrent relationships among tagged parts
of speech are considered to establish connections among sequential pieces of
text (e.g., paragraphs). The characterization of the texts was then achieved by
considering scale-dependent complementary methods: accessibility, symmetry and
recurrence signatures. In order to evaluate the potential of these concepts and
methods, we approached the problem of distinguishing between literary genres
(fiction and non-fiction). A set of 300 books organized into the two genres was
considered, and the books were compared using the aforementioned approaches. All the
methods were capable of differentiating to some extent between the two genres.
The accessibility and symmetry reflected the narrative asymmetries, while the
recurrence signature provided a more direct indication of the non-sequential
semantic connections taking place along the narrative.
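The accessibility measure named in the abstract can be illustrated with a minimal sketch. It assumes the common definition of accessibility as the exponential of the Shannon entropy of the probabilities of reaching each node after h steps of a uniform random walk; the paper itself may use a different walk dynamics or a generalized variant. The function name `accessibility` and the toy star graph are illustrative assumptions, not taken from the paper.

```python
import math


def accessibility(adj, h):
    """Accessibility of every node after h steps of a uniform random walk.

    adj: dict mapping each node to a list of its neighbours.
    Returns a dict mapping each node to exp(entropy of its h-step
    arrival distribution), i.e. the effective number of reachable nodes.
    """
    result = {}
    for start in adj:
        # Start with all probability mass on the starting node.
        dist = {start: 1.0}
        for _ in range(h):
            nxt = {}
            for node, p in dist.items():
                neigh = adj[node]
                if not neigh:
                    # Assumption: a walker at a dead end stays where it is.
                    nxt[node] = nxt.get(node, 0.0) + p
                    continue
                share = p / len(neigh)
                for n in neigh:
                    nxt[n] = nxt.get(n, 0.0) + share
            dist = nxt
        entropy = -sum(p * math.log(p) for p in dist.values() if p > 0)
        result[start] = math.exp(entropy)
    return result


# Toy example (hypothetical): a star with hub 0 and four leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
acc_one_step = accessibility(star, 1)
```

In this toy graph the hub reaches four leaves with equal probability in one step, so its accessibility is close to 4, while each leaf can only reach the hub and has accessibility 1. In a mesoscopic text network the nodes would be pieces of text such as paragraphs, and varying h probes different scales.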
Related papers
- Complex systems approach to natural language [0.0]
Review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science.
Three main complexity-related research trends in quantitative linguistics are covered.
arXiv Detail & Related papers (2024-01-05T12:01:26Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation [0.0]
We compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature.
We find that most of the sentence embedding methods considered do infer highly correlated patterns of semantic similarity in a given document, but show interesting differences.
arXiv Detail & Related papers (2023-08-08T23:31:10Z)
- Tragic and Comical Networks. Clustering Dramatic Genres According to Structural Properties [0.0]
A growing tradition in the joint field of network studies and drama history produces interpretations from the character networks of the plays.
Our aim is to create a method that is able to cluster texts with similar structures on the basis of the play's well-interpretable and simple properties.
Finding these features is the most important part of our research, as well as establishing the appropriate statistical procedure to calculate the similarities between the texts.
arXiv Detail & Related papers (2023-02-16T12:36:16Z)
- Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution [74.27826764855911]
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
arXiv Detail & Related papers (2021-10-27T06:25:31Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD (Neighboring Distribution Divergence, the proposed metric) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Combining Pre-trained Word Embeddings and Linguistic Features for Sequential Metaphor Identification [12.750941606061877]
We tackle the problem of identifying metaphors in text, treated as a sequence tagging task.
The pre-trained word embeddings GloVe, ELMo and BERT have individually shown good performance on sequential metaphor identification.
We show that leveraging GloVe, ELMo and feature-based BERT can significantly outperform any single word embedding method and the combination of the two embeddings.
arXiv Detail & Related papers (2021-04-07T17:43:05Z)
- Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
Relational sentences in the original text are embedded (with SBERT) and clustered in order to merge together semantically similar relations.
Preliminary tests show that such clustering might successfully detect similar relations, and provide a valuable preprocessing for semi-supervised approaches.
arXiv Detail & Related papers (2020-11-27T10:43:04Z)
- Contextual Modulation for Relation-Level Metaphor Identification [3.2619536457181075]
We introduce a novel architecture for identifying relation-level metaphoric expressions of certain grammatical relations.
In a methodology inspired by works in visual reasoning, our approach is based on conditioning the neural network computation on the deep contextualised features.
We demonstrate that the proposed architecture achieves state-of-the-art results on benchmark datasets.
arXiv Detail & Related papers (2020-10-12T12:07:02Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings, which are designed to learn semantic changes over time.
A supervised learning approach based on the state-of-the-art transformer model BERT is used instead to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)