Comparison between the Structures of Word Co-occurrence and Word Similarity Networks for Ill-formed and Well-formed Texts in Taiwan Mandarin
- URL: http://arxiv.org/abs/2408.09404v1
- Date: Sun, 18 Aug 2024 08:30:16 GMT
- Title: Comparison between the Structures of Word Co-occurrence and Word Similarity Networks for Ill-formed and Well-formed Texts in Taiwan Mandarin
- Authors: Po-Hsuan Huang, Hsuan-Lei Shao,
- Abstract summary: This study investigates the structure of word co-occurrence networks based on Taiwan Mandarin ill-formed internet forum posts.
It seeks to find out whether the three properties (scale-free, small-world, and disassortative) for ill-formed and well-formed texts are universal among different languages.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The study of word co-occurrence networks has attracted the attention of researchers due to their potential significance as well as applications. Understanding the structure of word co-occurrence networks is therefore important to fully realize their significance and usages. In past studies, word co-occurrence networks built on well-formed texts have been found to possess certain characteristics, including being small-world, following a two-regime power law distribution, and being generally disassortative. On the flip side, past studies have found that word co-occurrence networks built from ill-formed texts such as microblog posts may behave differently from those built from well-formed documents. While both kinds of word co-occurrence networks are small-world and disassortative, word co-occurrence networks built from ill-formed texts are scale-free and follow the power law distribution instead of the two-regime power law distribution. However, since past studies on the behavior of word co-occurrence networks built from ill-formed texts only investigated English, the universality of such characteristics remains to be seen among different languages. In addition, it is yet to be investigated whether there could be possible similitude/differences between word co-occurrence networks and other potentially comparable networks. This study therefore investigates and compares the structure of word co-occurrence networks and word similarity networks based on Taiwan Mandarin ill-formed internet forum posts and compare them with those built with well-formed judicial judgments, and seeks to find out whether the three aforementioned properties (scale-free, small-world, and disassortative) for ill-formed and well-formed texts are universal among different languages and between word co-occurrence and word similarity networks.
Related papers
- Topology of Syntax Networks across Languages [0.0]
This thesis investigates the structure and properties of syntax networks.
It will try to find clusters/phylogenies of languages that share similar network features.
Results across different languages will also be compared in an attempt to discover universally preserved structural patterns.
arXiv Detail & Related papers (2025-03-09T18:47:17Z) - Complex systems approach to natural language [0.0]
Review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science.
Three main complexity-related research trends in quantitative linguistics are covered.
arXiv Detail & Related papers (2024-01-05T12:01:26Z) - Topological properties and organizing principles of semantic networks [3.8462776107938317]
We study the properties of semantic networks from ConceptNet, defined by 7 semantic relations from 11 different languages.
We find that semantic networks have universal basic properties: they are sparse, highly clustered, and many exhibit power-law degree distributions.
In some networks the connections are similarity-based, while in others the connections are more complementarity-based.
arXiv Detail & Related papers (2023-04-24T11:12:21Z) - TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo)
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge, that is, structured triplets and unstructured entity description.
arXiv Detail & Related papers (2022-06-15T02:33:10Z) - Latent Topology Induction for Understanding Contextualized
Representations [84.7918739062235]
We study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models.
We show there exists a network of latent states that summarize linguistic properties of contextualized representations.
arXiv Detail & Related papers (2022-06-03T11:22:48Z) - Keywords and Instances: A Hierarchical Contrastive Learning Framework
Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify hybrid granularities semantic meaning in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z) - Utilizing Wordnets for Cognate Detection among Indian Languages [50.83320088758705]
We detect cognate word pairs among ten Indian languages with Hindi.
We use deep learning methodologies to predict whether a word pair is cognate or not.
We report improved performance of up to 26%.
arXiv Detail & Related papers (2021-12-30T16:46:28Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - Sentiment and structure in word co-occurrence networks on Twitter [1.181206257787103]
We explore the relationship between context and happiness scores in political tweets using word co-occurrence networks.
neutral words are found to be dominant and most words, regardless of polarity, tend to co-occur with neutral words.
Although we observe no clear relationship between happiness scores and co-occurrence at the node or edge level, a community-centric approach can isolate themes of competing sentiments in a corpus.
arXiv Detail & Related papers (2021-10-01T18:00:02Z) - Detecting Communities in Heterogeneous Multi-Relational Networks:A
Message Passing based Approach [89.19237792558687]
Community is a common characteristic of networks including social networks, biological networks, computer and information networks.
We propose an efficient message passing based algorithm to simultaneously detect communities for all homogeneous networks.
arXiv Detail & Related papers (2020-04-06T17:36:24Z) - Using word embeddings to improve the discriminability of co-occurrence
text networks [0.1611401281366893]
We investigate whether the use of word embeddings as a tool to create virtual links in co-occurrence networks may improve the quality of classification systems.
Our results revealed that the discriminability in the stylometry task is improved when using Glove, Word2Vec and FastText.
arXiv Detail & Related papers (2020-03-13T13:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.