Distributional semantic modeling: a revised technique to train term/word
vector space models applying the ontology-related approach
- URL: http://arxiv.org/abs/2003.03350v1
- Date: Fri, 6 Mar 2020 18:27:39 GMT
- Title: Distributional semantic modeling: a revised technique to train term/word
vector space models applying the ontology-related approach
- Authors: Oleksandr Palagin, Vitalii Velychko, Kyrylo Malakhov and Oleksandr
Shchurov
- Abstract summary: We design a new technique for distributional semantic modeling with a neural network-based approach to learn distributed term representations (or term embeddings).
Vec2graph is a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs.
- Score: 36.248702416150124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design a new technique for distributional semantic modeling with a
neural network-based approach to learning distributed term representations (or
term embeddings) and, as a result, term vector space models. The technique is
inspired by the recent ontology-related approach to identifying terms (term
extraction) and the relations between them (relation extraction), called
semantic pre-processing technology (SPT), which uses different types of
contextual knowledge (syntactic, terminological, semantic, etc.). Our method
relies on automatic term extraction from natural language texts and the
subsequent formation of problem-oriented or application-oriented (and deeply
annotated) text corpora in which the fundamental entity is the term (both
non-compositional and compositional). This lets us change over from
distributed word representations (or word embeddings) to distributed term
representations (or term embeddings), a transition that makes it possible to
generate more accurate semantic maps of different subject domains (and of the
relations between input terms, which is useful for exploring clusters and
oppositions or for testing hypotheses about them). The semantic map can be
represented as a graph using Vec2graph, a Python library for visualizing word
embeddings (term embeddings in our case) as dynamic and interactive graphs.
Vec2graph coupled with term embeddings will not only improve accuracy in
solving standard NLP tasks but also update the conventional concept of
automated ontology development. The main practical result of our work is a
development kit (a set of toolkits exposed as web service APIs and a web
application) that provides all the routines needed for basic linguistic
pre-processing and semantic pre-processing of natural language texts in
Ukrainian for the subsequent training of term vector space models.
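A minimal sketch of the pipeline the abstract outlines: extract terms, rewrite the corpus so each (possibly multiword) term becomes a single token, train a vector space model over the term-level corpus, and render the result as a similarity graph. This is not the authors' toolkit: gensim's Word2Vec stands in for their trainer, a tiny hand-made term list stands in for SPT term extraction, and networkx stands in for Vec2graph's interactive rendering; every term and text below is illustrative.

```python
# Sketch of the word-to-term transition, under the assumptions stated above.
import networkx as nx
from gensim.models import Word2Vec

# Hypothetical multiword terms that should each become a single token.
TERMS = ["vector space model", "term embedding", "semantic map"]

def annotate_terms(sentence: str) -> list[str]:
    """Rewrite a sentence so every known term is one underscore-joined token."""
    text = sentence.lower()
    for term in sorted(TERMS, key=len, reverse=True):
        text = text.replace(term, term.replace(" ", "_"))
    return text.split()

# Illustrative stand-in for a problem-oriented, term-annotated corpus.
corpus = [
    "A vector space model stores one term embedding per extracted term",
    "Each semantic map is drawn from a trained vector space model",
    "A term embedding places every term inside the vector space model",
]
tokenized = [annotate_terms(s) for s in corpus]

# Train term embeddings: compositional terms are now single tokens, so each
# gets its own vector instead of being split across several word vectors.
model = Word2Vec(sentences=tokenized, vector_size=50, window=3,
                 min_count=1, seed=1)

# Build the semantic map as a nearest-neighbour similarity graph,
# weighting each edge by cosine similarity between term vectors.
graph = nx.Graph()
for seed_term in ["vector_space_model", "term_embedding"]:
    for neighbour, score in model.wv.most_similar(seed_term, topn=3):
        graph.add_edge(seed_term, neighbour, weight=round(score, 3))

print(graph.edges(data=True))
```

The payoff of the term-level rewrite shows up directly in the graph: vector_space_model is a single node with its own neighbours rather than three loosely related word nodes.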
Related papers
- Constructing Word-Context-Coupled Space Aligned with Associative Knowledge Relations for Interpretable Language Modeling [0.0]
The black-box structure of the deep neural network in pre-trained language models seriously limits the interpretability of the language modeling process.
A Word-Context-Coupled Space (W2CSpace) is proposed, introducing alignment between the uninterpretable neural representation and interpretable statistical logic.
Our language model achieves better performance and highly credible interpretability compared with related state-of-the-art methods.
arXiv Detail & Related papers (2023-05-19T09:26:02Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Imitation Learning-based Implicit Semantic-aware Communication Networks: Multi-layer Representation and Collaborative Reasoning [68.63380306259742]
Despite their promising potential, semantic communications and semantic-aware networking are still in their infancy.
We propose a novel reasoning-based implicit semantic-aware communication network architecture that allows multiple tiers of CDC and edge servers to collaborate.
We introduce a new multi-layer representation of semantic information that takes into consideration both the hierarchical structure of implicit semantics and the personalized inference preferences of individual users.
arXiv Detail & Related papers (2022-10-28T13:26:08Z)
- Pretraining on Interactions for Learning Grounded Affordance Representations [22.290431852705662]
We train a neural network to predict objects' trajectories in a simulated interaction.
We show that our network's latent representations differentiate between both observed and unobserved affordances.
Our results suggest a way in which modern deep learning approaches to grounded language learning can be integrated with traditional formal semantic notions of lexical representations.
arXiv Detail & Related papers (2022-07-05T19:19:53Z)
- Graph Adaptive Semantic Transfer for Cross-domain Sentiment Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain.
We present Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z)
- Text analysis and deep learning: A network approach [0.0]
We propose a novel method that combines transformer models with network analysis to form a self-referential representation of language use within a corpus of interest.
Our approach produces linguistic relations strongly consistent with the underlying model as well as mathematically well-defined operations on them.
It represents, to the best of our knowledge, the first unsupervised method to extract semantic networks directly from deep language models.
arXiv Detail & Related papers (2021-10-08T14:18:36Z)
- Semantic Representation and Inference for NLP [2.969705152497174]
This thesis investigates the use of deep learning for novel semantic representation and inference.
We contribute the largest publicly available dataset of real-life factual claims for the purpose of automatic claim verification.
We operationalize the compositionality of a phrase contextually by enriching the phrase representation with external word embeddings and knowledge graphs.
arXiv Detail & Related papers (2021-06-15T13:22:48Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- A Neural Generative Model for Joint Learning Topics and Topic-Specific Word Embeddings [42.87769996249732]
We propose a novel generative model to explore both local and global context for joint learning topics and topic-specific word embeddings.
The trained model maps words to topic-dependent embeddings, which naturally addresses the issue of word polysemy.
arXiv Detail & Related papers (2020-08-11T13:54:11Z)