Deep Learning and Natural Language Processing in the Field of Construction
- URL: http://arxiv.org/abs/2501.07911v1
- Date: Tue, 14 Jan 2025 07:53:44 GMT
- Title: Deep Learning and Natural Language Processing in the Field of Construction
- Authors: Rémy Kessler, Nicolas Béchet
- Abstract summary: We first describe the corpus analysis method used to extract terminology from a collection of technical specifications in the field of construction.
We then perform pruning steps with linguistic patterns and internet queries to improve the quality of the final terminology.
Second, we present a machine-learning approach based on various word embedding models and combinations to detect hypernyms in the extracted terminology.
- Score: 0.09208007322096533
- Abstract: This article presents a complete process for extracting hypernym relationships in the field of construction in two main steps: terminology extraction and detection of hypernyms among the extracted terms. We first describe the corpus analysis method used to extract terminology from a collection of technical specifications in the field of construction. Using statistics and word n-gram analysis, we extract the domain's terminology and then apply pruning steps based on linguistic patterns and internet queries to improve the quality of the final terminology. Second, we present a machine-learning approach based on various word embedding models and combinations to detect hypernyms in the extracted terminology. The extracted terminology is evaluated manually by six domain experts, and the hypernym identification method is evaluated on different datasets. The overall approach yields relevant and promising results.
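The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions: the small `STOPWORDS` set stands in for the paper's linguistic patterns (a term may not start or end with a function word), the internet-query pruning step is omitted, the toy 2-d vectors replace trained word embeddings, "combinations" is read as concatenating the two term vectors with their difference, and a plain perceptron replaces whatever classifier the paper uses. All names (`extract_terms`, `pair_features`, `train_perceptron`, `predict`) are hypothetical.

```python
from collections import Counter

# Hypothetical function-word list, a crude stand-in for linguistic patterns:
# a candidate term may not begin or end with one of these words.
STOPWORDS = {"de", "du", "des", "la", "le", "les", "en", "et", "à", "un", "une",
             "the", "of", "and", "a", "an", "in", "on", "for", "to"}


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_terms(sentences, max_n=3, min_freq=2):
    """Step 1 (sketch): count word n-grams, then prune by frequency and pattern."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    terms = {}
    for gram, freq in counts.items():
        if freq < min_freq:
            continue  # statistical filter: drop rare n-grams
        if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
            continue  # pattern filter on the term's boundary words
        terms[" ".join(gram)] = freq
    return terms


def pair_features(hypo_vec, hyper_vec):
    """Step 2 (sketch): represent a candidate (hyponym, hypernym) pair as
    [hyponym vector, hypernym vector, hypernym - hyponym] (assumed combination)."""
    diff = [h - o for h, o in zip(hyper_vec, hypo_vec)]
    return list(hypo_vec) + list(hyper_vec) + diff


def train_perceptron(examples, epochs=20, lr=1.0):
    """Train a linear perceptron on (features, label) pairs with label in {+1, -1}."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: shift the boundary toward x
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (hypernym pair) or -1 (not a hypernym pair)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


if __name__ == "__main__":
    corpus = ["pose du carrelage mural",
              "carrelage mural en faïence",
              "le carrelage mural est posé"]
    print(extract_terms(corpus))  # -> {'carrelage': 3, 'mural': 3, 'carrelage mural': 3}

    # Hypothetical toy embeddings; a real system would load trained vectors.
    embeddings = {"revêtement": [1.0, 0.0], "carrelage": [0.2, 0.9], "parquet": [0.1, 0.8]}
    pairs = [("carrelage", "revêtement", 1), ("parquet", "revêtement", 1),
             ("revêtement", "carrelage", -1), ("revêtement", "parquet", -1)]
    data = [(pair_features(embeddings[hypo], embeddings[hyper]), label)
            for hypo, hyper, label in pairs]
    w, b = train_perceptron(data)
    print([predict(w, b, x) for x, _ in data])  # -> [1, 1, -1, -1]
```

On the toy corpus only `carrelage`, `mural`, and `carrelage mural` survive both filters, and the perceptron separates its four training pairs. The perceptron was picked only because it fits in a few stdlib lines; any linear classifier over the same pair features would play the same role.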
Related papers
- Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language [65.268245109828]
We propose a distributional semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
(2024-02-26T15:04:35Z)
- A Distributed Automatic Domain-Specific Multi-Word Term Recognition Architecture using Spark Ecosystem [0.5156484100374059]
We propose a distributed Spark-based architecture to automatically extract domain-specific terms.
We empirically demonstrate the feasibility of our architecture through experiments on two real-world datasets.
(2023-05-24T10:05:59Z)
- A bilingual approach to specialised adjectives through word embeddings in the karstology domain [3.92181732547846]
We present an experiment in extracting adjectives which express a specific semantic relation using word embeddings.
The results of the experiment are then thoroughly analysed and categorised into groups of adjectives exhibiting formal or semantic similarity.
(2022-03-31T08:27:15Z)
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data based on semantic training data.
The model is evaluated on five real benchmark datasets, and the results show that our approach achieves strong results on both free-text-to-concept and concept-to-concept searching of concept vocabularies.
(2022-01-01T05:15:42Z)
- Extracting Domain-specific Concepts from Large-scale Linked Open Data [0.0]
The proposed method defines search entities by linking the LOD vocabulary with terms related to the target domain.
The occurrences of common upper-level entities and the chain-of-path relationships are examined to determine the range of conceptual connections in the target domain.
(2021-11-22T10:25:57Z)
- Text analysis and deep learning: A network approach [0.0]
We propose a novel method that combines transformer models with network analysis to form a self-referential representation of language use within a corpus of interest.
Our approach produces linguistic relations strongly consistent with the underlying model as well as mathematically well-defined operations on them.
It represents, to the best of our knowledge, the first unsupervised method to extract semantic networks directly from deep language models.
(2021-10-08T14:18:36Z)
- Deep Learning Schema-based Event Extraction: Literature Review and Current Trends [60.29289298349322]
Event extraction technology based on deep learning has become a research hotspot.
This paper fills a gap in the literature by reviewing state-of-the-art approaches, with a focus on deep learning-based models.
(2021-07-05T16:32:45Z)
- Hierarchical Learning Using Deep Optimum-Path Forest [55.60116686945561]
Bag-of-Visual-Words (BoVW) and deep learning techniques have been widely used in several domains, including computer-assisted medical diagnosis.
In this work, we are interested in developing tools for the automatic identification of Parkinson's disease using machine learning and the concept of BoVW.
(2021-02-18T13:02:40Z)
- Introducing Syntactic Structures into Target Opinion Word Extraction with Deep Learning [89.64620296557177]
We propose to incorporate the syntactic structures of the sentences into the deep learning models for targeted opinion word extraction.
We also introduce a novel regularization technique to improve the performance of the deep learning models.
The proposed model is extensively analyzed and achieves state-of-the-art performance on four benchmark datasets.
(2020-10-26T07:13:17Z)
- Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach [36.248702416150124]
We design a new technique for distributional semantic modeling that uses a neural network-based approach to learn distributed term representations (or term embeddings).
Vec2graph is a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs.
(2020-03-06T18:27:39Z)
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performance.
(2020-02-03T11:28:10Z)