Unsupervised Technical Domain Terms Extraction using Term Extractor
- URL: http://arxiv.org/abs/2101.09015v1
- Date: Fri, 22 Jan 2021 09:24:09 GMT
- Title: Unsupervised Technical Domain Terms Extraction using Term Extractor
- Authors: Suman Dowlagar, Radhika Mamidi
- Abstract summary: The goal of terminology extraction is to extract relevant words or phrases from a given corpus automatically.
This paper focuses on the unsupervised automated domain term extraction method that considers chunking, preprocessing, and ranking domain-specific terms.
- Score: 9.23545668304066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Terminology extraction, also known as term extraction, is a subtask of
information extraction. The goal of terminology extraction is to extract
relevant words or phrases from a given corpus automatically. This paper focuses
on the unsupervised automated domain term extraction method that considers
chunking, preprocessing, and ranking domain-specific terms using relevance and
cohesion functions for ICON 2020 shared task 2: TermTraction.
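The abstract names three stages (chunking, preprocessing, ranking by relevance and cohesion) without giving their definitions. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. Everything in it is an assumption made for illustration: the stopword-based chunker, the tiny `STOPWORDS` list, the relevance score (share of a candidate's relative frequency that comes from the domain text versus a background text), the Dice-style cohesion score, and the function names `chunk_candidates` and `rank_terms`.

```python
import re
from collections import Counter

# Tiny stopword list used only for this illustration (an assumption).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "to", "and",
             "for", "with", "that", "this", "it", "by", "as", "be"}


def chunk_candidates(text, max_len=3):
    """Split text at stopwords and non-letters; return word n-grams (n <= max_len)."""
    tokens = re.findall(r"[a-z]+|[^a-z\s]+", text.lower())
    chunks, current = [], []
    for tok in tokens:
        is_word = "a" <= tok[0] <= "z"
        if is_word and tok not in STOPWORDS:
            current.append(tok)
        else:
            # A stopword or punctuation mark closes the current chunk.
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)

    candidates = []
    for chunk in chunks:
        for n in range(1, max_len + 1):
            for i in range(len(chunk) - n + 1):
                candidates.append(tuple(chunk[i:i + n]))
    return candidates


def rank_terms(domain_text, background_text, top_k=5):
    """Rank candidates by relevance * cohesion (both hypothetical scores in (0, 1])."""
    dom = Counter(chunk_candidates(domain_text))
    bg = Counter(chunk_candidates(background_text))
    dom_total = sum(dom.values()) or 1
    bg_total = sum(bg.values()) or 1

    scored = {}
    for term, freq in dom.items():
        # Relevance: how much of the term's relative frequency is domain-specific.
        p_dom = freq / dom_total
        p_bg = bg[term] / bg_total
        relevance = p_dom / (p_dom + p_bg)
        # Cohesion: Dice-style ratio of the phrase count to its words' counts.
        cohesion = len(term) * freq / sum(dom[(w,)] for w in term)
        scored[" ".join(term)] = relevance * cohesion

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Calling `rank_terms(domain_text, background_text)` returns a list of `(term, score)` pairs. Multiplying the two scores is only one simple way to combine them; a weighted sum or a threshold on each score would be equally plausible under these assumptions.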
Related papers
- Extracting domain-specific terms using contextual word embeddings [2.7941582470640784]
This paper proposes a novel machine learning approach to terminology extraction.
It combines features from traditional term extraction systems with novel contextual features derived from contextual word embeddings.
Our approach provides significant improvements in terms of F1 score over the previous state-of-the-art.
arXiv Detail & Related papers (2025-02-24T16:06:35Z)
- Deep Learning and Natural Language Processing in the Field of Construction [0.09208007322096533]
We first describe the corpus analysis method to extract terminology from a collection of technical specifications in the field of construction.
We then perform pruning steps with linguistic patterns and internet queries to improve the quality of the final terminology.
Second, we present a machine-learning approach based on various word embedding models and their combinations to detect hypernyms in the extracted terminology.
arXiv Detail & Related papers (2025-01-14T07:53:44Z)
- Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
arXiv Detail & Related papers (2024-02-26T15:04:35Z)
- Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting [11.264272119913311]
We submit to the WMT 2023 terminology translation task.
We adopt a translate-then-refine approach that can be domain-independent and requires minimal manual effort.
Results show that our terminology-aware model learns to incorporate terminologies effectively.
arXiv Detail & Related papers (2023-10-09T16:08:23Z)
- A Distributed Automatic Domain-Specific Multi-Word Term Recognition Architecture using Spark Ecosystem [0.5156484100374059]
We propose a distributed Spark-based architecture to automatically extract domain-specific terms.
We empirically demonstrate the feasibility of our architecture through experiments on two real-world datasets.
arXiv Detail & Related papers (2023-05-24T10:05:59Z)
- Cascaded Beam Search: Plug-and-Play Terminology-Forcing For Neural Machine Translation [11.902884131696783]
This paper presents a plug-and-play approach for translation with terminology constraints.
We propose Cascade Beam Search, a terminology-forcing approach that requires no training.
We evaluate the performance of our approach by competing against the top submissions of the WMT21 terminology translation task.
arXiv Detail & Related papers (2023-05-23T21:48:02Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- Open Relation and Event Type Discovery with Type Abstraction [80.92395639632383]
We introduce the idea of type abstraction, where the model is prompted to generalize and name the type.
We use the similarity between inferred names to induce clusters.
Our experiments on multiple relation extraction and event extraction datasets consistently show the advantage of our type abstraction approach.
arXiv Detail & Related papers (2022-11-30T23:47:49Z)
- Extracting Domain-specific Concepts from Large-scale Linked Open Data [0.0]
The proposed method defines search entities by linking the LOD vocabulary with terms related to the target domain.
The occurrences of common upper-level entities and the chain-of-path relationships are examined to determine the range of conceptual connections in the target domain.
arXiv Detail & Related papers (2021-11-22T10:25:57Z)
- How Domain Terminology Affects Meeting Summarization Performance [61.12624289478716]
We create gold-standard annotations for domain terminology on a sizable meeting corpus.
We analyze the performance of a meeting summarization system with and without jargon terms.
arXiv Detail & Related papers (2020-11-02T02:33:59Z)
- Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation [93.87095877617968]
We propose Constrained Abstractive Summarization (CAS), a general setup that preserves the factual consistency of abstractive summarization.
We adopt lexically constrained decoding, a technique generally applicable to autoregressive generative models, to fulfill CAS.
We observe up to 13.8 ROUGE-2 gains when only one manual constraint is used in interactive summarization.
arXiv Detail & Related papers (2020-10-24T00:27:44Z)
- SemEval-2020 Task 6: Definition extraction from free text with the DEFT corpus [28.67911239741097]
We present DeftEval, a SemEval shared task in which participants extract definitions from free text.
DeftEval involved three distinct subtasks: sentence classification, sequence labeling, and relation extraction.
arXiv Detail & Related papers (2020-08-31T15:55:24Z)
- CASE: Context-Aware Semantic Expansion [68.30244980290742]
This paper defines and studies a new task called Context-Aware Semantic Expansion (CASE).
Given a seed term in a sentential context, we aim to suggest other terms that fit the context as well as the seed does.
We show that annotations for this task can be harvested at scale from existing corpora, in a fully automatic manner.
arXiv Detail & Related papers (2019-12-31T06:38:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.