A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph
- URL: http://arxiv.org/abs/2101.02875v1
- Date: Fri, 8 Jan 2021 06:47:32 GMT
- Title: A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph
- Authors: Mohannad AlMousa, Rachid Benlamri, Richard Khoury
- Abstract summary: This paper presents a knowledge-based word sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix Multiplication (SCSMM).
The SCSMM algorithm combines semantic similarity, heuristic knowledge, and document context to exploit, respectively, the merits of local context between consecutive terms, human knowledge about terms, and the document's main topic.
The proposed algorithm outperformed all other algorithms when disambiguating nouns on the combined gold standard datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various applications in computational linguistics and artificial intelligence
rely on high-performing word sense disambiguation techniques to solve
challenging tasks such as information retrieval, machine translation, question
answering, and document clustering. While text comprehension is intuitive for
humans, machines face tremendous challenges in processing and interpreting a
human's natural language. This paper presents a novel knowledge-based word
sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix
Multiplication (SCSMM). The SCSMM algorithm combines semantic similarity,
heuristic knowledge, and document context to respectively exploit the merits of
local context between consecutive terms, human knowledge about terms, and a
document's main topic in disambiguating terms. Unlike other algorithms, the
SCSMM algorithm guarantees the capture of the maximum sentence context while
maintaining the terms' order within the sentence. The proposed algorithm
outperformed all other algorithms when disambiguating nouns on the combined
gold standard datasets, while demonstrating comparable results to current
state-of-the-art word sense disambiguation systems when dealing with each
dataset separately. Furthermore, the paper discusses the impact of granularity
level, ambiguity rate, sentence size, and part of speech distribution on the
performance of the proposed algorithm.
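The abstract describes the core mechanism only in prose. As a rough illustration, the sketch below shows the sequential matrix-chaining idea in Python: a similarity matrix is built between the candidate senses of each pair of consecutive terms, and the matrices are multiplied in sentence order so the product accumulates context across the whole sentence while preserving term order. The `sense_similarity` placeholder, the data layout, and the omission of the heuristic-knowledge, document-context, and backtracking components are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def sense_similarity(sense_a, sense_b):
    # Placeholder for a WordNet-based semantic similarity measure (assumed);
    # the paper additionally combines heuristic knowledge and document context.
    return 1.0 if sense_a == sense_b else 0.5

def scsmm_sketch(term_senses):
    # term_senses: one list of candidate senses per term, in sentence order
    # (at least two terms). Build a similarity matrix for each consecutive pair.
    matrices = [
        np.array([[sense_similarity(a, b) for b in nxt] for a in cur])
        for cur, nxt in zip(term_senses, term_senses[1:])
    ]
    # Multiply the matrices sequentially so the product aggregates contextual
    # evidence across the sentence while preserving the order of the terms.
    context = matrices[0]
    for m in matrices[1:]:
        context = context @ m
    # The largest entry of the chained product indicates the best-supported
    # senses of the first and last terms; recovering the intermediate senses
    # would require a backtracking pass over the partial products (omitted).
    first, last = np.unravel_index(np.argmax(context), context.shape)
    return first, last

# Example with hypothetical sense inventories:
# scsmm_sketch([["bank#1", "bank#2"], ["deposit#1"], ["money#1", "money#2"]])
```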
Related papers
- H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables [56.73919743039263]
This paper introduces a novel algorithm that integrates both symbolic and semantic (textual) approaches in a two-stage process to address limitations.
Our experiments demonstrate that H-STAR significantly outperforms state-of-the-art methods across three question-answering (QA) and fact-verification datasets.
arXiv Detail & Related papers (2024-06-29T21:24:19Z)
- Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression [15.460141768587663]
We propose a lightweight supervised dictionary learning framework for text classification based on data compression and representation.
We evaluate our algorithm's information-theoretic performance using information bottleneck principles and introduce the information plane area rank (IPAR) as a novel metric to quantify this performance.
arXiv Detail & Related papers (2024-04-28T10:11:52Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, the proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- DEIM: An effective deep encoding and interaction model for sentence matching [0.0]
We propose a sentence matching method based on deep encoding and interaction to extract deep semantic information.
In the encoder layer, we draw on information from the other sentence while encoding each sentence, and later use an algorithm to fuse the information.
In the interaction layer, we use a bidirectional attention mechanism and a self-attention mechanism to obtain deep semantic information.
arXiv Detail & Related papers (2022-03-20T07:59:42Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for multi-sense embeddings (M-SE) from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach, called the hyperplane-based approach, for the automatic extraction of domain-specific words.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and that it outperforms the mutual-information approach.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- Using the Full-text Content of Academic Articles to Identify and Evaluate Algorithm Entities in the Domain of Natural Language Processing [7.163189900803623]
This article takes the field of natural language processing (NLP) as an example and identifies algorithms from academic papers in the field.
A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching.
The number of articles mentioning an algorithm is used as an indicator to analyze the influence of that algorithm.
arXiv Detail & Related papers (2020-10-21T08:24:18Z)
- Research on Annotation Rules and Recognition Algorithm Based on Phrase Window [4.334276223622026]
We propose labeling rules based on phrase windows and design corresponding phrase recognition algorithms.
The labeling rules use phrases as the minimum unit, divide sentences into seven types of nestable phrases, and mark the grammatical dependencies between phrases.
The corresponding algorithm, drawing on the idea of locating target regions in computer vision, can find the start and end positions of the various phrases in a sentence.
arXiv Detail & Related papers (2020-07-07T00:19:47Z)
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
A global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
arXiv Detail & Related papers (2020-03-27T09:19:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.