A Broad-Coverage Deep Semantic Lexicon for Verbs
- URL: http://arxiv.org/abs/2007.02670v1
- Date: Mon, 6 Jul 2020 12:03:14 GMT
- Title: A Broad-Coverage Deep Semantic Lexicon for Verbs
- Authors: James Allen, Hannah An, Ritwik Bose, Will de Beaumont and Choh Man Teng
- Abstract summary: COLLIE-V is a deep lexical resource for verbs with the coverage of WordNet and semantic details that meet or exceed existing resources.
New ontological concepts and lexical entries, together with semantic role preferences and entailment axioms, are automatically derived.
- Score: 3.219005794369446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Progress on deep language understanding is inhibited by the lack of a
broad-coverage lexicon that connects linguistic behavior to ontological concepts and
axioms. We have developed COLLIE-V, a deep lexical resource for verbs, with the
coverage of WordNet and syntactic and semantic details that meet or exceed
existing resources. Bootstrapping from a hand-built lexicon and ontology, new
ontological concepts and lexical entries, together with semantic role
preferences and entailment axioms, are automatically derived by combining
multiple constraints from parsing dictionary definitions and examples. We
evaluated the accuracy of the technique along a number of different dimensions
and were able to obtain high accuracy in deriving new concepts and lexical
entries. COLLIE-V is publicly available.
Related papers
- Partial Colexifications Improve Concept Embeddings [1.3351610617039973]
We show how partial colexifications can be used to improve concept embeddings in meaningful ways.
The learned embeddings are evaluated against lexical similarity ratings, recorded instances of semantic shift, and word association data.
arXiv Detail & Related papers (2025-02-13T19:58:00Z)
- A Survey on Lexical Ambiguity Detection and Word Sense Disambiguation [0.0]
This paper explores techniques that focus on understanding and resolving ambiguity in language within the field of natural language processing (NLP).
It outlines diverse approaches ranging from deep learning techniques to leveraging lexical resources and knowledge graphs like WordNet.
The research identifies persistent challenges in the field, such as the scarcity of sense annotated corpora and the complexity of informal clinical texts.
arXiv Detail & Related papers (2024-03-24T12:58:48Z)
- Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
arXiv Detail & Related papers (2024-02-26T15:04:35Z)
- Multi-Relational Hyperbolic Word Embeddings from Natural Language Definitions [5.763375492057694]
This paper presents a multi-relational model that explicitly leverages such a structure to derive word embeddings from definitions.
An empirical analysis demonstrates that the framework can help impose the desired structural constraints.
Experiments reveal the superiority of the Hyperbolic word embeddings over the Euclidean counterparts.
arXiv Detail & Related papers (2023-05-12T08:16:06Z)
- A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches [5.065947993017158]
We present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks.
Traditional approaches mostly use matrix factorization to produce word representations, and they are not able to capture the semantic and syntactic regularities of the language very well.
On the other hand, neural-network-based approaches can capture sophisticated regularities of the language and preserve the word relationships in the generated word representations.
arXiv Detail & Related papers (2023-03-13T15:34:19Z)
- SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings [4.479834103607384]
Adding interpretability to word embeddings represents an area of active research in text representation.
We present SensePOLAR, an extension of the original POLAR framework that enables word-sense aware interpretability for pre-trained contextual word embeddings.
arXiv Detail & Related papers (2023-01-11T20:25:53Z)
- Latent Topology Induction for Understanding Contextualized Representations [84.7918739062235]
We study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models.
We show there exists a network of latent states that summarize linguistic properties of contextualized representations.
arXiv Detail & Related papers (2022-06-03T11:22:48Z)
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
- Enhanced word embeddings using multi-semantic representation through lexical chains [1.8199326045904998]
We propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II.
These algorithms combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings as building blocks forming a single system.
Our results show that integrating lexical chains with word embedding representations sustains state-of-the-art results, even against more complex systems.
arXiv Detail & Related papers (2021-01-22T09:43:33Z)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the CommonGen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)
- Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity [67.36239720463657]
Multi-SimLex is a large-scale lexical resource and evaluation benchmark covering datasets for 12 diverse languages.
Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs.
Owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets.
arXiv Detail & Related papers (2020-03-10T17:17:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.