Related papers: LGDE: Local Graph-based Dictionary Expansion

URL: http://arxiv.org/abs/2405.07764v2
Date: Thu, 18 Jul 2024 06:11:41 GMT
Title: LGDE: Local Graph-based Dictionary Expansion
Authors: Dominik J. Schindler, Sneha Jha, Xixuan Zhang, Kilian Buehling, Annett Heft, Mauricio Barahona
Abstract summary: Local Graph-based Dictionary Expansion (LGDE) is a method for data-driven discovery of the semantic neighbourhood of words. We show that LGDE enriches the list of keywords with significantly better performance than threshold methods based on direct word similarities. Our empirical results and expert user assessment indicate that LGDE expands the seed dictionary with more useful keywords due to the manifold-learning-based similarity network.
Score: 0.923607423080658
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present Local Graph-based Dictionary Expansion (LGDE), a method for data-driven discovery of the semantic neighbourhood of words using tools from manifold learning and network science. At the heart of LGDE lies the creation of a word similarity graph from the geometry of word embeddings followed by local community detection based on graph diffusion. The diffusion in the local graph manifold allows the exploration of the complex nonlinear geometry of word embeddings to capture word similarities based on paths of semantic association, over and above direct pairwise similarities. Exploiting such semantic neighbourhoods enables the expansion of dictionaries of pre-selected keywords, an important step for tasks in information retrieval, such as database queries and online data collection. We validate LGDE on a corpus of English-language hate speech-related posts from Reddit and Gab and show that LGDE enriches the list of keywords with significantly better performance than threshold methods based on direct word similarities. We further demonstrate our method through a real-world use case from communication science, where LGDE is evaluated quantitatively on the expansion of a conspiracy-related dictionary from online data collected and analysed by domain experts. Our empirical results and expert user assessment indicate that LGDE expands the seed dictionary with more useful keywords due to the manifold-learning-based similarity network.
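The contrast the abstract draws between direct pairwise similarity and diffusion over a similarity graph can be illustrated with a small toy sketch. This is not the authors' implementation: the plain kNN graph, the lazy random walk, the 2-D toy embeddings, and all function names and parameters (`knn_graph`, `diffusion_expand`, `k`, `tau`, `steps`) are assumptions chosen for illustration, standing in for whatever graph construction and local community detection LGDE actually uses.

```python
import numpy as np

def cosine_matrix(emb):
    """Pairwise cosine similarities between the rows of emb."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

def knn_graph(sim, k):
    """Symmetric kNN graph; edge weights are cosine similarities clipped at 0."""
    n = sim.shape[0]
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)            # a node is not its own neighbour
    nbrs = np.argsort(-masked, axis=1)[:, :k]    # k most similar nodes per row
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    adj = np.zeros((n, n))
    adj[rows, cols] = np.clip(masked[rows, cols], 0.0, None)
    return np.maximum(adj, adj.T)                # keep an edge if either end chose it

def threshold_expand(sim, seed, tau):
    """Baseline: every word whose direct similarity to the seed is at least tau."""
    return [j for j in range(sim.shape[0]) if j != seed and sim[seed, j] >= tau]

def diffusion_expand(adj, seed, steps):
    """Words reached by a lazy random walk from the seed, ranked by visit probability."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0                          # avoid division by zero for isolated nodes
    walk = 0.5 * (np.eye(n) + adj / deg[:, None])
    p = np.zeros(n)
    p[seed] = 1.0
    for _ in range(steps):
        p = p @ walk
    ranked = np.argsort(-p)
    return [int(j) for j in ranked if j != seed and p[j] > 1e-12]

# Toy "embeddings": five words along an arc (a chain of associations)
# plus two unrelated words on the opposite side of the circle.
angles = np.deg2rad([0, 20, 40, 60, 80, 180, 200])
emb = np.column_stack([np.cos(angles), np.sin(angles)])

sim = cosine_matrix(emb)
adj = knn_graph(sim, k=2)

direct = threshold_expand(sim, seed=0, tau=0.9)
diffused = diffusion_expand(adj, seed=0, steps=3)
print("threshold:", direct)
print("diffusion:", diffused)
```

In this toy setting the threshold baseline only picks up the seed's immediate neighbour, while the walk follows the chain of associations along the arc and never reaches the two unrelated words, because no positively weighted edge connects them to the seed's component.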

Related papers

Evaluating the impact of word embeddings on similarity scoring in practical information retrieval [0.5872014229110214]
Vector Space Modelling (VSM) and neural word embeddings play a crucial role in modern machine learning and Natural Language Processing pipelines. This paper evaluates an alternative approach to measuring query statement similarity that moves away from the common similarity measure of centroids of neural word embeddings.
arXiv Detail & Related papers (2026-02-05T14:57:38Z)
GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs [59.61242815508687]
Graph neural networks (GNNs) on text-attributed graphs (TAGs) encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. This work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework to analyze how different aggregation mechanisms affect manifold structure.
arXiv Detail & Related papers (2025-11-12T06:48:43Z)
Enriching Word Usage Graphs with Cluster Definitions [5.3135532294740475]
We present a dataset of word usage graphs (WUGs) where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. They are generated from scratch by fine-tuned encoder-decoder language models. The conducted human evaluation has shown that these definitions match the existing clusters in WUGs better than the definitions chosen from WordNet.
arXiv Detail & Related papers (2024-03-26T18:22:05Z)
Contextual Dictionary Lookup for Knowledge Graph Completion [32.493168863565465]
Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples. Most existing embedding models map each relation into a unique vector, overlooking the relation's specific fine-grained semantics under different entities. We present a novel method utilizing contextual dictionary lookup, enabling conventional embedding models to learn fine-grained semantics of relations in an end-to-end manner.
arXiv Detail & Related papers (2023-06-13T12:13:41Z)
Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category. The Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations. Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection [118.36746273425354]
This paper presents a paralleled visual-concept pre-training method for open-world detection by resorting to knowledge enrichment from a designed concept dictionary. By enriching the concepts with their descriptions, we explicitly build the relationships among various concepts to facilitate open-domain learning. The proposed framework demonstrates strong zero-shot detection performance; e.g., on the LVIS dataset, our DetCLIP-T outperforms GLIP-T by 9.9% mAP and obtains a 13.5% improvement on rare categories.
arXiv Detail & Related papers (2022-09-20T02:01:01Z)
Graph Adaptive Semantic Transfer for Cross-domain Sentiment Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain. We present the Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z)
VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation [87.9994254822078]
For face presentation attack detection (PAD), most of the spoofing cues are subtle, local image patterns. The VLAD aggregation method is adopted to quantize local features, with a visual vocabulary locally partitioning the feature space. The proposed vocabulary separation method divides the vocabulary into domain-shared and domain-specific visual words.
arXiv Detail & Related papers (2022-02-21T15:27:41Z)
Keyphrase Extraction Using Neighborhood Knowledge Based on Word Embeddings [17.198907789163123]
We enhance the graph-based ranking model by leveraging word embeddings as background knowledge to add semantic information to the inter-word graph. Our approach is evaluated on established benchmark datasets and empirical results show that the word embedding neighborhood information improves the model performance.
arXiv Detail & Related papers (2021-11-13T21:48:18Z)
EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings (M-SE). We derive new distributional semantic similarity measures for M-SE from prior ones. We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
Unsupervised Key-phrase Extraction and Clustering for Classification Scheme in Scientific Publications [0.0]
We investigate possible ways of automating parts of the Systematic Mapping (SM) and Systematic Review (SR) process. Key-phrases are extracted from scientific documents using unsupervised methods and are then used to construct the corresponding Classification Scheme. We also explore how clustering can be used to group related key-phrases.
arXiv Detail & Related papers (2021-01-25T10:17:33Z)
Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole. We propose Semantic Graph Convolutional Networks (SGCN), which explore the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document. The recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used in the KE task, and it has obtained competitive performance on various benchmarks. In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction. We show that different embedding spaces have different degrees of strength for the structural and semantic properties. These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
Keywords lie far from the mean of all words in local vector space [5.040463208115642]
In this work, we follow a different path to detect the keywords from a text document by modeling the main distribution of the document's words using local word vector representations. We confirm the high performance of our approach compared to strong baselines and state-of-the-art unsupervised keyword extraction methods.
arXiv Detail & Related papers (2020-08-21T14:42:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.