Vocab-Expander: A System for Creating Domain-Specific Vocabularies Based
on Word Embeddings
- URL: http://arxiv.org/abs/2308.03519v1
- Date: Mon, 7 Aug 2023 12:13:25 GMT
- Authors: Michael Färber, Nicholas Popovic
- Abstract summary: Vocab-Expander is an online tool that enables end-users (e.g., technology scouts) to create and expand a vocabulary of their domain of interest.
It utilizes an ensemble of state-of-the-art word embedding techniques based on web text and ConceptNet, a common-sense knowledge base.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Vocab-Expander at https://vocab-expander.com, an
online tool that enables end-users (e.g., technology scouts) to create and
expand a vocabulary of their domain of interest. It utilizes an ensemble of
state-of-the-art word embedding techniques based on web text and ConceptNet, a
common-sense knowledge base, to suggest related terms for already given terms.
The system has an easy-to-use interface that allows users to quickly confirm or
reject term suggestions. Vocab-Expander offers a variety of potential use
cases, such as improving concept-based information retrieval in technology and
innovation management, enhancing communication and collaboration within
organizations or interdisciplinary projects, and creating vocabularies for
specific courses in education.
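
The suggestion step described in the abstract (several embedding models proposing related terms, with the user confirming or rejecting them) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name suggest_terms, the two toy embedding tables, and the choice to average cosine similarities across models are all assumptions made here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_terms(seed_terms, models, rejected=(), top_k=3):
    """Rank candidate terms by similarity to the seed terms.

    Hypothetical ensemble rule: within each model, a candidate's score is its
    mean cosine similarity to the seeds; the final score is the mean of those
    per-model scores over the models that contain the candidate.
    """
    seeds = set(seed_terms)
    excluded = seeds | set(rejected)
    per_model_scores = {}
    for vectors in models:
        seed_vectors = [vectors[s] for s in seeds if s in vectors]
        if not seed_vectors:
            continue  # this model knows none of the seed terms
        for term, vec in vectors.items():
            if term in excluded:
                continue
            sim = sum(cosine(vec, s) for s in seed_vectors) / len(seed_vectors)
            per_model_scores.setdefault(term, []).append(sim)
    ranked = [(t, sum(s) / len(s)) for t, s in per_model_scores.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_k]


# Toy 2-dimensional embeddings standing in for a web-text model and a
# knowledge-graph-based model (values are made up for illustration).
web_text_model = {
    "battery": [1.0, 0.0],
    "accumulator": [0.9, 0.1],
    "charger": [0.7, 0.3],
    "banana": [0.0, 1.0],
}
knowledge_graph_model = {
    "battery": [0.0, 1.0],
    "accumulator": [0.1, 0.9],
    "charger": [0.4, 0.6],
    "banana": [1.0, 0.0],
}
models = [web_text_model, knowledge_graph_model]

# First round of suggestions for the seed term "battery".
first_round = suggest_terms(["battery"], models, top_k=2)

# The user rejects "accumulator"; it is excluded from the next round.
second_round = suggest_terms(["battery"], models, rejected=["accumulator"], top_k=2)
```

Confirmed terms would be appended to seed_terms and rejected ones to rejected, so each round reflects the user's earlier decisions.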
Related papers
- LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained
Descriptors [58.75140338866403]
DVDet is a Descriptor-Enhanced Open Vocabulary Detector.
It transforms regional embeddings into image-like representations that can be directly integrated into general open vocabulary detection training.
Extensive experiments over multiple large-scale benchmarks show that DVDet outperforms the state-of-the-art consistently by large margins.
arXiv Detail & Related papers (2024-02-07T07:26:49Z)
- DiscoverPath: A Knowledge Refinement and Retrieval System for
Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
arXiv Detail & Related papers (2023-09-04T20:52:33Z)
- Towards Open Vocabulary Learning: A Survey [146.90188069113213]
Deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training.
This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2023-06-28T02:33:06Z)
- DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for
Open-world Detection [118.36746273425354]
This paper presents a paralleled visual-concept pre-training method for open-world detection by resorting to knowledge enrichment from a designed concept dictionary.
By enriching the concepts with their descriptions, we explicitly build the relationships among various concepts to facilitate the open-domain learning.
The proposed framework demonstrates strong zero-shot detection performance: on the LVIS dataset, for example, DetCLIP-T outperforms GLIP-T by 9.9% mAP and obtains a 13.5% improvement on rare categories.
arXiv Detail & Related papers (2022-09-20T02:01:01Z)
- Taxonomy Enrichment with Text and Graph Vector Representations [61.814256012166794]
We address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy.
We present a new method that achieves strong results on this task with little effort.
We achieve state-of-the-art results across different datasets and provide an in-depth error analysis.
arXiv Detail & Related papers (2022-01-21T09:01:12Z)
- RetroGAN: A Cyclic Post-Specialization System for Improving
Out-of-Knowledge and Rare Word Representations [9.260444813514948]
RetroGAN learns a one-to-one mapping between concepts and their retrofitted counterparts.
It applies that mapping to handle concepts that do not appear in the original Knowledge Base.
We test our system on three word-similarity benchmarks and a downstream sentence simplification task.
arXiv Detail & Related papers (2021-08-30T00:34:23Z)
- Enhancing Word Embeddings with Knowledge Extracted from Lexical
Resources [3.7814216736076434]
We use traditional word embeddings and apply specialization methods to better capture semantic relations between words.
In our approach, we leverage external knowledge from rich lexical resources such as BabelNet.
arXiv Detail & Related papers (2020-05-20T13:45:49Z)
- Best Practices for Implementing FAIR Vocabularies and Ontologies on the
Web [0.26107298043931193]
We describe guidelines and best practices for creating accessible, understandable and reusable Semantic Web vocabularies.
We illustrate our guidelines with concrete examples, in order to help researchers implement these practices in their vocabularies.
arXiv Detail & Related papers (2020-03-29T17:40:04Z)
- Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system.
One popular approach to covering OOVs is to use subword units rather than words.
In this paper, we explore existing methods for this approach at both the graph-construction and search-method levels.
arXiv Detail & Related papers (2020-03-19T21:24:45Z)
- Distributional semantic modeling: a revised technique to train term/word
vector space models applying the ontology-related approach [36.248702416150124]
We design a new technique for distributional semantic modeling that uses a neural network-based approach to learn distributed term representations (or term embeddings).
Vec2graph is a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs.
arXiv Detail & Related papers (2020-03-06T18:27:39Z)