LitMind Dictionary: An Open-Source Online Dictionary
- URL: http://arxiv.org/abs/2204.11087v1
- Date: Sat, 23 Apr 2022 15:10:40 GMT
- Title: LitMind Dictionary: An Open-Source Online Dictionary
- Authors: Cunliang Kong, Xuezhi Fang, Liner Yang, Yun Chen, Erhong Yang
- Abstract summary: We introduce the LitMind Dictionary, an open-source online generative dictionary.
It takes a word and context containing the word as input and automatically generates a definition as output.
It supports not only Chinese and English, but also Chinese-English cross-lingual queries.
- Score: 5.2221935174520056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dictionaries can help language learners to learn vocabulary by providing
definitions of words. Since traditional dictionaries present word senses as
discrete items in predefined inventories, they lack the flexibility required to
provide the specific meaning of a word in a particular context. In
this paper, we introduce the LitMind Dictionary
(https://dictionary.litmind.ink), an open-source online generative dictionary
that takes a word and context containing the word as input and automatically
generates a definition as output. Incorporating state-of-the-art definition
generation models, it supports not only Chinese and English, but also
Chinese-English cross-lingual queries. Moreover, it has a user-friendly
front-end design that can help users understand the query words quickly and
easily. All the code and data are available at
https://github.com/blcuicall/litmind-dictionary.
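The abstract above describes the dictionary's core behavior: given a word and a context sentence containing it, the system generates a context-specific definition. The sketch below illustrates that interface with a generic Hugging Face seq2seq model; the checkpoint name and the input format are placeholder assumptions, since this page does not specify the actual LitMind models or API.

```python
# Minimal sketch of the "word + context -> definition" interface described above.
# Assumptions: a generic seq2seq checkpoint stands in for the actual LitMind
# definition-generation models; the model name and the source-string format below
# are placeholders, not the project's real configuration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "path/to/definition-generation-model"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def generate_definition(word: str, context: str) -> str:
    """Generate a context-specific definition for `word` as it is used in `context`."""
    # One common encoding for definition generation: target word, separator, context sentence.
    source = f"{word} </s> {context}"
    inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=128)
    output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_definition("bank", "She sat on the bank and watched the boats drift by."))
```

For Chinese or cross-lingual queries, the same pattern would apply with a multilingual checkpoint and contexts in the corresponding language.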
Related papers
- Learning Interpretable Queries for Explainable Image Classification with Information Pursuit [18.089603786027503]
Information Pursuit (IP) is an explainable prediction algorithm that greedily selects a sequence of interpretable queries about the data.
This paper introduces a novel approach: learning a dictionary of interpretable queries directly from the dataset.
arXiv Detail & Related papers (2023-12-16T21:43:07Z)
- Dict-BERT: Enhancing Language Model Pre-training with Dictionary [42.0998323292348]
Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora.
In this work, we focus on enhancing language model pre-training by leveraging definitions of rare words in dictionaries.
We propose two novel self-supervised pre-training tasks on word and sentence-level alignment between input text sequence and rare word definitions.
arXiv Detail & Related papers (2021-10-13T04:29:14Z)
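For the Dict-BERT summary above, a rough sketch of the input-augmentation step is shown below: dictionary definitions of rare words are appended to the training sequence. The frequency cutoff, the toy dictionary, and the [SEP]-based layout are illustrative assumptions, and the word- and sentence-level alignment objectives themselves are not reproduced here.

```python
# Rough sketch of appending rare-word definitions to a pre-training example.
# The frequency threshold, the toy dictionary, and the [SEP]-based layout are
# illustrative assumptions; Dict-BERT's actual data pipeline and its alignment
# objectives are not reproduced here.
from typing import Dict

RARE_FREQ_THRESHOLD = 5  # assumed cutoff: words seen fewer times than this count as rare

def augment_with_definitions(text: str,
                             word_freq: Dict[str, int],
                             dictionary: Dict[str, str]) -> str:
    """Append definitions of rare in-dictionary words after the input text."""
    rare_words = [w for w in text.lower().split()
                  if word_freq.get(w, 0) < RARE_FREQ_THRESHOLD and w in dictionary]
    definitions = " [SEP] ".join(f"{w}: {dictionary[w]}" for w in rare_words)
    return f"{text} [SEP] {definitions}" if definitions else text

# Example usage with toy data.
freqs = {"the": 1000, "cat": 200, "sat": 150, "ocelot": 2}
defs = {"ocelot": "a medium-sized wild cat native to the Americas"}
print(augment_with_definitions("the ocelot sat", freqs, defs))
```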
- Lacking the embedding of a word? Look it up into a traditional dictionary [0.2624902795082451]
We propose to use definitions retrieved in traditional dictionaries to produce word embeddings for rare words.
DefiNNet and DefBERT significantly outperform state-of-the-art as well as baseline methods for producing embeddings of unknown words.
arXiv Detail & Related papers (2021-09-24T06:27:58Z)
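For the paper above, the sketch below conveys the underlying idea of building an embedding for a rare or unknown word from its dictionary definition. Averaging the definition's word vectors is a deliberately simple stand-in; DefiNNet and DefBERT train dedicated networks for this step, which this summary does not detail.

```python
# Simplified stand-in for definition-based embeddings: average the vectors of the
# words in a definition to obtain a vector for an out-of-vocabulary word.
import numpy as np
from typing import Dict

def embed_from_definition(definition: str,
                          word_vectors: Dict[str, np.ndarray],
                          dim: int = 300) -> np.ndarray:
    """Build a vector for a rare word by averaging vectors of its definition words."""
    vectors = [word_vectors[w] for w in definition.lower().split() if w in word_vectors]
    if not vectors:
        return np.zeros(dim)  # fall back to a zero vector when nothing is covered
    return np.mean(vectors, axis=0)

# Toy usage: pretend "quokka" is out of vocabulary and define it in known words.
rng = np.random.default_rng(0)
vocab_vectors = {w: rng.normal(size=300) for w in ["small", "wallaby", "native", "to", "australia"]}
print(embed_from_definition("a small wallaby native to Australia", vocab_vectors).shape)
```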
- Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training [59.571632468137075]
We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.
We propose an algorithm VoCap to determine the desired vocabulary capacity of each language.
To offset the cost of the resulting larger vocabulary, we propose k-NN-based target sampling to accelerate the expensive softmax.
arXiv Detail & Related papers (2021-09-15T14:04:16Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
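For the XLP summary above, the sketch below shows the general shape of a language-specific projection: rather than adding a language embedding to each token embedding, a per-language linear map projects token embeddings before they enter the Transformer. The dimensions, module structure, and number of languages are illustrative assumptions, not the paper's released implementation.

```python
# Sketch of a language-specific projection applied to token embeddings before the
# Transformer, in place of an additive language embedding. Sizes are illustrative.
import torch
import torch.nn as nn

class LanguageProjection(nn.Module):
    def __init__(self, num_languages: int, hidden_dim: int):
        super().__init__()
        # One projection matrix (as a linear layer) per language.
        self.projections = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_languages)]
        )

    def forward(self, token_embeddings: torch.Tensor, lang_id: int) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_dim)
        return self.projections[lang_id](token_embeddings)

# Example: project a batch of token embeddings for language 0.
xlp = LanguageProjection(num_languages=4, hidden_dim=768)
projected = xlp(torch.randn(2, 16, 768), lang_id=0)
print(projected.shape)  # torch.Size([2, 16, 768])
```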
- Toward Cross-Lingual Definition Generation for Language Learners [10.45755551957024]
We propose to generate definitions in English for words in various languages.
Models trained on the English dataset can be directly applied to other languages.
Experiments and manual analyses show that our models have a strong cross-lingual transfer ability.
arXiv Detail & Related papers (2020-10-12T08:45:28Z)
- BERT for Monolingual and Cross-Lingual Reverse Dictionary [56.8627517256663]
We propose a simple but effective method to make BERT generate the target word for this specific task.
By using multilingual BERT (mBERT), we can efficiently perform cross-lingual reverse dictionary lookup with a single subword embedding.
arXiv Detail & Related papers (2020-09-30T17:00:10Z)
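For the reverse-dictionary paper above, the sketch below frames the task as masked-word prediction with mBERT: given a definition, the model fills a masked slot with candidate words. The prompt template is an assumption for illustration, not the paper's exact input construction, which may use multiple subword slots.

```python
# Sketch of a reverse-dictionary query via masked-word prediction with mBERT:
# feed a definition and ask the model to fill the masked slot with the word
# being defined. The prompt wording is an illustrative assumption.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-multilingual-cased")

definition = "a place where books are kept and can be borrowed"
prompt = f"The word for '{definition}' is [MASK]."

for candidate in fill(prompt, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```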
- On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms [0.0]
We introduce the notion of "concept" as a list of words that have shared semantic content.
We first use this notion to measure the learnability of concepts on pretrained word embeddings.
We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms.
arXiv Detail & Related papers (2020-06-17T14:25:36Z)
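For the concept-learnability paper above, the sketch below shows one way to quantify how well a concept (a list of semantically related words) can be separated from other words in a given embedding space, reporting an ROC AUC. The linear classifier, cross-validation setup, and toy data are illustrative assumptions rather than the paper's full statistical protocol.

```python
# Sketch: score a concept's "learnability" in an embedding space as the ROC AUC of a
# simple classifier separating concept words from other words. Toy data is used in
# place of real pretrained embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def concept_learnability(concept_vecs: np.ndarray, other_vecs: np.ndarray) -> float:
    """Return ROC AUC of a linear classifier separating concept words from others."""
    X = np.vstack([concept_vecs, other_vecs])
    y = np.concatenate([np.ones(len(concept_vecs)), np.zeros(len(other_vecs))])
    scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                               cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, scores)

# Toy usage with random vectors standing in for pretrained embeddings.
rng = np.random.default_rng(42)
concept = rng.normal(loc=0.5, size=(50, 100))   # e.g. embeddings of "animal" words
others = rng.normal(loc=0.0, size=(200, 100))   # embeddings of unrelated words
print(round(concept_learnability(concept, others), 3))
```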
- When Dictionary Learning Meets Deep Learning: Deep Dictionary Learning and Coding Network for Image Recognition with Limited Data [74.75557280245643]
We present a new Deep Dictionary Learning and Coding Network (DDLCN) for image recognition tasks with limited data.
We empirically compare DDLCN with several leading dictionary learning methods and deep learning models.
Experimental results on five popular datasets show that DDLCN achieves competitive results compared with state-of-the-art methods when the training data is limited.
arXiv Detail & Related papers (2020-05-21T23:12:10Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)
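For the word-sense-induction paper above, the sketch below gives a simplified version of the idea of reading senses off a pretrained embedding model: the target word's nearest neighbors are grouped by mutual similarity, and each group is treated as one candidate sense. The similarity threshold and the use of connected components are simplifications of the paper's ego-network clustering procedure.

```python
# Simplified sketch of inducing word senses from a pretrained embedding model by
# clustering the target word's nearest neighbors. The threshold and the use of
# connected components are illustrative simplifications.
import networkx as nx
from gensim.models import KeyedVectors

def induce_senses(kv: KeyedVectors, word: str, topn: int = 30, sim_threshold: float = 0.5):
    """Return lists of neighbor words, one list per induced sense cluster."""
    neighbors = [w for w, _ in kv.most_similar(word, topn=topn)]
    graph = nx.Graph()
    graph.add_nodes_from(neighbors)
    for i, a in enumerate(neighbors):
        for b in neighbors[i + 1:]:
            if kv.similarity(a, b) >= sim_threshold:
                graph.add_edge(a, b)
    return [sorted(component) for component in nx.connected_components(graph)]

# Usage (assumes a locally downloaded embedding file in word2vec text format):
# kv = KeyedVectors.load_word2vec_format("cc.en.300.vec", binary=False)
# for sense in induce_senses(kv, "bank"):
#     print(sense)
```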
- Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence [94.79912471702782]
Sememes, defined as the minimum semantic units of human languages, have been proven useful in many NLP tasks.
We propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes.
We evaluate our model and baseline methods on HowNet, a well-known sememe knowledge base, and find that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-01-16T17:30:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.