A survey of neural-network-based methods utilising comparable data for finding translation equivalents
- URL: http://arxiv.org/abs/2410.15144v1
- Date: Sat, 19 Oct 2024 16:10:41 GMT
- Title: A survey of neural-network-based methods utilising comparable data for finding translation equivalents
- Authors: Michaela Denisová, Pavel Rychlý
- Abstract summary: We present the most common approaches from NLP that endeavour to automatically induce one of the essential dictionary components.
We analyse them from a lexicographic perspective since lexicographic viewpoints are crucial for improving the described methods.
This survey encourages a connection between the NLP and lexicography fields as the NLP field can benefit from lexicographic insights.
- Score: 0.0
- Abstract: The importance of inducing bilingual dictionary components in many natural language processing (NLP) applications is indisputable. However, the dictionary compilation process requires extensive work and combines two disciplines, NLP and lexicography, while the former often omits the latter. In this paper, we present the most common approaches from NLP that endeavour to automatically induce one of the essential dictionary components, translation equivalents, and focus on the neural-network-based methods using comparable data. We analyse them from a lexicographic perspective since lexicographic viewpoints are crucial for improving the described methods. Moreover, we identify the methods that integrate these viewpoints and can be further exploited in various applications that require them. This survey encourages a connection between the NLP and lexicography fields, as the NLP field can benefit from lexicographic insights, and it serves as helpful and inspiring material for further research in the context of neural-network-based methods utilising comparable data.
Related papers
- Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP [2.3499129784547663]
This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys.
Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks.
By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022.
arXiv Detail & Related papers (2024-07-13T12:01:52Z)
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Surveying the Landscape of Text Summarization with Deep Learning: A Comprehensive Review [2.4185510826808487]
Deep learning has revolutionized natural language processing (NLP) by enabling the development of models that can learn complex representations of language data.
Deep learning models for NLP typically use large amounts of data to train deep neural networks, allowing them to learn the patterns and relationships in language data.
Applying deep learning to text summarization refers to the use of deep neural networks to perform text summarization tasks.
arXiv Detail & Related papers (2023-10-13T21:24:37Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Meta Learning for Natural Language Processing: A Survey [88.58260839196019]
Deep learning has been the mainstream technique in the natural language processing (NLP) area.
However, deep learning requires large amounts of labeled data and generalizes poorly across domains.
Meta-learning is an emerging field in machine learning that studies approaches to learning better algorithms.
arXiv Detail & Related papers (2022-05-03T13:58:38Z)
- Utilizing Wordnets for Cognate Detection among Indian Languages [50.83320088758705]
We detect cognate word pairs among ten Indian languages with Hindi.
We use deep learning methodologies to predict whether a word pair is cognate or not.
We report improved performance of up to 26%.
arXiv Detail & Related papers (2021-12-30T16:46:28Z)
- FedNLP: A Research Platform for Federated Learning in Natural Language Processing [55.01246123092445]
We present FedNLP, a research platform for federated learning in NLP.
FedNLP supports various popular task formulations in NLP such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling.
Preliminary experiments with FedNLP reveal that there exists a large performance gap between learning on decentralized and centralized datasets.
arXiv Detail & Related papers (2021-04-18T11:04:49Z)
- Natural Language Processing Advancements By Deep Learning: A Survey [0.755972004983746]
This survey categorizes and addresses the different aspects and applications of NLP that have benefited from deep learning.
It covers core NLP tasks and applications and describes how deep learning methods and models advance these areas.
arXiv Detail & Related papers (2020-03-02T21:32:05Z)