An Analysis of Word2Vec for the Italian Language
- URL: http://arxiv.org/abs/2001.09332v1
- Date: Sat, 25 Jan 2020 15:12:01 GMT
- Title: An Analysis of Word2Vec for the Italian Language
- Authors: Giovanni Di Gennaro, Amedeo Buonanno, Antonio Di Girolamo, Armando
Ospedale, Francesco A.N. Palmieri, Gianfranco Fedele
- Abstract summary: Word representation is fundamental in NLP tasks, because it is the encoding of semantic closeness between words that makes it possible to teach a machine to understand text.
By analysing the semantic capacity of the Word2Vec algorithm, this work produces an embedding for the Italian language.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word representation is fundamental in NLP tasks, because it is the
encoding of semantic closeness between words that makes it possible to teach a
machine to understand text. Despite the spread of word-embedding concepts,
achievements in linguistic contexts other than English are still few. In this
work, by analysing the semantic capacity of the Word2Vec algorithm, an
embedding for the Italian language is produced. Parameter settings such as the
number of epochs, the size of the context window, and the number of negative
samples are explored.
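The hyperparameters the abstract mentions can be made concrete with a minimal skip-gram-with-negative-sampling (SGNS) trainer, the Word2Vec variant whose epochs, context-window size, and negative-sample count are tuned. This is an illustrative sketch only: the toy Italian corpus, parameter values, and function names below are assumptions, not the paper's actual setup or code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_sgns(corpus, dim=20, window=5, negative=10, epochs=20, lr=0.05, seed=0):
    """Minimal skip-gram with negative sampling (SGNS).

    `epochs`, `window`, and `negative` correspond to the three settings
    the abstract says are explored.
    """
    rng = np.random.default_rng(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    w_in = rng.normal(scale=0.1, size=(n, dim))   # target-word vectors
    w_out = rng.normal(scale=0.1, size=(n, dim))  # context-word vectors

    for _ in range(epochs):
        for sent in corpus:
            for pos, word in enumerate(sent):
                t = idx[word]
                lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
                for cpos in range(lo, hi):
                    if cpos == pos:
                        continue
                    # one positive context pair plus `negative` random negatives
                    pairs = [(idx[sent[cpos]], 1.0)]
                    pairs += [(int(o), 0.0) for o in rng.integers(0, n, size=negative)]
                    for o, label in pairs:
                        g = lr * (sigmoid(w_in[t] @ w_out[o]) - label)
                        grad_in = g * w_out[o]        # cache before updating w_out
                        w_out[o] -= g * w_in[t]
                        w_in[t] -= grad_in
    return vocab, idx, w_in


def cosine(u, v):
    """Semantic closeness is read off as cosine similarity between vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy Italian corpus: "gatto" and "cane" share contexts, so their
# vectors should end up relatively close after training.
corpus = [
    ["il", "gatto", "dorme", "sul", "divano"],
    ["il", "cane", "dorme", "sul", "tappeto"],
] * 50
vocab, idx, emb = train_sgns(corpus)
print(cosine(emb[idx["gatto"]], emb[idx["cane"]]))
```

In practice such a model would be trained with an optimised library rather than a pure-NumPy loop; the sketch only makes visible where each of the three explored hyperparameters enters the training procedure.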
Related papers
- SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations [51.08119762844217]
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Multilingual Word Sense Disambiguation with Unified Sense Representation [55.3061179361177]
We propose building knowledge-based and supervised Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from rich-sourced languages to poorer ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
arXiv Detail & Related papers (2022-10-14T01:24:03Z)
- SemEval-2022 Task 1: CODWOE -- Comparing Dictionaries and Word Embeddings [1.5293427903448025]
We focus on relating opaque word vectors with human-readable definitions.
This problem naturally divides into two subtasks: converting definitions into embeddings, and converting embeddings into definitions.
This task was conducted in a multilingual setting, using comparable sets of embeddings trained homogeneously.
arXiv Detail & Related papers (2022-05-27T09:40:33Z)
- A Survey On Neural Word Embeddings [0.4822598110892847]
The study of meaning in natural language processing relies on the distributional hypothesis.
The revolutionary idea of distributed representation for a concept is close to the working of a human mind.
Neural word embeddings transformed the whole field of NLP by introducing substantial improvements in all NLP tasks.
arXiv Detail & Related papers (2021-10-05T03:37:57Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for multi-sense embeddings (M-SE) from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction [12.376752724719005]
Language-independent tokenisation (LIT) methods do not require labelled language resources or lexicons.
Language-specific tokenisation (LST) methods have a long and established history, and are developed using carefully created lexicons and training resources.
We empirically compare the two approaches using semantic similarity measurement as an evaluation task across a diverse set of languages.
arXiv Detail & Related papers (2020-02-25T16:24:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.