Zur Darstellung eines mehrstufigen Prototypbegriffs in der
multilingualen automatischen Sprachgenerierung: vom Korpus über word
embeddings bis hin zum automatischen Wörterbuch
- URL: http://arxiv.org/abs/2312.16311v1
- Date: Tue, 26 Dec 2023 19:39:25 GMT
- Title: Zur Darstellung eines mehrstufigen Prototypbegriffs in der
multilingualen automatischen Sprachgenerierung: vom Korpus über word
embeddings bis hin zum automatischen Wörterbuch
- Authors: María José Domínguez Vázquez
- Abstract summary: The multilingual dictionary of noun valency Portlex is considered to be the trigger for the creation of the automatic language generators Xera and Combinatoria.
Both prototypes are used for the automatic generation of nominal phrases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The multilingual dictionary of noun valency Portlex is considered to be the
trigger for the creation of the automatic language generators Xera and
Combinatoria, whose development and use are presented in this paper. Both
prototypes are used for the automatic generation of nominal phrases with their
mono- and bi-argumental valency slots, which could be used, among other things, as
dictionary examples or as integrated components of future autonomous
E-Learning-Tools. We regard the language generators, in their present form, as
samples of a new type of automatic valency dictionary that incorporates user
interaction. In the specific methodological procedure for the development of the
language generators, the syntactic-semantic description of the noun slots turns
out to be the main focus from a syntagmatic and paradigmatic point of view.
Along with factors such as representativeness, grammatical correctness,
semantic coherence, frequency and the variety of lexical candidates, as well as
semantic classes and argument structures, which are fixed components of both
resources, a concept of a multi-sided prototype stands out. The combined
application of this prototype concept as well as of word embeddings together
with techniques from the field of automatic natural language processing and
generation (NLP and NLG) opens up a new way for the future development of
automatically generated plurilingual valency dictionaries. All things
considered, the paper describes the language generators both from the point of
view of their development and from that of the users. The focus lies on
the role of the prototype concept within the development of the resources.
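The abstract presents the combination of the multi-sided prototype concept with word embeddings as the mechanism for obtaining lexical candidates for the nominal argument slots. The sketch below is an illustration of that idea rather than the authors' implementation: a slot prototype is approximated as the centroid of the embeddings of corpus-attested fillers, and further candidates are ranked by their similarity to it; all embedding values are toy placeholders.

```python
# Minimal sketch (not the Xera/Combinatoria implementation): ranking lexical
# candidates for a noun's argument slot by similarity to a "prototype" vector,
# here the centroid of corpus-attested slot fillers. The embedding values are
# toy placeholders; in practice they would come from pretrained word embeddings.
import numpy as np

# Hypothetical embeddings for a handful of Spanish nouns.
embeddings = {
    "profesor":   np.array([0.90, 0.10, 0.20, 0.00]),
    "estudiante": np.array([0.80, 0.20, 0.10, 0.10]),
    "alumno":     np.array([0.85, 0.15, 0.20, 0.05]),
    "mesa":       np.array([0.10, 0.90, 0.00, 0.30]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def slot_prototype(attested_fillers):
    """Centroid of the embeddings of fillers attested in the corpus for one slot."""
    return np.mean([embeddings[w] for w in attested_fillers], axis=0)

def rank_candidates(prototype, candidates):
    """Order candidate fillers by closeness to the slot prototype."""
    return sorted(candidates, key=lambda w: cosine(embeddings[w], prototype), reverse=True)

# Example: a [+human] argument slot attested with "profesor" and "estudiante".
proto = slot_prototype(["profesor", "estudiante"])
print(rank_candidates(proto, ["alumno", "mesa"]))  # "alumno" should rank first
```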
Related papers
- Contribución de la semántica combinatoria al desarrollo de
herramientas digitales multilingües [0.0]
This paper describes how the field of Combinatorial Semantics has contributed to the design of three prototypes for the automatic generation of argument patterns in nominal phrases in Spanish, French and German.
It also shows the importance of knowledge of the syntactic-semantic interface of arguments in a production situation in a foreign-language context.
arXiv Detail & Related papers (2023-12-26T19:32:05Z) - Generative Spoken Language Model based on continuous word-sized audio
tokens [52.081868603603844]
We introduce a Generative Spoken Language Model based on word-size continuous-valued audio embeddings.
The resulting model is the first generative language model based on word-size continuous embeddings.
arXiv Detail & Related papers (2023-10-08T16:46:14Z) - Assisting Language Learners: Automated Trans-Lingual Definition
Generation via Contrastive Prompt Learning [25.851611353632926]
The standard definition generation task requires automatically producing monolingual definitions.
We propose a novel task of Trans-Lingual Definition Generation (TLDG), which aims to generate definitions in another language.
arXiv Detail & Related papers (2023-06-09T17:32:45Z) - Multilingual Conceptual Coverage in Text-to-Image Models [98.80343331645626]
"Conceptual Coverage Across Languages" (CoCo-CroLa) is a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language.
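A rough sketch of this population comparison, under the assumption that each generated image is summarized by a feature vector from some vision encoder (the vectors below are random placeholders, and this is not the released CoCo-CroLa code): coverage for one noun is approximated as the mean similarity between the source-language and target-language image populations.

```python
# Hedged sketch of the comparison idea: images generated for a noun in the source
# language and for its translation in the target language are represented by
# placeholder feature vectors; "coverage" is approximated as the mean cosine
# similarity between the two populations.
import numpy as np

rng = np.random.default_rng(0)

def image_features(population_size, dim=8):
    # Placeholder for real image features (e.g. from a vision encoder).
    return rng.normal(size=(population_size, dim))

def coverage_score(src_features, tgt_features):
    """Mean cosine similarity over all source/target image pairs."""
    src = src_features / np.linalg.norm(src_features, axis=1, keepdims=True)
    tgt = tgt_features / np.linalg.norm(tgt_features, axis=1, keepdims=True)
    return float((src @ tgt.T).mean())

# One tangible noun: images for the source-language prompt vs. its translation.
src_imgs = image_features(population_size=16)
tgt_imgs = image_features(population_size=16)
print(coverage_score(src_imgs, tgt_imgs))
```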
arXiv Detail & Related papers (2023-06-02T17:59:09Z) - Tokenization Impacts Multilingual Language Modeling: Assessing
Vocabulary Allocation and Overlap Across Languages [3.716965622352967]
We propose new criteria to evaluate the quality of lexical representation and vocabulary overlap observed in sub-word tokenizers.
Our findings show that the overlap of vocabulary across languages can actually be detrimental to certain downstream tasks.
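As a minimal illustration of what a vocabulary-overlap measure can look like (a simple Jaccard score over toy, pre-tokenized samples; the paper's own criteria are more detailed than this):

```python
# Hedged sketch: share of a subword vocabulary that two languages have in common,
# computed as Jaccard overlap between the token sets observed in each sample.
def vocabulary(tokenized_corpus):
    return {tok for sentence in tokenized_corpus for tok in sentence}

def jaccard_overlap(vocab_a, vocab_b):
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

# Toy, pre-tokenized samples standing in for real subword tokenizer output.
german  = [["multi", "##lingual", "##e"], ["die", "sprache"]]
spanish = [["multi", "##lingü", "##e"], ["la", "lengua"]]

overlap = jaccard_overlap(vocabulary(german), vocabulary(spanish))
print(f"vocabulary overlap: {overlap:.2f}")  # 0.25 for this toy sample
```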
arXiv Detail & Related papers (2023-05-26T18:06:49Z) - Multilingual Generative Language Models for Zero-Shot Cross-Lingual
Event Argument Extraction [80.61458287741131]
We present a study on leveraging multilingual pre-trained generative language models for zero-shot cross-lingual event argument extraction (EAE).
By formulating EAE as a language generation task, our method effectively encodes event structures and captures the dependencies between arguments.
Our proposed model finetunes multilingual pre-trained generative language models to generate sentences that fill in the language-agnostic template with arguments extracted from the input passage.
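The following sketch illustrates the template idea only, not the paper's model or templates (the event type, roles, and separator token are invented for the example): an event type comes with a language-agnostic template whose placeholders the generative model is trained to fill with arguments from the passage, and the filled output can be parsed back into role/argument pairs.

```python
# Hedged sketch of template construction and parsing for generation-based EAE.
import re

# Invented language-agnostic template for an "attack" event.
TEMPLATE = "<arg1> attacked <arg2> using <arg3>"

def build_prompt(passage, template):
    # Model input: the passage followed by the unfilled template.
    return f"{passage} </s> {template}"

def parse_filled_template(filled, template=TEMPLATE):
    """Recover role -> argument pairs by aligning the filled string to the template."""
    pattern = re.escape(template)
    for slot in ("<arg1>", "<arg2>", "<arg3>"):
        pattern = pattern.replace(re.escape(slot), f"(?P<{slot[1:-1]}>.+?)")
    match = re.fullmatch(pattern, filled)
    return match.groupdict() if match else {}

passage = "Rebels attacked the convoy with rockets on Tuesday."
print(build_prompt(passage, TEMPLATE))
# Suppose the fine-tuned model generated the filled template below:
print(parse_filled_template("Rebels attacked the convoy using rockets"))
# -> {'arg1': 'Rebels', 'arg2': 'the convoy', 'arg3': 'rockets'}
```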
arXiv Detail & Related papers (2022-03-15T23:00:32Z) - Generalising Multilingual Concept-to-Text NLG with Language Agnostic
Delexicalisation [0.40611352512781856]
Concept-to-text Natural Language Generation is the task of expressing an input meaning representation in natural language.
We propose Language Agnostic Delexicalisation, a novel delexicalisation method that uses multilingual pretrained embeddings.
Our experiments across five datasets and five languages show that multilingual models outperform monolingual models in concept-to-text.
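To make the delexicalisation step concrete, here is a deliberately simplified sketch (not the paper's method): values from the input meaning representation are located in the text and replaced by slot placeholders via plain string lookup, whereas the language-agnostic variant described above matches them through multilingual pretrained embeddings, so inflected or translated surface forms can also be found.

```python
# Simplified delexicalisation by string matching; the language-agnostic version
# would instead match values to text spans via multilingual embeddings.
def delexicalise(text, meaning_representation):
    """Replace attribute values found in `text` with placeholders like <name>."""
    delex = text
    for attribute, value in meaning_representation.items():
        idx = delex.lower().find(value.lower())
        if idx != -1:
            delex = delex[:idx] + f"<{attribute}>" + delex[idx + len(value):]
    return delex

mr = {"name": "Café Rouge", "food": "French"}
text = "Café Rouge serves French food in the city centre."
print(delexicalise(text, mr))
# -> "<name> serves <food> food in the city centre."
```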
arXiv Detail & Related papers (2021-05-07T17:48:53Z) - Constrained Language Models Yield Few-Shot Semantic Parsers [73.50960967598654]
We explore the use of large pretrained language models as few-shot semantic parsers.
The goal in semantic parsing is to generate a structured meaning representation given a natural language input.
We use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation.
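A minimal sketch of the second half of that pipeline (assumptions throughout, not the paper's grammar or meaning representation): the language model would paraphrase a free-form question into a controlled, English-like form, which is then mapped deterministically to a structured meaning representation.

```python
# Hedged sketch: deterministic mapping from a controlled sublanguage to a
# meaning representation. The paraphrasing step performed by the LM is assumed
# to have already produced the canonical utterance.
import re

# Controlled pattern: "show me the <attribute> of events in <place>"
CANONICAL = re.compile(r"show me the (?P<attribute>\w+) of events in (?P<place>[\w ]+)")

def canonical_to_mr(canonical_utterance):
    match = CANONICAL.fullmatch(canonical_utterance)
    if not match:
        raise ValueError("utterance is not in the controlled sublanguage")
    return {"select": match["attribute"], "filter": {"location": match["place"]}}

# Assume the LM paraphrased "what's happening in Berlin, time-wise?" into:
print(canonical_to_mr("show me the time of events in Berlin"))
# -> {'select': 'time', 'filter': {'location': 'Berlin'}}
```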
arXiv Detail & Related papers (2021-04-18T08:13:06Z) - Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into language-specific semantic space, and then the projected embeddings will be fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
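A toy sketch of the projection idea (dimensions and values are placeholders, not the released XLP code): instead of adding one language embedding to every token, each language owns a projection matrix that maps the shared word embeddings into a language-specific space before they enter the Transformer.

```python
# Hedged sketch of per-language projection of word embeddings.
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Shared word embeddings and one (randomly initialised) projection per language.
word_embeddings = {"dog": rng.normal(size=dim), "Hund": rng.normal(size=dim)}
language_projection = {"en": rng.normal(size=(dim, dim)),
                       "de": rng.normal(size=(dim, dim))}

def project_tokens(tokens, language):
    """Return the language-projected embeddings that would be fed to the Transformer."""
    W = language_projection[language]
    return np.stack([word_embeddings[tok] @ W for tok in tokens])

print(project_tokens(["dog"], "en").shape)   # (1, 4)
print(project_tokens(["Hund"], "de").shape)  # (1, 4)
```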
arXiv Detail & Related papers (2021-02-16T18:47:10Z) - Generative latent neural models for automatic word alignment [0.0]
Variational autoencoders have recently been used in various natural language processing tasks to learn, in an unsupervised way, latent representations that are useful for language generation.
In this paper, we study these models for the task of word alignment and propose and assess several evolutions of a vanilla variational autoencoder.
We demonstrate that these techniques can yield competitive results as compared to Giza++ and to a strong neural network alignment system for two language pairs.
arXiv Detail & Related papers (2020-09-28T07:54:09Z) - Linguistic Typology Features from Text: Inferring the Sparse Features of
World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
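A rough PyTorch sketch of such an architecture (layer sizes, the number of typological features, and other details are assumptions, not the authors' configuration): text is read as raw bytes, byte embeddings pass through a convolution and a recurrent layer, and the final state predicts one logit per WALS-style feature.

```python
# Hedged sketch: byte embeddings -> 1D convolution -> GRU -> feature logits.
import torch
import torch.nn as nn

class TypologyPredictor(nn.Module):
    def __init__(self, emb_dim=32, hidden=64, num_features=192):
        super().__init__()
        self.byte_embedding = nn.Embedding(256, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=5, padding=2)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_features)

    def forward(self, byte_ids):                  # byte_ids: (batch, seq_len)
        x = self.byte_embedding(byte_ids)         # (batch, seq_len, emb_dim)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        _, h = self.rnn(x)                        # h: (1, batch, hidden)
        return self.classifier(h[-1])             # one logit per typological feature

sample = torch.tensor([list("Ein Beispielsatz.".encode("utf-8"))])
print(TypologyPredictor()(sample).shape)          # torch.Size([1, 192])
```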
arXiv Detail & Related papers (2020-04-30T21:00:53Z)