A Library for Automatic Natural Language Generation of Spanish Texts
- URL: http://arxiv.org/abs/2405.17280v1
- Date: Mon, 27 May 2024 15:44:06 GMT
- Title: A Library for Automatic Natural Language Generation of Spanish Texts
- Authors: Silvia García-Méndez, Milagros Fernández-Gavilanes, Enrique Costa-Montenegro, Jonathan Juncal-Martínez, F. Javier González-Castaño,
- Abstract summary: We present a novel system for natural language generation (NLG) of Spanish sentences from a minimum set of meaningful words.
The system is able to generate complete, coherent and correctly spelled sentences from the main word sets presented by the user.
It can be easily adapted to other languages by design and can feiblyas be integrated in a wide range of digital devices.
- Score: 6.102700502396687
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this article we present a novel system for natural language generation (NLG) of Spanish sentences from a minimum set of meaningful words (such as nouns, verbs and adjectives) which, unlike other state-of-the-art solutions, performs the NLG task in a fully automatic way, exploiting both knowledge-based and statistical approaches. Relying on its linguistic knowledge of vocabulary and grammar, the system is able to generate complete, coherent and correctly spelled sentences from the main word sets presented by the user. The system, which was designed to be integrable, portable and efficient, can be easily adapted to other languages by design and can feasibly be integrated in a wide range of digital devices. During its development we also created a supplementary lexicon for Spanish, aLexiS, with wide coverage and high precision, as well as syntactic trees from a freely available definite-clause grammar. The resulting NLG library has been evaluated both automatically and manually (annotation). The system can potentially be used in different application domains such as augmentative communication and automatic generation of administrative reports or news.
Related papers
- A System for Automatic English Text Expansion [10.475422682581115]
"automatic" means that the system can generate coherent and correct sentences from a minimum set of words.
For English, we have created the highly precise aLexiE lexicon with wide coverage.
System might also be applied to other domains such as report and news generation.
arXiv Detail & Related papers (2024-05-28T16:48:05Z) - Automating Knowledge Acquisition for Content-Centric Cognitive Agents
Using LLMs [0.0]
The paper describes a system that uses large language model (LLM) technology to support the automatic learning of new entries in an intelligent agent's semantic lexicon.
The process is bootstrapped by an existing non-toy lexicon and a natural language generator that converts formal, ontologically-grounded representations of meaning into natural language sentences.
arXiv Detail & Related papers (2023-12-27T02:31:51Z) - Robust Open-Set Spoken Language Identification and the CU MultiLang
Dataset [2.048226951354646]
Open-set spoken language identification systems can detect when an input exhibits none of the original languages.
We implement a novel approach to open-set spoken language identification that uses MFCC and pitch features.
We present a spoken language identification system that achieves 91.76% accuracy on trained languages and has the capability to adapt to unknown languages on the fly.
arXiv Detail & Related papers (2023-08-29T00:44:27Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for
Natural Language Understanding in Task-Oriented Dialogue [115.32009638844059]
We extend the English only NLU++ dataset to include manual translations into a range of high, medium, and low resource languages.
Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals.
We use MULTI3NLU++ to benchmark state-of-the-art multilingual models for the Natural Language Understanding tasks of intent detection and slot labelling.
arXiv Detail & Related papers (2022-12-20T17:34:25Z) - Expanding Pretrained Models to Thousands More Languages via
Lexicon-based Adaptation [133.7313847857935]
Our study highlights how NLP methods can be adapted to thousands more languages that are under-served by current technology.
For 19 under-represented languages across 3 tasks, our methods lead to consistent improvements of up to 5 and 15 points with and without extra monolingual text respectively.
arXiv Detail & Related papers (2022-03-17T16:48:22Z) - Integrating Knowledge in End-to-End Automatic Speech Recognition for
Mandarin-English Code-Switching [41.88097793717185]
Code-Switching (CS) is a common linguistic phenomenon in multilingual communities.
This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
arXiv Detail & Related papers (2021-12-19T17:31:15Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - Acoustics Based Intent Recognition Using Discovered Phonetic Units for
Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two languages families - Indic languages and Romance languages, for two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z) - Learning Adaptive Language Interfaces through Decomposition [89.21937539950966]
We introduce a neural semantic parsing system that learns new high-level abstractions through decomposition.
Users interactively teach the system by breaking down high-level utterances describing novel behavior into low-level steps.
arXiv Detail & Related papers (2020-10-11T08:27:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.