A System for Automatic English Text Expansion
- URL: http://arxiv.org/abs/2405.18350v1
- Date: Tue, 28 May 2024 16:48:05 GMT
- Title: A System for Automatic English Text Expansion
- Authors: Silvia García Méndez, Milagros Fernández Gavilanes, Enrique Costa Montenegro, Jonathan Juncal Martínez, Francisco Javier González Castaño, Ehud Reiter
- Abstract summary: "automatic" means that the system can generate coherent and correct sentences from a minimum set of words.
For English, we have created the highly precise aLexiE lexicon with wide coverage.
The system might also be applied to other domains such as report and news generation.
- Score: 10.475422682581115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an automatic text expansion system to generate English sentences, which performs automatic Natural Language Generation (NLG) by combining linguistic rules with statistical approaches. Here, "automatic" means that the system can generate coherent and correct sentences from a minimum set of words. From its inception, the design is modular and adaptable to other languages. This adaptability is one of its greatest advantages. For English, we have created the highly precise aLexiE lexicon with wide coverage, which represents a contribution on its own. We have evaluated the resulting NLG library in an Augmentative and Alternative Communication (AAC) proof of concept, both directly (by regenerating corpus sentences) and manually (from annotations) using a popular corpus in the NLG field. We performed a second analysis by comparing the quality of text expansion in English to Spanish, using an ad-hoc Spanish-English parallel corpus. The system might also be applied to other domains such as report and news generation.
Related papers
- Predictability and Causality in Spanish and English Natural Language Generation [6.817247544942709]
This paper compares causal and non-causal language modeling for English and Spanish.
According to this experiment, Spanish is more predictable than English given a non-causal context.
These insights support further research in NLG in Spanish using bidirectional transformer language models.
arXiv Detail & Related papers (2024-08-26T14:09:28Z) - A Library for Automatic Natural Language Generation of Spanish Texts [6.102700502396687]
We present a novel system for natural language generation (NLG) of Spanish sentences from a minimum set of meaningful words.
The system is able to generate complete, coherent and correctly spelled sentences from the main word sets presented by the user.
It can be easily adapted to other languages by design and can feasibly be integrated in a wide range of digital devices.
arXiv Detail & Related papers (2024-05-27T15:44:06Z) - TIPAA-SSL: Text Independent Phone-to-Audio Alignment based on Self-Supervised Learning and Knowledge Transfer [3.9981390090442694]
We present a novel approach for text independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer.
We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English.
Our proposed model outperforms the state-of-the-art (charsiu) in statistical metrics and has applications in language learning and speech processing systems.
arXiv Detail & Related papers (2024-05-03T14:25:21Z) - Spanish Resource Grammar version 2023 [12.009437358109407]
We present the latest version of the Spanish Resource Grammar (SRG).
Such grammars encode a complex set of hypotheses about syntax making them a resource for empirical testing of linguistic theory.
This version of the SRG uses the recent version of the Freeling morphological analyzer and is released along with an automatically created, manually verified treebank of 2,291 sentences.
arXiv Detail & Related papers (2023-09-23T09:24:05Z) - BiSync: A Bilingual Editor for Synchronized Monolingual Texts [2.0411082897313984]
We present BiSync, a bilingual writing assistant that allows users to freely compose text in two languages.
We detail the model architecture used for synchronization and evaluate the resulting tool, showing that high accuracy can be attained with limited computational resources.
arXiv Detail & Related papers (2023-06-01T07:03:47Z) - CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z) - Linking Emergent and Natural Languages via Corpus Transfer [98.98724497178247]
We propose a novel way to establish a link by corpus transfer between emergent languages and natural languages.
Our approach showcases non-trivial transfer benefits for two different tasks -- language modeling and image captioning.
We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images.
arXiv Detail & Related papers (2022-03-24T21:24:54Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z) - Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z) - "Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation [49.610188741500274]
An end-to-end speech-to-text translation (ST) system takes audio in a source language and outputs the text in a target language.
Existing methods are limited by the amount of parallel corpus data.
We build a system to fully utilize signals in a parallel ST corpus.
arXiv Detail & Related papers (2020-09-21T09:19:07Z)