Evaluation Benchmarks for Spanish Sentence Representations
- URL: http://arxiv.org/abs/2204.07571v1
- Date: Fri, 15 Apr 2022 17:53:05 GMT
- Title: Evaluation Benchmarks for Spanish Sentence Representations
- Authors: Vladimir Araujo, Andrés Carvallo, Souvik Kundu, José Cañete, Marcelo Mendoza, Robert E. Mercer, Felipe Bravo-Marquez, Marie-Francine Moens, Alvaro Soto
- Abstract summary: We introduce Spanish SentEval and Spanish DiscoEval, aiming to assess the capabilities of stand-alone and discourse-aware sentence representations.
In addition, we evaluate and analyze the most recent pre-trained Spanish language models to exhibit their capabilities and limitations.
- Score: 24.162683655834847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the success of pre-trained language models, versions of languages
other than English have been released in recent years. This fact implies the
need for resources to evaluate these models. In the case of Spanish, there are
few ways to systematically assess the models' quality. In this paper, we narrow
the gap by building two evaluation benchmarks. Inspired by previous work
(Conneau and Kiela, 2018; Chen et al., 2019), we introduce Spanish SentEval and
Spanish DiscoEval, aiming to assess the capabilities of stand-alone and
discourse-aware sentence representations, respectively. Our benchmarks include
a considerable number of pre-existing and newly constructed datasets that
address different tasks from various domains. In addition, we evaluate and
analyze the most recent pre-trained Spanish language models to reveal their
capabilities and limitations. As an example, we find that on discourse
evaluation tasks mBERT, a language model trained on multiple languages,
usually provides richer latent representations than models trained only on
Spanish documents. We hope our contribution will motivate a fairer, more
comparable, and less cumbersome way to evaluate future Spanish language models.
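To make the evaluation setup concrete, here is a minimal sketch of a SentEval-style probe over frozen sentence representations: pool a pre-trained encoder's hidden states into sentence vectors and train only a light classifier on top. The model name, mean pooling, and two-example toy task are illustrative assumptions, not the benchmark's actual configuration.

```python
# Minimal sketch of a SentEval-style probe over frozen sentence
# representations. Model choice, mean pooling, and the toy data are
# assumptions for illustration, not the benchmark's exact setup.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT; swap in a Spanish-only model to compare

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(sentences):
    """Mean-pool the last hidden layer into fixed-size sentence vectors."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # (B, H)

# Toy binary task standing in for one classification probe of the suite.
train_sents = ["La película fue excelente.", "El servicio fue pésimo."]
train_labels = [1, 0]

# The encoder stays frozen; only the linear probe is trained, so its
# accuracy reflects what the sentence embedding itself encodes.
probe = LogisticRegression(max_iter=1000).fit(embed(train_sents), train_labels)
print(probe.predict(embed(["Una experiencia maravillosa."])))
```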
Related papers
- A User-Centered Evaluation of Spanish Text Simplification [6.046875672600245]
We present an evaluation of text simplification (TS) in Spanish for a production system.
We compare the most prevalent Spanish-specific readability scores with neural networks, and show that the latter are consistently better at predicting user preferences regarding TS.
We release the corpora used in our evaluation to the broader community in the hope of pushing forward the state of the art in Spanish natural language processing.
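As a concrete example of the Spanish-specific readability scores such systems are compared against, the sketch below computes the Fernández-Huerta index, a Spanish adaptation of Flesch reading ease. The coefficients follow the commonly quoted form of the formula, and the vowel-group syllable counter is a rough heuristic assumed here for illustration.

```python
# Rough sketch of the Fernández-Huerta readability index for Spanish.
# The syllable counter is an approximate vowel-group heuristic; real
# implementations handle diphthongs and hiatus more carefully.
import re

def count_syllables_es(word: str) -> int:
    # Approximate: each run of Spanish vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouáéíóúü]+", word.lower())))

def fernandez_huerta(text: str) -> float:
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables_es(w) for w in words)
    p = 100.0 * syllables / len(words)   # syllables per 100 words
    f = 100.0 * sentences / len(words)   # sentences per 100 words
    return 206.84 - 0.60 * p - 1.02 * f  # higher score = easier text

print(fernandez_huerta("El gato duerme en la casa. Es un día tranquilo."))
```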
arXiv Detail & Related papers (2023-08-15T03:49:59Z)
- Spanish Pre-trained BERT Model and Evaluation Data [0.0]
We present a BERT-based language model pre-trained exclusively on Spanish data.
We also compiled several tasks specifically for the Spanish language in a single repository.
We have publicly released our model, the pre-training data, and the compilation of the Spanish benchmarks.
arXiv Detail & Related papers (2023-08-06T00:16:04Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- Multilingual Conceptual Coverage in Text-to-Image Models [98.80343331645626]
"Conceptual Coverage Across Languages" (CoCo-CroLa) is a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language.
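A hedged sketch of this population-level comparison: embed the images a model generates for a noun in the source language and for its translation, then score how similar the two populations are. Random vectors stand in for real image embeddings (e.g. from CLIP), and the scoring function is an assumption rather than the paper's exact metric.

```python
# Sketch of a conceptual-coverage proxy in the spirit of CoCo-CroLa.
# Random vectors stand in for embeddings of generated images; in real
# usage these would come from an image encoder such as CLIP.
import numpy as np

rng = np.random.default_rng(0)

def population_similarity(src_embs: np.ndarray, tgt_embs: np.ndarray) -> float:
    """Mean cosine similarity across all source/target image pairs."""
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    return float((src @ tgt.T).mean())

# Stand-ins for images generated from "dog" (source) and "perro" (target).
src_images = rng.normal(size=(8, 512))
tgt_images = rng.normal(size=(8, 512))

print(f"coverage proxy for 'dog'/'perro': {population_similarity(src_images, tgt_images):.3f}")
# Averaging this score over a list of tangible nouns gives the target
# language's coverage relative to the source language.
```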
arXiv Detail & Related papers (2023-06-02T17:59:09Z)
- BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer [81.5984433881309]
We introduce BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format.
BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer.
Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer.
arXiv Detail & Related papers (2023-05-24T08:06:33Z)
- Lessons learned from the evaluation of Spanish Language Models [27.653133576469276]
We present a head-to-head comparison of language models for Spanish.
We argue that more research is needed to understand the factors underlying the observed differences.
The recent activity in the development of language technology for Spanish is welcome, but our results show that building language models remains an open, resource-heavy problem.
arXiv Detail & Related papers (2022-12-16T10:33:38Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
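The probing recipe is simple to reproduce in spirit: freeze the encoder, fit a light classifier on top, and sweep the axis of interest. Intermediate XLM-R checkpoints are not publicly released, so the runnable sketch below sweeps the layers of the final model instead; looping over saved checkpoints rather than layers would recover the paper's setup. The toy number-agreement task is an assumption for illustration.

```python
# Layer-wise probing sketch; swap the layer loop for a loop over saved
# pretraining checkpoints to match the setup described above. The toy
# singular/plural task is illustrative only.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
enc = AutoModel.from_pretrained("xlm-roberta-base", output_hidden_states=True)
enc.eval()

sents = ["El gato duerme.", "Los gatos duermen.", "La casa es roja.", "Las casas son rojas."]
labels = [0, 1, 0, 1]  # 0 = singular subject, 1 = plural subject

batch = tok(sents, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden_states = enc(**batch).hidden_states  # one tensor per layer

mask = batch["attention_mask"].unsqueeze(-1)
for layer, h in enumerate(hidden_states):
    feats = ((h * mask).sum(1) / mask.sum(1)).numpy()  # mean-pooled vectors
    acc = LogisticRegression(max_iter=1000).fit(feats, labels).score(feats, labels)
    print(f"layer {layer:2d}: probe fit accuracy {acc:.2f}")
```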
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
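This setup amounts to cross-lingual in-context learning: a few English demonstrations in the prompt, followed by a non-English query. A minimal sketch of such a prompt, with the template and label set assumed for illustration:

```python
# Build a few-shot prompt with English demonstrations and a Spanish test
# input; the template and labels are illustrative assumptions.
english_examples = [
    ("The movie was fantastic.", "positive"),
    ("The food was awful.", "negative"),
]
spanish_query = "El servicio fue excelente."

prompt = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in english_examples)
prompt += f"\n\nReview: {spanish_query}\nSentiment:"
print(prompt)  # fed to a GPT- or T5-style model, the completion is the predicted label
```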
arXiv Detail & Related papers (2021-09-16T03:08:22Z)