The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
- URL: http://arxiv.org/abs/2406.02396v1
- Date: Tue, 4 Jun 2024 15:11:27 GMT
- Title: The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
- Authors: Kenneth Enevoldsen, Márton Kardos, Niklas Muennighoff, Kristoffer Laigaard Nielbo
- Abstract summary: The Scandinavian Embedding Benchmark (SEB) is a framework that enables text embedding evaluation for Scandinavian languages.
Building on SEB, we evaluate more than 26 models, uncovering significant performance disparities between public and commercial solutions.
We open-source SEB and integrate it with MTEB, thus bridging the text embedding evaluation gap for Scandinavian languages.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The evaluation of English text embeddings has transitioned from evaluating a handful of datasets to broad coverage across many tasks through benchmarks such as MTEB. However, this is not the case for multilingual text embeddings due to a lack of available benchmarks. To address this problem, we introduce the Scandinavian Embedding Benchmark (SEB). SEB is a comprehensive framework that enables text embedding evaluation for Scandinavian languages across 24 tasks, 10 subtasks, and 4 task categories. Building on SEB, we evaluate more than 26 models, uncovering significant performance disparities between public and commercial solutions not previously captured by MTEB. We open-source SEB and integrate it with MTEB, thus bridging the text embedding evaluation gap for Scandinavian languages.
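Because SEB is open-sourced and integrated with MTEB, one plausible way to run it is through the mteb Python package. The sketch below is illustrative only: it assumes a recent mteb release exposing get_tasks and MTEB.run, and the model checkpoint, language codes, and output folder are assumptions rather than the paper's exact setup.

```python
# Minimal sketch (not from the paper): evaluating an embedding model on
# Scandinavian-language tasks through the mteb package, which SEB integrates with.
# The model checkpoint, language codes, and output folder are illustrative assumptions.
import mteb
from sentence_transformers import SentenceTransformer

# Any SentenceTransformer-compatible encoder can be substituted here.
model = SentenceTransformer("intfloat/multilingual-e5-small")

# Filter the MTEB task registry down to Danish, Norwegian Bokmål, and Swedish.
tasks = mteb.get_tasks(languages=["dan", "nob", "swe"])

evaluation = mteb.MTEB(tasks=tasks)

# Scores are written as per-task result files under the output folder.
results = evaluation.run(model, output_folder="results/scandinavian")
```

Commercial embedding APIs can in principle be evaluated the same way by wrapping them in an object that exposes an encode method, which is the interface mteb expects; only the model wrapper changes, not the task selection or the run loop.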
Related papers
- BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings [0.4194295877935868]
The choice of embeddings plays a critical role in enhancing the performance of NLP tasks.
In this study, we investigate the impact of various embedding techniques (contextual BERT-based, non-contextual BERT-based, and FastText-based) on NLP classification tasks specific to the Marathi language.
arXiv Detail & Related papers (2024-11-26T18:25:57Z)
- ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection for ABSA [50.90538760832107]
This research presents a novel task: Review-Level Opinion Aspect Sentiment Target (ROAST) joint detection.
ROAST seeks to close the gap between sentence-level and text-level ABSA by identifying every ABSA constituent at the review level.
We extend the available datasets to enable ROAST, addressing the drawbacks noted in previous research.
arXiv Detail & Related papers (2024-05-30T17:29:15Z)
- PL-MTEB: Polish Massive Text Embedding Benchmark [0.0]
Polish Massive Text Embedding Benchmark (PL-MTEB) is a benchmark for text embeddings in Polish.
PL-MTEB consists of 28 diverse NLP tasks from 5 task types.
arXiv Detail & Related papers (2024-05-16T14:33:39Z)
- SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation [52.186343500576214]
We introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation.
SEAHORSE consists of 96K summaries with human ratings along 6 dimensions of text quality.
We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE and mFACE.
arXiv Detail & Related papers (2023-05-22T16:25:07Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings? [18.968571816913208]
We provide a systematic comparison of methods to produce document-level representations from sentences based on LASER, LaBSE, and Sentence BERT pre-trained multilingual models.
We show that a clever combination of sentence embeddings is usually better than encoding the full document as a single unit.
arXiv Detail & Related papers (2023-04-28T12:11:21Z)
- MTEB: Massive Text Embedding Benchmark [6.023518635799927]
It is unclear whether state-of-the-art embeddings on semantic textual similarity can be equally well applied to other tasks like clustering or reranking.
Massive Text Embedding Benchmark (MTEB) spans 8 embedding tasks covering a total of 58 datasets and 112 languages.
arXiv Detail & Related papers (2022-10-13T19:42:08Z)
- FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation [64.9546787488337]
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation.
The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese.
arXiv Detail & Related papers (2022-10-01T05:02:04Z)
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [128.37244072182506]
XTREME (Cross-lingual TRansfer Evaluation of Multilingual Encoders) is a benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)