Assessing generalization capability of text ranking models in Polish
- URL: http://arxiv.org/abs/2402.14318v1
- Date: Thu, 22 Feb 2024 06:21:41 GMT
- Title: Assessing generalization capability of text ranking models in Polish
- Authors: Sławomir Dadas, Małgorzata Grębowiec
- Abstract summary: Retrieval-augmented generation (RAG) is becoming an increasingly popular technique for integrating internal knowledge bases with large language models.
In this article, we focus on the reranking problem for the Polish language, examining the performance of rerankers.
The best of our models establishes a new state-of-the-art for reranking in the Polish language, outperforming existing models with up to 30 times more parameters.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented generation (RAG) is becoming an increasingly popular
technique for integrating internal knowledge bases with large language models.
In a typical RAG pipeline, three models are used, responsible for the
retrieval, reranking, and generation stages. In this article, we focus on the
reranking problem for the Polish language, examining the performance of
rerankers and comparing their results with available retrieval models. We
conduct a comprehensive evaluation of existing models and those trained by us,
utilizing a benchmark of 41 diverse information retrieval tasks for the Polish
language. The results of our experiments show that most models struggle with
out-of-domain generalization. However, the combination of an effective
optimization method and a large training dataset allows for building rerankers that are both
compact in size and capable of generalization. The best of our models
establishes a new state-of-the-art for reranking in the Polish language,
outperforming existing models with up to 30 times more parameters.
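
The three-stage pipeline the abstract describes is easy to picture in code. Below is a minimal sketch with toy stand-ins for each stage: in a real system the retriever would query a dense or sparse index, the reranker would be a trained cross-encoder scoring (query, passage) pairs jointly, and the generator would be an LLM. None of this is the authors' implementation.

```python
# Minimal sketch of the retrieval -> reranking -> generation pipeline.
# All scoring functions are toy stand-ins, not real models.
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    score: float = 0.0


def retrieve(query: str, corpus: list[str], k: int = 10) -> list[Passage]:
    """Stage 1: cheap candidate retrieval (toy word-overlap scorer)."""
    q = set(query.lower().split())
    scored = [Passage(t, len(q & set(t.lower().split()))) for t in corpus]
    return sorted(scored, key=lambda p: p.score, reverse=True)[:k]


def cross_encoder_score(query: str, passage: str) -> float:
    # Stand-in for a trained cross-encoder; returns a toy Jaccard score.
    q, t = set(query.lower().split()), set(passage.lower().split())
    return len(q & t) / (len(q | t) or 1)


def rerank(query: str, candidates: list[Passage], k: int = 3) -> list[Passage]:
    """Stage 2: rescore each (query, passage) pair jointly and keep the top k."""
    for p in candidates:
        p.score = cross_encoder_score(query, p.text)
    return sorted(candidates, key=lambda p: p.score, reverse=True)[:k]


def generate(query: str, context: list[Passage]) -> str:
    """Stage 3: an LLM would answer conditioned on the reranked context."""
    prompt = "\n".join(p.text for p in context) + f"\n\nQuestion: {query}"
    return f"[stubbed LLM call on a {len(prompt)}-char prompt]"


corpus = [
    "Warsaw is the capital of Poland.",
    "Krakow lies on the Vistula river.",
]
top = rerank("capital of Poland", retrieve("capital of Poland", corpus))
print(generate("What is the capital of Poland?", top))
```

The middle stage is where the paper's contribution sits: a compact reranker that generalizes well can recover relevant passages that the first-stage retriever ranks poorly.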
Related papers
- Lessons from the Trenches on Reproducible Evaluation of Language Models (arXiv, 2024-05-23)
We draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers.
We present the Language Model Evaluation Harness (lm-eval), an open source library for independent, reproducible, and extensible evaluation of language models.
- Enhancing Traffic Incident Management with Large Language Models: A Hybrid Machine Learning Approach for Severity Classification (arXiv, 2024-03-20)
This research showcases the innovative integration of Large Language Models into machine learning for traffic incident management.
By leveraging features generated by modern language models alongside conventional data extracted from incident reports, our research demonstrates improvements in the accuracy of severity classification.
- PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods (arXiv, 2024-02-20)
We present Polish Information Retrieval Benchmark (PIRB), a comprehensive evaluation framework encompassing 41 text information retrieval tasks for Polish.
The benchmark incorporates existing datasets as well as 10 new, previously unpublished datasets covering diverse topics such as medicine, law, business, physics, and linguistics.
We conduct an extensive evaluation of over 20 dense and sparse retrieval models, including the baseline models trained by us.
- Split and Rephrase with Large Language Models (arXiv, 2023-12-18)
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
- Learning Evaluation Models from Large Language Models for Sequence Generation (arXiv, 2023-08-08)
Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters.
We propose ECT, an evaluation capability transfer method, to transfer the evaluation capability from LLMs to relatively lightweight language models.
Based on the proposed ECT, we learn various evaluation models from ChatGPT, and employ them as reward models to improve sequence generation models.
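
A rough sketch of the recipe this summary describes, under loose assumptions: a large LLM scores texts, a lightweight model is fit to those scores, and the distilled evaluator then serves as a reward model. All names and scoring heuristics below are invented for illustration; this is not the paper's ECT implementation.

```python
# Toy distillation loop: teacher LLM scores -> lightweight student ->
# student used as a reward model. Every component here is a placeholder.

def teacher_score(text: str) -> float:
    """Stand-in for querying a large LLM (e.g. ChatGPT) for a quality score."""
    return min(len(text.split()) / 20.0, 1.0)  # toy heuristic, not a real judgment


def train_student(corpus: list[str]):
    """Distill: fit a small model on (text, teacher score) pairs.
    Here the 'model' is just a lookup with a fallback heuristic."""
    labels = {text: teacher_score(text) for text in corpus}

    def student(text: str) -> float:
        return labels.get(text, len(text.split()) / 20.0)

    return student


def reinforce_step(candidates: list[str], reward_model) -> str:
    """Use the distilled evaluator as a reward signal: pick (or upweight)
    the candidate generation it scores highest."""
    return max(candidates, key=reward_model)


reward = train_student(["draft one", "a much more detailed draft two"])
print(reinforce_step(["short answer", "a fuller, better-grounded answer"], reward))
```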
- Reimagining Retrieval Augmented Language Models for Answering Queries (arXiv, 2023-06-01)
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison.
Such language models are semi-parametric: they integrate model parameters with knowledge from external data sources to make their predictions.
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy (arXiv, 2023-05-24)
We show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.
A model output shows what might be needed to finish a task, and thus provides an informative context for retrieving more relevant knowledge.
Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints.
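
The iterative loop described above is simple to express. The sketch below assumes stub retrieve/generate functions (any retriever and LLM could fill them in); the point is only the loop structure, in which each draft answer is folded back into the next retrieval query.

```python
# Sketch of the Iter-RetGen idea: alternate retrieval and generation,
# feeding each draft answer back into the retriever as an enriched query.

def retrieve(query: str, k: int = 5) -> list[str]:
    return [f"passage relevant to: {query!r}"]  # stand-in for a real retriever


def generate(question: str, passages: list[str]) -> str:
    return f"draft answer using {len(passages)} passages"  # stand-in for an LLM


def iter_retgen(question: str, iterations: int = 3) -> str:
    answer = ""
    for _ in range(iterations):
        # The previous draft hints at what knowledge is still missing,
        # so it is appended to the query for the next retrieval round.
        query = question if not answer else f"{question} {answer}"
        passages = retrieve(query)
        answer = generate(question, passages)
    return answer


print(iter_retgen("Who coached the team that won the 2010 World Cup?"))
```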
- SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation (arXiv, 2023-05-18)
Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue.
For the persona-based dialogue generation task, consistency and coherence are great challenges for language models.
A two-stage strategy, SimOAP, is proposed: over-sampling followed by post-evaluation.
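
The two-stage shape of the strategy can be sketched as follows. The sampler and scorer below are placeholders; SimOAP's actual post-evaluation uses dedicated coherence and consistency scoring, which this toy random scorer does not reproduce.

```python
# Toy two-stage loop: over-sample many candidate replies, then keep the
# one a post-evaluator scores highest. Both models are stubs.
import random


def sample_responses(dialogue: str, persona: str, n: int = 20) -> list[str]:
    # Stage 1: over-sampling. A real system would draw n samples from a
    # dialogue model via temperature/top-p sampling.
    return [f"candidate {i} reply to {dialogue!r}" for i in range(n)]


def post_evaluate(response: str, dialogue: str, persona: str) -> float:
    # Stage 2: post-evaluation. A real scorer would combine coherence
    # (fit to the dialogue history) and persona consistency.
    return random.random()  # placeholder score


def simoap_reply(dialogue: str, persona: str) -> str:
    candidates = sample_responses(dialogue, persona)
    return max(candidates, key=lambda r: post_evaluate(r, dialogue, persona))


print(simoap_reply("Hi, what do you do for fun?", "I love hiking."))
```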
- mFACE: Multilingual Summarization with Factual Consistency Evaluation (arXiv, 2022-12-20)
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
- Generalization Properties of Retrieval-based Models (arXiv, 2022-10-06)
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
- Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training (arXiv, 2020-12-18)
We present Generation-Augmented Pre-training (GAP), which jointly learns representations of natural language utterances and table schemas by leveraging generation models to produce pre-training data.
Experimental results show that neural semantic parsers leveraging the GAP model obtain new state-of-the-art results on both the SPIDER and CRITERIA-TO-SQL benchmarks.