On the Usability of Transformers-based models for a French
Question-Answering task
- URL: http://arxiv.org/abs/2207.09150v1
- Date: Tue, 19 Jul 2022 09:46:15 GMT
- Title: On the Usability of Transformers-based models for a French
Question-Answering task
- Authors: Oralie Cattan, Christophe Servan and Sophie Rosset
- Abstract summary: This paper focuses on the usability of Transformer-based language models in small-scale learning problems.
We introduce FrALBERT, a new compact model for French, which proves to be competitive in low-resource settings.
- Score: 2.44288434255221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For many tasks, state-of-the-art results have been achieved with
Transformer-based architectures, resulting in a paradigmatic shift in practices
from the use of task-specific architectures to the fine-tuning of pre-trained
language models. The ongoing trend consists of training models with ever-increasing
amounts of data and parameters, which requires considerable resources. This has
spurred a strong push to improve resource efficiency through algorithmic and
hardware improvements, which are evaluated only for English. This raises
questions about the usability of such models for small-scale learning problems,
for which only a limited amount of training data is available, especially for
tasks in under-resourced languages. The lack of appropriately sized corpora
hinders data-driven and transfer-learning-based approaches, which can suffer
from strong instability. In this paper, we survey the efforts dedicated to the
usability of Transformer-based models and evaluate these improvements on
question-answering performance for French, a language with few resources for
this task. We address the instability related to data scarcity by investigating
several training strategies involving data augmentation, hyperparameter
optimization and cross-lingual transfer. We also introduce FrALBERT, a new
compact model for French, which proves to be competitive in low-resource
settings.
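As an illustration of the fine-tuning setup discussed above, the sketch below fine-tunes a pre-trained French encoder for extractive question answering with the Hugging Face Transformers and Datasets libraries. It is a minimal sketch, not the authors' pipeline: the checkpoint name, the data files and the flat SQuAD-style record layout ("question", "context", "answer_text", "answer_start") are assumptions made for illustration.

from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments, default_data_collator)

MODEL_ID = "camembert-base"  # assumed French checkpoint; swap in a compact ALBERT-style model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_ID)

# Assumed flat SQuAD-style records: {"question", "context", "answer_text", "answer_start"}.
data = load_dataset("json", data_files={"train": "train.json", "validation": "valid.json"})

def preprocess(example):
    enc = tokenizer(example["question"], example["context"],
                    truncation="only_second", max_length=384,
                    padding="max_length", return_offsets_mapping=True)
    start_char = example["answer_start"]
    end_char = start_char + len(example["answer_text"])
    # Map the character span of the answer onto token positions in the context.
    seq_ids = enc.sequence_ids()
    start_tok = end_tok = 0
    for i, (s, e) in enumerate(enc["offset_mapping"]):
        if seq_ids[i] != 1:          # skip question tokens, special tokens and padding
            continue
        if s <= start_char < e:
            start_tok = i
        if s < end_char <= e:
            end_tok = i
    enc["start_positions"] = start_tok
    enc["end_positions"] = end_tok
    enc.pop("offset_mapping")
    return enc

tokenized = data.map(preprocess, remove_columns=data["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qa-fr", learning_rate=3e-5,
                           num_train_epochs=2, per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=default_data_collator,
)
trainer.train()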
Related papers
- Unsupervised Data Validation Methods for Efficient Model Training [0.0]
State-of-the-art models in natural language processing (NLP), text-to-speech (TTS), speech-to-text (STT) and vision-language models (VLM) rely heavily on large datasets.
This research explores key areas such as defining "quality data," developing methods for generating appropriate data and enhancing accessibility to model training.
arXiv Detail & Related papers (2024-10-10T13:00:53Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer).
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines.
arXiv Detail & Related papers (2024-01-09T21:09:07Z)
- WADER at SemEval-2023 Task 9: A Weak-labelling framework for Data augmentation in tExt Regression Tasks [4.102007186133394]
In this paper, we propose a novel weak-labeling strategy for data augmentation in text regression tasks called WADER.
We benchmark the performance of State-of-the-Art pre-trained multilingual language models using WADER and analyze the use of sampling techniques to mitigate bias in data.
arXiv Detail & Related papers (2023-03-05T19:45:42Z)
- Actuarial Applications of Natural Language Processing Using Transformers: Case Studies for Using Text Features in an Actuarial Context [0.0]
This tutorial demonstrates how to incorporate text data into actuarial classification and regression tasks.
The main focus is on methods employing transformer-based models.
The case studies tackle challenges related to a multi-lingual setting and long input sequences.
arXiv Detail & Related papers (2022-06-04T15:39:30Z)
- Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods of BERT -- a pre-trained Transformer-based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters (see the layer-freezing sketch after this list).
arXiv Detail & Related papers (2020-12-04T08:34:39Z)
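The layer-freezing idea from the last entry can be illustrated in a few lines with the Transformers library. This is a minimal sketch under assumed settings (an English BERT checkpoint with a generic classification head), not the cited paper's exact setup:

from transformers import AutoModelForSequenceClassification

# Assumed checkpoint and head; the cited paper's task and model may differ.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

# Freeze the embeddings and the first 8 of 12 encoder layers; only the top
# layers and the classification head remain trainable.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} / {total:,}")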