Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian
- URL: http://arxiv.org/abs/2407.20654v1
- Date: Tue, 30 Jul 2024 08:50:16 GMT
- Title: Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian
- Authors: Serena Auriemma, Martina Miliani, Mauro Madeddu, Alessandro Bondielli, Lucia Passaro, Alessandro Lenci,
- Abstract summary: This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in specialized contexts.
Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models.
The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting.
- Score: 75.94354349994576
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Addressing the challenge of limited annotated data in specialized fields and low-resource languages is crucial for the effective use of Language Models (LMs). While most Large Language Models (LLMs) are trained on general-purpose English corpora, there is a notable gap in models specifically tailored for Italian, particularly for technical and bureaucratic jargon. This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in these specialized contexts. Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models. We evaluated the models on downstream tasks such as document classification and entity typing and conducted intrinsic evaluations using Pseudo-Log-Likelihood. The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting. Furthermore, the application of calibration techniques and in-domain verbalizers significantly enhances the efficacy of encoder models. These domain-specialized models prove to be particularly advantageous in scenarios where in-domain resources or expertise are scarce. In conclusion, our findings offer new insights into the use of Italian models in specialized contexts, which may have a significant impact on both research and industrial applications in the digital transformation era.
Related papers
- Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry [5.4665365335928024]
We investigate the trade-offs of leveraging off-the-shelf versus more targeted foundation models for scientific domains.
In this work, we examine the benefits of in-domain pre-training for a given scientific domain, chemistry, and compare these to open-source, off-the-shelf models with zero-shot and few-shot prompting.
Our results show that not only do in-domain base models perform reasonably well on in-domain tasks in a zero-shot setting but that further adaptation using instruction fine-tuning yields impressive performance on chemistry-specific tasks.
arXiv Detail & Related papers (2024-11-05T22:45:10Z) - A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z) - Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z) - DSG-KD: Knowledge Distillation from Domain-Specific to General Language Models [8.328673243329794]
This study investigates emergency/non-emergency classification tasks based on electronic medical record (EMR) data obtained from pediatric emergency departments (PEDs) in Korea.
Existing domain-specific pre-trained language models underperform compared to general language models in handling N-lingual free-text data characteristics.
We propose a domain knowledge transfer methodology that leverages knowledge distillation to infuse general language models with domain-specific knowledge via fine-tuning.
arXiv Detail & Related papers (2024-09-23T10:59:02Z) - Neural Machine Translation Models Can Learn to be Few-shot Learners [2.2999148299770042]
We show that a much smaller model can be trained to perform in-context learning (ICL)
With this capacity for ICL, the model can take advantage of relevant few-shot examples to adapt its output towards the domain.
Our approach allows efficient batch inference on a mix of domains and outperforms state-of-the-art baselines in terms of both translation quality and immediate adaptation rate.
arXiv Detail & Related papers (2023-09-15T17:44:21Z) - RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z) - Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP)
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z) - Are Large Language Models Robust Coreference Resolvers? [17.60248310475889]
We show that prompting for coreference can outperform current unsupervised coreference systems.
Further investigations reveal that instruction-tuned LMs generalize surprisingly well across domains, languages, and time periods.
arXiv Detail & Related papers (2023-05-23T19:38:28Z) - Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceive diverse modalities (such as vision, and language)
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language
Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.