Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
- URL: http://arxiv.org/abs/2305.14592v2
- Date: Thu, 6 Jun 2024 03:20:45 GMT
- Title: Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
- Authors: Ruohao Guo, Wei Xu, Alan Ritter
- Abstract summary: We show that current large language models struggle to capture some language styles without fine-tuning.
We investigate whether LLMs can be meta-trained based on representative lexicons to recognize new styles they have not been fine-tuned on.
- Score: 24.355564722047244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language style is often used by writers to convey their intentions, identities, and mastery of language. In this paper, we show that current large language models struggle to capture some language styles without fine-tuning. To address this challenge, we investigate whether LLMs can be meta-trained based on representative lexicons to recognize new styles they have not been fine-tuned on. Experiments on 13 established style classification tasks, as well as 63 novel tasks generated using LLMs, demonstrate that meta-training with style lexicons consistently improves zero-shot transfer across styles. We release the code and data at http://github.com/octaviaguo/Style-LLM .
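The abstract describes prompting LLMs with representative style lexicons so that unseen styles can be recognized zero-shot. A minimal sketch of what such a lexicon-grounded prompt might look like is below; the lexicon entries, template wording, and function names are illustrative assumptions, not the paper's actual prompt format.

```python
# Hypothetical sketch: composing a zero-shot style-classification prompt
# that surfaces representative lexicon words, in the spirit of the paper.
# The lexicons and template below are invented for illustration.
STYLE_LEXICONS = {
    "politeness": ["please", "kindly", "would you", "thank you"],
    "sarcasm": ["oh great", "sure thing", "as if", "yeah right"],
}

def build_style_prompt(style: str, text: str, k: int = 4) -> str:
    """Build a prompt that pairs a style's lexicon words with the input text."""
    words = ", ".join(STYLE_LEXICONS[style][:k])
    return (
        f"Representative words for the style '{style}': {words}.\n"
        f"Does the following text exhibit the '{style}' style? Answer yes or no.\n"
        f"Text: {text}"
    )

prompt = build_style_prompt("politeness", "Could you kindly send the report?")
print(prompt)
```

In a meta-training setup, prompts of this shape would be constructed for many styles, so the model learns to use the lexicon slot rather than memorize any single style.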
Related papers
- Codebook LLMs: Adapting Political Science Codebooks for LLM Use and Adapting LLMs to Follow Codebooks [7.005758904228446]
We argue that political scientists who care about valid measurement should instead make a codebook-construct label assumption.
We conduct a set of experiments to understand whether LLMs comply with codebook instructions.
We find re-structuring the original codebooks gives modest gains in zero-shot performance.
arXiv Detail & Related papers (2024-07-15T14:20:09Z)
- Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts [50.40191599304911]
We investigate whether transliteration is also effective in improving LLMs' performance for low-resource languages written in non-Latin scripts.
We propose three prompt templates, where the target-language text is represented in (1) its original script, (2) Latin script, or (3) both.
Our findings show that the effectiveness of transliteration varies by task type and model size.
arXiv Detail & Related papers (2024-07-02T14:51:20Z)
- Learning to Prompt with Text Only Supervision for Vision-Language Models [107.282881515667]
One branch of methods adapts CLIP by learning prompts using visual information.
An alternative approach resorts to training-free methods by generating class descriptions from large language models.
We propose to combine the strengths of both streams by learning prompts using only text data.
arXiv Detail & Related papers (2024-01-04T18:59:49Z)
- ICL Markup: Structuring In-Context Learning using Soft-Token Tags [8.211752085441923]
Large pretrained language models (LLMs) can be rapidly adapted to a wide variety of tasks via a text-to-text approach.
Inspired by markup languages like HTML, we contribute a method of using soft-token tags to compose prompt templates.
Our method is a form of meta-learning for ICL; it learns these tags in advance during a parameter-efficient fine-tuning "warm-up" process.
arXiv Detail & Related papers (2023-12-12T16:25:05Z)
- The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to improve time and memory efficiency.
We apply two trimming methods to the full vocabulary, Unicode-based script filtering and corpus-based selection, across different language families and model sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed.
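Unicode-based script filtering, as named in the summary above, can be sketched as keeping only vocabulary entries whose letters belong to the target script. This is an illustrative assumption of how such a filter might work, not the paper's implementation; the script test here uses a simple Unicode character-name heuristic.

```python
import unicodedata

def is_latin_token(token: str) -> bool:
    """Heuristic: a token passes if all its alphabetic characters are Latin-script."""
    letters = [ch for ch in token if ch.isalpha()]
    if not letters:
        return True  # keep punctuation, digits, and special tokens
    return all("LATIN" in unicodedata.name(ch, "") for ch in letters)

def trim_vocab(vocab: dict) -> dict:
    """Unicode-based script filtering: drop non-Latin entries from the vocabulary."""
    return {tok: idx for tok, idx in vocab.items() if is_latin_token(tok)}

vocab = {"hello": 0, "мир": 1, "world": 2, "你好": 3, "##ing": 4}
print(trim_vocab(vocab))  # Cyrillic and CJK entries are removed
```

Shrinking the vocabulary this way shrinks the embedding and output-projection matrices, which is where the reported memory savings for small models would come from.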
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
- InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning [66.31509106146605]
Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages.
However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data.
We propose InstructAlign which uses continual crosslingual instruction tuning to enable LLMs to align new unseen languages with previously learned high-resource languages.
arXiv Detail & Related papers (2023-05-23T02:51:34Z)
- Word Embeddings Are Steers for Language Models [57.83026781380927]
We name such steers LM-Steers and find that they exist in LMs of all sizes.
On tasks such as language model detoxification and sentiment control, LM-Steers can achieve comparable or superior performance.
An LM-Steer is transferable between different language models via an explicit-form calculation.
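The summary above describes a linear "steer" applied to word embeddings. A minimal numeric sketch of one plausible form, a learned matrix that linearly perturbs every output embedding, is shown below; the update rule `e' = e + epsilon * (e @ W)` and all names here are our assumptions for illustration, not the paper's verified formulation.

```python
import numpy as np

# Toy setup: a random "output embedding" matrix and a small steering matrix.
rng = np.random.default_rng(0)
d = 8
E = rng.normal(size=(100, d))            # 100 token embeddings of dimension d
W = rng.normal(scale=0.1, size=(d, d))   # hypothetical learned steering matrix

def steer(E, W, epsilon=1.0):
    """Apply a linear steer to every embedding: e' = e + epsilon * (e @ W)."""
    return E + epsilon * (E @ W)

E_steered = steer(E, W, epsilon=0.5)
print(E_steered.shape)  # (100, 8)
```

Because the operation is a fixed linear map on embeddings, scaling `epsilon` would dial the steering strength up or down, and `epsilon = 0` recovers the original model.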
arXiv Detail & Related papers (2023-05-22T07:52:04Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense knowledge with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.