Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy
in Mental Health and Beyond
- URL: http://arxiv.org/abs/2310.05317v5
- Date: Mon, 13 Nov 2023 14:35:22 GMT
- Title: Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy
in Mental Health and Beyond
- Authors: Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada
Mihalcea
- Abstract summary: We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task.
We introduce a strategy for building a specialized vocabulary and a vocabulary merging protocol for integrating task-specific tokens into the pre-trained model's tokenizer.
We find that our task-adaptive tokenization approach brings a significant improvement in generation performance while using up to 60% fewer tokens.
- Score: 66.07002187192448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose task-adaptive tokenization as a way to adapt the generation
pipeline to the specifics of a downstream task and enhance long-form generation
in mental health. Inspired by insights from cognitive science, our
task-adaptive tokenizer samples variable segmentations from multiple outcomes,
with sampling probabilities optimized based on task-specific data. We introduce
a strategy for building a specialized vocabulary and introduce a vocabulary
merging protocol that allows for the integration of task-specific tokens into
the pre-trained model's tokenization step. Through extensive experiments on
psychological question-answering tasks in both Chinese and English, we find
that our task-adaptive tokenization approach brings a significant improvement
in generation performance while using up to 60% fewer tokens. Preliminary
experiments point to promising results when using our tokenization approach
with very large language models.
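The approach described above has three moving parts: a specialized vocabulary learned from task-specific data, segmentation that is sampled from multiple candidate tokenizations rather than fixed, and a merging step that folds the new tokens into the pre-trained model's tokenizer. The following is a minimal sketch of that general recipe using sentencepiece and Hugging Face Transformers; the corpus path, base model, vocabulary size, and the crude surface-form merge are illustrative assumptions, not the paper's actual protocol.
```python
# Illustrative sketch only; not the authors' released implementation.
import sentencepiece as spm
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Build a specialized vocabulary from task-specific data. A unigram model is
#    used because it supports sampling among multiple candidate segmentations.
spm.SentencePieceTrainer.train(
    input="task_corpus.txt",   # assumed path to task-specific text
    model_prefix="task_tok",
    vocab_size=8000,           # assumed size
    model_type="unigram",
)
task_sp = spm.SentencePieceProcessor(model_file="task_tok.model")

# 2. Sample a variable segmentation instead of always taking the single best
#    one; alpha controls how sharply the sampling follows the unigram scores.
sampled = task_sp.encode(
    "I have been feeling anxious and cannot sleep.",
    out_type=str, enable_sampling=True, alpha=0.1, nbest_size=-1,
)
print(sampled)

# 3. Merge task-specific tokens into the pre-trained model's tokenizer and
#    resize the embedding matrix accordingly.
base_tok = AutoTokenizer.from_pretrained("gpt2")   # assumed base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

new_tokens = []
for i in range(task_sp.get_piece_size()):
    if task_sp.is_control(i) or task_sp.is_unknown(i):
        continue                                   # skip <s>, </s>, <unk>
    surface = task_sp.id_to_piece(i).replace("▁", "").strip()  # crude mapping
    if surface and surface not in base_tok.get_vocab():
        new_tokens.append(surface)
base_tok.add_tokens(new_tokens)
model.resize_token_embeddings(len(base_tok))
```
Re-running the sampled encoding during fine-tuning gives each training example a slightly different segmentation, which is the "variable segmentations from multiple outcomes" behavior the abstract refers to; how the sampling probabilities are further optimized on task-specific data is particular to the paper and not reproduced here.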
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance (see the sketch after this list).
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- Adaptive Gating in Mixture-of-Experts based Language Models [7.936874532105228]
Sparsely activated mixture-of-experts (MoE) has emerged as a promising solution for scaling models.
This paper introduces adaptive gating in MoE, a flexible training strategy that allows tokens to be processed by a variable number of experts.
arXiv Detail & Related papers (2023-10-11T04:30:18Z)
- A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning [8.052271364177988]
Subword tokenization is a commonly used input pre-processing step in most recent NLP models.
We propose a vocabulary-free neural tokenizer by distilling segmentation information from subword tokenization.
Our tokenizer consistently improves performance on multilingual (NLI) and code-switching (sentiment analysis) tasks.
arXiv Detail & Related papers (2022-04-22T16:50:49Z)
- On Decoding Strategies for Neural Text Generators [73.48162198041884]
We study the interaction between language generation tasks and decoding strategies.
We measure changes in attributes of generated text as a function of both decoding strategy and task.
Our results reveal both previously-observed and surprising findings.
arXiv Detail & Related papers (2022-03-29T16:25:30Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Pretext Tasks selection for multitask self-supervised speech representation learning [23.39079406674442]
This paper introduces a method to select a group of pretext tasks among a set of candidates.
Experiments conducted on speaker recognition and automatic speech recognition validate our approach.
arXiv Detail & Related papers (2021-07-01T16:36:29Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly follows two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps with adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
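For the "Likelihood as a Performance Gauge for Retrieval-Augmented Generation" entry above, the sketch below shows the basic idea of ranking candidate prompts by the likelihood a language model assigns to the question; the scoring model (gpt2) and the toy prompts are assumptions for illustration, not a reproduction of that paper's method.
```python
# Minimal sketch: rank candidate prompts by average log-likelihood of the
# question tokens under a causal LM. Model and prompts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def question_log_likelihood(prompt: str, question: str) -> float:
    """Average log-probability of the question tokens, conditioned on the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    question_ids = tok(question, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, question_ids], dim=1)
    with torch.no_grad():
        logits = lm(input_ids).logits
    q_len = question_ids.size(1)
    # Predictions for the question span come from the positions just before it.
    log_probs = torch.log_softmax(logits[0, -q_len - 1:-1, :], dim=-1)
    token_lp = log_probs.gather(1, question_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()

question = "How can I manage panic attacks at night?"
candidate_prompts = [  # hypothetical candidates
    "Answer the question concisely.\n",
    "You are a supportive counselor. Answer the question.\n",
]
best = max(candidate_prompts, key=lambda p: question_log_likelihood(p, question))
print(best)
```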
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.