Free Lunch for Efficient Textual Commonsense Integration in Language
Models
- URL: http://arxiv.org/abs/2305.15516v1
- Date: Wed, 24 May 2023 19:14:57 GMT
- Title: Free Lunch for Efficient Textual Commonsense Integration in Language
Models
- Authors: Wanyun Cui, Xingran Chen
- Abstract summary: We group training samples with similar commonsense descriptions into a single batch, thus reusing the encoded description across multiple samples.
Extensive experiments illustrate that the proposed batch partitioning approach effectively reduces the computational cost while preserving performance.
The efficiency improvement is more pronounced on larger datasets and on devices with more memory capacity, attesting to its practical utility for large-scale applications.
- Score: 20.02647320786556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed the emergence of textual commonsense knowledge
bases, aimed at providing more nuanced and context-rich knowledge. The
integration of external commonsense into language models has been shown to be a
key enabler in advancing the state-of-the-art for a wide range of NLP tasks.
However, incorporating textual commonsense descriptions is computationally
expensive, as compared to encoding conventional symbolic knowledge. In this
paper, we propose a method to improve its efficiency without modifying the
model. We group training samples with similar commonsense descriptions into a
single batch, thus reusing the encoded description across multiple samples. One
key observation is that the upper bound of batch partitioning can be reduced to
the classic graph k-cut problem. Consequently, we propose a spectral
clustering-based algorithm to solve this problem. Extensive experiments
illustrate that the proposed batch partitioning approach effectively reduces
the computational cost while preserving performance. The efficiency improvement
is more pronounced on larger datasets and on devices with more memory capacity,
attesting to its practical utility for large-scale applications.
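To make the batching idea concrete, the following is a minimal sketch of how samples that share commonsense descriptions could be grouped with off-the-shelf spectral clustering. The shared-description affinity matrix, the scikit-learn call, and the toy data are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def partition_into_batches(sample_descriptions, n_batches):
    """sample_descriptions: one set of commonsense-description ids per sample."""
    n = len(sample_descriptions)
    # Affinity = number of descriptions two samples share; samples placed in the
    # same batch can then reuse a single encoding of each shared description.
    affinity = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            affinity[i, j] = affinity[j, i] = len(
                sample_descriptions[i] & sample_descriptions[j]
            )
    affinity += 1e-3                 # small constant keeps the graph connected
    np.fill_diagonal(affinity, 0.0)
    labels = SpectralClustering(
        n_clusters=n_batches, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    return [np.where(labels == b)[0].tolist() for b in range(n_batches)]

# Toy usage: six samples drawing on four distinct descriptions, two batches.
descriptions = [{0, 1}, {0}, {1}, {2, 3}, {2}, {3}]
print(partition_into_batches(descriptions, n_batches=2))
```

Within each returned batch, every distinct description needs to be encoded only once, which is where the computational saving comes from.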
Related papers
- A Theoretical Perspective for Speculative Decoding Algorithm [60.79447486066416]
One effective way to accelerate inference is Speculative Decoding, which employs a small model to sample a sequence of draft tokens and a large model to validate them.
This paper tackles this gap by conceptualizing the decoding problem via a Markov chain abstraction and studying its key properties, output quality and inference acceleration, from a theoretical perspective.
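As a rough illustration of the draft-then-verify loop this entry refers to, the toy sketch below uses greedy verification and stand-in next-token functions. Real implementations verify all draft positions in a single forward pass of the large model, and the paper's analysis covers the stochastic acceptance rule rather than this simplified variant.

```python
def speculative_decode(prefix, draft_next, target_next, n_new, n_draft=4):
    """draft_next / target_next map a token list to the next token (greedy)."""
    out = list(prefix)
    while len(out) < len(prefix) + n_new:
        # Small model proposes a short run of draft tokens.
        drafts = []
        for _ in range(n_draft):
            drafts.append(draft_next(out + drafts))
        # Large model checks the drafts; keep the longest agreeing prefix and
        # substitute its own token at the first disagreement.
        accepted = []
        for tok in drafts:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)
                break
        out.extend(accepted)
    return out[: len(prefix) + n_new]

# Toy usage: the "large model" cycles a fixed pattern; the "draft model"
# agrees with it except when the prefix length is a multiple of five.
pattern = [1, 2, 3, 4]
target_next = lambda seq: pattern[len(seq) % 4]
draft_next = lambda seq: pattern[len(seq) % 4] if len(seq) % 5 else 0
print(speculative_decode([1], draft_next, target_next, n_new=8))
```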
arXiv Detail & Related papers (2024-10-30T01:53:04Z)
- Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning [19.16587730306472]
In-Context Learning (ICL) has emerged as a key capability of Large Language Models (LLMs).
We propose Logit Arithmetic Reweighting Approach (LARA), a novel framework that enhances ICL by using logit-based ensembling of multiple demonstrations.
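As a hedged sketch of the logit-based ensembling idea mentioned above: demonstrations are split into groups, the model produces next-token logits once per group, and the logits are combined with per-group weights. The simple normalized weighting below is an assumption for illustration, not LARA's actual reweighting procedure.

```python
import numpy as np

def ensemble_logits(per_group_logits, weights=None):
    """per_group_logits: (n_groups, vocab_size) next-token logits, one row per
    demonstration group; returns the combined logits and the argmax token id."""
    per_group_logits = np.asarray(per_group_logits, dtype=float)
    if weights is None:
        weights = np.ones(per_group_logits.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    combined = (weights[:, None] * per_group_logits).sum(axis=0)
    return combined, int(combined.argmax())

# Toy usage: three demonstration groups voting over a five-token vocabulary.
logits = [[0.1, 2.0, 0.3, 0.0, 0.1],
          [0.2, 1.5, 0.1, 0.4, 0.0],
          [0.0, 0.5, 2.5, 0.1, 0.2]]
print(ensemble_logits(logits, weights=[1.0, 1.0, 0.5]))
```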
arXiv Detail & Related papers (2024-10-14T01:34:16Z)
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models [55.01592097059969]
Supervised finetuning on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities of large language models.
Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool.
We propose using experimental design to circumvent the computational bottlenecks of active learning.
arXiv Detail & Related papers (2024-01-12T16:56:54Z)
- Adaptive End-to-End Metric Learning for Zero-Shot Cross-Domain Slot
Filling [2.6056468338837457]
Slot filling poses a critical challenge when handling a novel domain whose samples are never seen during training.
Most prior works deal with this problem in a two-pass pipeline manner based on metric learning.
We propose a new adaptive end-to-end metric learning scheme for the challenging zero-shot slot filling.
arXiv Detail & Related papers (2023-10-23T19:01:16Z)
- Filling in the Gaps: Efficient Event Coreference Resolution using Graph
Autoencoder Networks [0.0]
We introduce a novel and efficient method for Event Coreference Resolution (ECR) applied to a lower-resourced language domain.
By framing ECR as a graph reconstruction task, we are able to combine deep semantic embeddings with structural coreference chain knowledge.
Our method significantly outperforms classical mention-pair methods on a large Dutch event coreference corpus.
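The graph-reconstruction framing can be pictured with a generic graph autoencoder: mention embeddings are encoded, and coreference links are recovered by reconstructing the adjacency matrix. The single linear encoder and inner-product decoder below are standard GAE choices assumed for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

n_mentions, emb_dim, hid_dim = 6, 32, 8
mention_emb = torch.randn(n_mentions, emb_dim)   # deep semantic embeddings
adjacency = torch.eye(n_mentions)                # known coreference structure
adjacency[0, 1] = adjacency[1, 0] = 1.0          # e.g. mentions 0 and 1 corefer

encoder = nn.Linear(emb_dim, hid_dim)            # encode mentions
z = encoder(mention_emb)
recon = torch.sigmoid(z @ z.t())                 # inner-product edge decoder
loss = nn.functional.binary_cross_entropy(recon, adjacency)
loss.backward()
# At inference time, high entries of `recon` suggest coreferent mention pairs.
```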
arXiv Detail & Related papers (2023-10-18T13:44:58Z)
- Adaptive Gating in Mixture-of-Experts based Language Models [7.936874532105228]
Sparsely activated mixture-of-experts (MoE) has emerged as a promising solution for scaling models.
This paper introduces adaptive gating in MoE, a flexible training strategy that allows tokens to be processed by a variable number of experts.
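A minimal sketch of what "a variable number of experts per token" could look like: route confidently-gated tokens to one expert and uncertain ones to two. The softmax gate, the 0.75 threshold, and the top-1/top-2 rule are assumptions for illustration, not necessarily the paper's policy.

```python
import numpy as np

def adaptive_route(gate_logits, threshold=0.75):
    """gate_logits: (n_tokens, n_experts); returns one list of expert ids per token."""
    probs = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    routes = []
    for p in probs:
        order = np.argsort(-p)
        if p[order[0]] >= threshold:
            routes.append([int(order[0])])                 # confident: one expert
        else:
            routes.append([int(order[0]), int(order[1])])  # uncertain: two experts
    return routes

# Toy usage: route three tokens across four experts.
rng = np.random.default_rng(0)
print(adaptive_route(rng.normal(size=(3, 4))))
```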
arXiv Detail & Related papers (2023-10-11T04:30:18Z)
- Active Learning for Abstractive Text Summarization [50.79416783266641]
We propose the first effective query strategy for Active Learning in abstractive text summarization.
We show that using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores.
arXiv Detail & Related papers (2023-01-09T10:33:14Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
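The noise-injection idea can be sketched in PyTorch as follows: run a layer's input twice, once with standard Gaussian noise added, and penalize how much the output moves. The tiny layer, the L2 penalty, and the 0.1 weight are assumptions for illustration, not the exact LNSR objective.

```python
import torch
import torch.nn as nn

layer = nn.Sequential(nn.Linear(16, 16), nn.GELU())     # one fine-tuned layer
hidden = torch.randn(8, 16)                              # hidden representations

clean_out = layer(hidden)
noisy_out = layer(hidden + torch.randn_like(hidden))     # inject standard Gaussian noise
stability_penalty = (clean_out - noisy_out).pow(2).mean()

task_loss = torch.tensor(0.0)                            # placeholder fine-tuning loss
loss = task_loss + 0.1 * stability_penalty               # assumed trade-off weight
loss.backward()
```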
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Obtaining Better Static Word Embeddings Using Contextual Embedding
Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.