Automated Concatenation of Embeddings for Structured Prediction
- URL: http://arxiv.org/abs/2010.05006v4
- Date: Tue, 1 Jun 2021 13:23:25 GMT
- Title: Automated Concatenation of Embeddings for Structured Prediction
- Authors: Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
- Abstract summary: We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
- Score: 75.44925576268052
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Pretrained contextualized embeddings are powerful word representations for
structured prediction tasks. Recent work found that better word representations
can be obtained by concatenating different types of embeddings. However, the
selection of embeddings to form the best concatenated representation usually
varies depending on the task and the collection of candidate embeddings, and
the ever-increasing number of embedding types makes it a more difficult
problem. In this paper, we propose Automated Concatenation of Embeddings (ACE)
to automate the process of finding better concatenations of embeddings for
structured prediction tasks, based on a formulation inspired by recent progress
on neural architecture search. Specifically, a controller alternately samples a
concatenation of embeddings, according to its current belief of the
effectiveness of individual embedding types in consideration for a task, and
updates the belief based on a reward. We follow strategies in reinforcement
learning to optimize the parameters of the controller and compute the reward
based on the accuracy of a task model, which is fed with the sampled
concatenation as input and trained on a task dataset. Empirical results on 6
tasks and 21 datasets show that our approach outperforms strong baselines and
achieves state-of-the-art performance with fine-tuned embeddings in all the
evaluations.
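The controller described in the abstract can be illustrated with a minimal policy-gradient sketch. This is only an illustration of the idea, not the authors' released implementation: each candidate embedding type gets one logit encoding the controller's current belief, a 0/1 selection mask is sampled, and the logits are updated with a REINFORCE-style rule whose reward stands in for the task model's accuracy on the sampled concatenation. The candidate names, the simplified reward shaping, and the `evaluate_concatenation` stub are placeholders.
```python
# Minimal sketch of the ACE idea (not the authors' implementation): a controller
# keeps one logit per candidate embedding type, samples a 0/1 selection mask,
# and updates the logits with a REINFORCE-style policy gradient whose reward is
# the task model's dev accuracy for that concatenation.
import numpy as np

rng = np.random.default_rng(0)
candidates = ["bert", "roberta", "xlm-r", "flair", "fasttext", "char"]  # illustrative
logits = np.zeros(len(candidates))   # controller "belief" per embedding type
baseline = 0.0                       # moving-average reward baseline
lr, baseline_decay = 0.1, 0.9

def evaluate_concatenation(mask):
    """Placeholder for training a task model on the concatenation selected by
    `mask` and returning its dev accuracy. Replace with a real train/eval loop
    for the target structured prediction task."""
    true_utility = np.array([0.8, 0.7, 0.6, 0.4, 0.2, 0.1])  # made-up utilities
    return float(true_utility @ mask / max(mask.sum(), 1) + rng.normal(0, 0.02))

for step in range(50):
    probs = 1.0 / (1.0 + np.exp(-logits))            # independent Bernoulli per type
    mask = (rng.random(len(candidates)) < probs).astype(float)
    if mask.sum() == 0:                              # always keep at least one embedding
        mask[rng.integers(len(candidates))] = 1.0
    reward = evaluate_concatenation(mask)
    advantage = reward - baseline
    baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
    # REINFORCE: grad of log p(mask) w.r.t. logits is (mask - probs) for Bernoullis
    logits += lr * advantage * (mask - probs)

best = [name for name, p in zip(candidates, 1 / (1 + np.exp(-logits))) if p > 0.5]
print("selected embeddings:", best)
```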
Related papers
- Enhancing Question Answering Precision with Optimized Vector Retrieval and Instructions [1.2425910171551517]
Question-answering (QA) is an important application of Information Retrieval (IR) and language models.
We propose an innovative approach to improve QA task performance by integrating optimized vector retrieval and instruction methodologies.
arXiv Detail & Related papers (2024-11-01T21:14:04Z)
- Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness [3.2925222641796554]
"pointer-guided segment ordering" (SO) is a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations.
Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures.
arXiv Detail & Related papers (2024-06-06T15:17:51Z)
- Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating the few-shot text classification task into a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models [12.37229805276939]
We propose a novel pre-training method called Duplex Masked Auto-Encoder, a.k.a. DupMAE.
It is designed to improve the quality of semantic representation, where all contextualized embeddings of the pretrained model can be leveraged.
arXiv Detail & Related papers (2023-05-04T05:37:22Z)
- Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models [48.77653835765705]
We introduce a probabilistic resolution to prompt tuning, where the label-specific prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model.
We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts.
arXiv Detail & Related papers (2023-03-16T06:09:15Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- Meeting Summarization with Pre-training and Clustering Methods [6.47783315109491]
HMNet, a hierarchical network that employs both a word-level transformer and a turn-level transformer, is used as the baseline.
We extend the locate-then-summarize approach of QMSum with an intermediate clustering step.
We compare the performance of our baseline models with BART, a state-of-the-art language model that is effective for summarization.
arXiv Detail & Related papers (2021-11-16T03:14:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.