Few-Shot Anaphora Resolution in Scientific Protocols via Mixtures of
In-Context Experts
- URL: http://arxiv.org/abs/2210.03690v1
- Date: Fri, 7 Oct 2022 16:51:45 GMT
- Title: Few-Shot Anaphora Resolution in Scientific Protocols via Mixtures of
In-Context Experts
- Authors: Nghia T. Le, Fan Bai, and Alan Ritter
- Abstract summary: We present MICE (Mixtures of In-Context Experts), which we demonstrate is effective for few-shot anaphora resolution in scientific protocols.
MICE combines the predictions of hundreds of in-context experts, yielding a 30% increase in F1 score over a competitive prompt retrieval baseline.
- Score: 9.642187680042657
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Anaphora resolution is an important task for information extraction across a
range of languages, text genres, and domains, motivating the need for methods
that do not require large annotated datasets. In-context learning has emerged
as a promising approach, yet there are a number of challenges in applying
in-context learning to resolve anaphora. For example, encoding a single
in-context demonstration that consists of: an anaphor, a paragraph-length
context, and a list of corresponding antecedents, requires conditioning a
language model on a long sequence of tokens, limiting the number of
demonstrations per prompt. In this paper, we present MICE (Mixtures of
In-Context Experts), which we demonstrate is effective for few-shot anaphora
resolution in scientific protocols (Tamari et al., 2021). Given only a handful
of training examples, MICE combines the predictions of hundreds of in-context
experts, yielding a 30% increase in F1 score over a competitive prompt
retrieval baseline. Furthermore, we show MICE can be used to train compact
student models without sacrificing performance. As far as we are aware, this is
the first work to present experimental results demonstrating the effectiveness
of in-context learning on the task of few-shot anaphora resolution in
scientific protocols.
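The abstract describes combining the predictions of hundreds of in-context experts, each conditioned on only a few demonstrations. The sketch below illustrates one plausible reading of that idea, assuming each expert is a prompt built from a small subset of the labeled examples and that expert scores are pooled by a simple uniform average; the names (`build_prompt`, `mice_predict`, `score_fn`) and the prompt format are illustrative, not the paper's implementation.
```python
# Minimal sketch of a mixture of in-context experts for anaphora resolution.
# Assumptions (not from the paper): each expert is a prompt built from one
# k-sized subset of the demonstrations, and experts are pooled by a uniform
# average of candidate-antecedent scores. `score_fn(prompt, cand)` stands in
# for any LM scoring interface that returns a log-likelihood for a continuation.
import itertools
from collections import defaultdict

def build_prompt(demos, anaphor, context):
    """Format a few demonstrations plus the query into a single prompt."""
    parts = [
        f"Context: {d['context']}\nAnaphor: {d['anaphor']}\n"
        f"Antecedents: {', '.join(d['antecedents'])}"
        for d in demos
    ]
    parts.append(f"Context: {context}\nAnaphor: {anaphor}\nAntecedents:")
    return "\n\n".join(parts)

def mice_predict(score_fn, train_demos, anaphor, context, candidates, k=2):
    """Average candidate scores over every k-demonstration expert prompt."""
    totals = defaultdict(float)
    experts = list(itertools.combinations(train_demos, k))
    for demos in experts:
        prompt = build_prompt(list(demos), anaphor, context)
        for cand in candidates:
            # Uniform mixture weight; the paper's weighting scheme may differ.
            totals[cand] += score_fn(prompt, cand) / len(experts)
    return max(totals, key=totals.get)

# Toy usage with a dummy scorer (a real LM log-likelihood would go here).
demos = [
    {"context": "Add 5 mL buffer to the tube.", "anaphor": "the tube",
     "antecedents": ["the tube"]},
    {"context": "Mix the solution, then spin it.", "anaphor": "it",
     "antecedents": ["the solution"]},
    {"context": "Seal the plate and incubate it.", "anaphor": "it",
     "antecedents": ["the plate"]},
]
dummy_score = lambda prompt, cand: -float(len(cand))  # placeholder scorer
print(mice_predict(dummy_score, demos, "it",
                   "Collect the pellet and wash it.",
                   ["the pellet", "the supernatant"]))
```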
Related papers
- A Large Encoder-Decoder Family of Foundation Models For Chemical Language [1.1073864511426255]
This paper introduces a family of large encoder-decoder chemical foundation models pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem.
Our experiments across multiple benchmark datasets validate the capacity of the proposed model in providing state-of-the-art results for different tasks.
arXiv Detail & Related papers (2024-07-24T20:30:39Z)
- Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning [68.43706033424378]
This study introduces an innovative method designed to efficiently increase in-context text length in multi-modal large language models (MLLMs).
We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens.
This technique significantly reduces GPU memory usage and floating-point operations (FLOPs) in both the training and inference stages.
arXiv Detail & Related papers (2024-06-04T17:59:25Z)
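As a rough illustration of the visual-token idea, the sketch below renders in-context text to an image and cuts it into fixed-size patches that a vision encoder could consume; the image size, patch size, and line-wrapping logic are assumptions, not the paper's pipeline.
```python
# Hypothetical sketch of turning long in-context text into "visual tokens":
# render the text to an image, then cut the image into fixed-size patches.
# Image size, patch size, and font are illustrative assumptions only.
import numpy as np
from PIL import Image, ImageDraw

def render_text(text, width=448, height=448):
    """Draw text onto a white RGB canvas using PIL's default font."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    chars_per_line = 64  # naive fixed-width line wrapping
    lines = [text[i:i + chars_per_line] for i in range(0, len(text), chars_per_line)]
    draw.multiline_text((4, 4), "\n".join(lines), fill="black")
    return img

def to_patches(img, patch=32):
    """Split the rendered image into (patch x patch) visual tokens."""
    arr = np.asarray(img)                      # (H, W, 3)
    h, w, c = arr.shape
    arr = arr[: h - h % patch, : w - w % patch]
    grid = arr.reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)

patches = to_patches(render_text("A long in-context demonstration ..." * 20))
print(patches.shape)  # (196, 32, 32, 3): visual tokens for a vision encoder
```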
- Chain of Thought with Explicit Evidence Reasoning for Few-shot Relation Extraction [15.553367375330843]
We propose CoT-ER, a novel approach for few-shot relation extraction using large language models.
CoT-ER first induces large language models to generate evidence using task-specific and concept-level knowledge.
arXiv Detail & Related papers (2023-11-10T08:12:00Z)
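The snippet below sketches an evidence-first prompt in the spirit of this approach: the model is asked to list supporting evidence before committing to a relation label. The wording and field names are hypothetical and only convey the two-step structure, not CoT-ER's actual prompt.
```python
# Hypothetical two-step prompt in the spirit of evidence-first chain-of-thought
# relation extraction: enumerate evidence (step 1) before naming the relation
# (step 2). The exact wording is illustrative, not CoT-ER's prompt.
def build_evidence_first_prompt(sentence, head, tail, relation_labels):
    labels = ", ".join(relation_labels)
    return (
        f"Sentence: {sentence}\n"
        f"Head entity: {head}\n"
        f"Tail entity: {tail}\n"
        "Step 1 - Evidence: list the concept-level facts about the two entities\n"
        "and the spans in the sentence that connect them.\n"
        f"Step 2 - Relation: choose exactly one label from [{labels}].\n"
        "Answer:"
    )

print(build_evidence_first_prompt(
    "Marie Curie was born in Warsaw.",
    "Marie Curie", "Warsaw",
    ["place_of_birth", "employer", "no_relation"],
))
```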
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- EXnet: Efficient In-context Learning for Data-less Text classification [0.0]
We present EXnet, a model specifically designed to perform in-context learning without limitations on the number of examples.
We argue that in-context learning is an effective method to increase task accuracy, and providing examples facilitates cross-task generalization.
With extensive experiments, we show that even our smallest model (15M parameters) generalizes to several unseen classification tasks and domains.
arXiv Detail & Related papers (2023-05-24T01:40:57Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost transfer learning (TL) method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
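One plausible reading of the anchor-text bootstrap is sketched below: anchor texts that link to the same Wikipedia article are grouped into silver coreference chains. The markup parsing and the coreference assumption are illustrative, not the paper's exact procedure.
```python
# Hypothetical sketch of bootstrapping coreference data from Wikipedia anchor
# texts: surface forms that link to the same article are treated as coreferent
# mentions within a page. One plausible reading of the summary above, not the
# paper's exact procedure.
import re
from collections import defaultdict

WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:\|([^\]]+))?\]\]")

def silver_coreference_chains(wikitext):
    """Group anchor texts by the article they link to."""
    chains = defaultdict(list)
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        surface = (match.group(2) or match.group(1)).strip()
        chains[target].append(surface)
    # Keep only targets mentioned more than once: those yield coreferent pairs.
    return {t: mentions for t, mentions in chains.items() if len(mentions) > 1}

page = ("[[Ada Lovelace]] wrote the first program. "
        "[[Ada Lovelace|Lovelace]] worked with [[Charles Babbage|Babbage]].")
print(silver_coreference_chains(page))
# {'Ada Lovelace': ['Ada Lovelace', 'Lovelace']}
```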
- Full-Text Argumentation Mining on Scientific Publications [3.8754200816873787]
We introduce a sequential pipeline model combining argumentative discourse unit recognition (ADUR) and argumentative relation extraction (ARE) for full-text scientific argumentation mining (SAM).
We provide a first analysis of the performance of pretrained language models (PLMs) on both subtasks.
Our detailed error analysis reveals that non-contiguous ADUs as well as the interpretation of discourse connectors pose major challenges.
arXiv Detail & Related papers (2022-10-24T10:05:30Z)
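The sketch below shows what such a sequential pipeline can look like, assuming an ADUR component that returns unit spans and an ARE component that labels relations between unit pairs; `adur_model` and `are_model` are placeholder callables, not the models evaluated in the paper.
```python
# Minimal sketch of a sequential argumentation-mining pipeline: an ADUR step
# proposes argumentative discourse units (ADUs), then an ARE step labels the
# relation between each ordered ADU pair. `adur_model` and `are_model` are
# placeholder callables, not the pretrained models used in the paper.
from itertools import permutations

def argumentation_pipeline(document, adur_model, are_model):
    adus = adur_model(document)                  # [(start, end, text), ...]
    relations = []
    for head, tail in permutations(adus, 2):
        label = are_model(document, head, tail)  # e.g. "support", "attack", None
        if label is not None:
            relations.append((head[2], label, tail[2]))
    return adus, relations

# Toy stand-ins so the sketch runs end to end.
toy_adur = lambda doc: [(0, 24, "Results improve with X."),
                        (25, 60, "Therefore X should be adopted.")]
toy_are = lambda doc, h, t: "support" if "Therefore" in t[2] else None
print(argumentation_pipeline("...", toy_adur, toy_are))
```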
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: pre-trained models succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
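As a concrete illustration of the distributional claim, the toy check below verifies that unordered, sentence-level word co-occurrence counts are unchanged by shuffling word order, which is the kind of signal that survives when syntax is destroyed. This is an illustrative check, not the paper's experiment.
```python
# Toy illustration of the distributional point above: unordered, sentence-level
# word co-occurrence counts are identical for a sentence and any shuffle of it,
# so a model relying on such statistics is unaffected when order is destroyed.
import random
from collections import Counter
from itertools import combinations

def cooccurrence_counts(tokens):
    """Count unordered word pairs co-occurring in the same sentence."""
    return Counter(tuple(sorted(pair)) for pair in combinations(tokens, 2))

sentence = "the enzyme digests the plasmid overnight".split()
shuffled = sentence[:]
random.shuffle(shuffled)

print(cooccurrence_counts(sentence) == cooccurrence_counts(shuffled))  # True
```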
- Pre-training via Paraphrasing [96.79972492585112]
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual paraphrasing objective.
We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization.
For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation.
arXiv Detail & Related papers (2020-06-26T14:43:43Z)
- Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state-of-the-art on several downstream tasks, including video classification (EPIC-Kitchens), question answering (TVQA), and captioning (TVC, YouCook2, and MSR-VTT).
arXiv Detail & Related papers (2020-06-12T14:07:04Z)
- Document-Level Event Role Filler Extraction using Multi-Granularity Contextualized Encoding [40.13163091122463]
Event extraction is a difficult task since it requires a view of a larger context to determine which spans of text correspond to event role fillers.
We first investigate how end-to-end neural sequence models perform on document-level role filler extraction.
We show that our best system performs substantially better than prior work.
arXiv Detail & Related papers (2020-05-13T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.