Metadata Shaping: Natural Language Annotations for the Tail
- URL: http://arxiv.org/abs/2110.08430v1
- Date: Sat, 16 Oct 2021 01:00:47 GMT
- Title: Metadata Shaping: Natural Language Annotations for the Tail
- Authors: Simran Arora, Sen Wu, Enci Liu, Christopher Ré
- Abstract summary: Language models (LMs) have made remarkable progress, but still struggle to generalize beyond the training data to rare linguistic patterns.
We propose metadata shaping, a method in which readily available metadata, such as entity descriptions and categorical tags, are appended to examples based on information-theoretic metrics.
With no changes to the LM whatsoever, metadata shaping exceeds the BERT baseline by up to 5.3 F1 points, and achieves or competes with state-of-the-art results.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) have made remarkable progress, but still struggle to
generalize beyond the training data to rare linguistic patterns. Since rare
entities and facts are prevalent in the queries users submit to popular
applications such as search and personal assistant systems, improving the
ability of LMs to reliably capture knowledge over rare entities is a pressing
challenge studied in significant prior work. Noticing that existing approaches
primarily modify the LM architecture or introduce auxiliary objectives to
inject useful entity knowledge, we ask to what extent we can match the
quality of these approaches using a base LM architecture and changing only
the data. We propose metadata shaping, a method in which readily available
metadata, such as entity descriptions and categorical tags, are appended to
examples based on information-theoretic metrics. Intuitively, if metadata
corresponding to popular entities overlap with metadata for rare entities, the
LM may be able to better reason about the rare entities using patterns learned
from similar popular entities. On standard entity-rich tasks (TACRED, FewRel,
OpenEntity), with no changes to the LM whatsoever, metadata shaping exceeds the
BERT baseline by up to 5.3 F1 points, and achieves or competes with
state-of-the-art results. We further show the improvements are up to 10x larger
on examples containing tail versus popular entities.
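To make the recipe concrete, here is a minimal sketch of metadata shaping, not the authors' released code: tags are kept when their pointwise mutual information with a task label, estimated on training data, exceeds a threshold (one plausible instance of the paper's information-theoretic metrics), and the kept tags are appended to the input together with the entity description. The [SEP] layout, function names, and toy data below are illustrative assumptions.
```python
import math
from collections import Counter

def select_tags_by_pmi(train_examples, threshold=0.0):
    """Keep categorical tags that are informative about at least one task
    label, scored by pointwise mutual information on training data.

    train_examples: iterable of (tags, label) pairs.
    Note: PMI is one plausible scoring choice, assumed for illustration.
    """
    tag_c, label_c, joint_c = Counter(), Counter(), Counter()
    n = 0
    for tags, label in train_examples:
        n += 1
        label_c[label] += 1
        for tag in set(tags):
            tag_c[tag] += 1
            joint_c[(tag, label)] += 1
    return {
        tag
        for (tag, label), c in joint_c.items()
        if math.log((c / n) / ((tag_c[tag] / n) * (label_c[label] / n))) > threshold
    }

def shape(text, entity, description, tags, kept_tags):
    """Append readily available metadata to a raw input example."""
    useful = [t for t in tags if t in kept_tags]
    return f"{text} [SEP] {entity}: {description}. Tags: {', '.join(useful)}"

# Toy usage: a rare entity can share tags (and thus learned patterns)
# with popular entities seen during training.
kept = select_tags_by_pmi([({"musician", "person"}, "per:title"),
                           ({"city"}, "no_relation")])
print(shape("Guitarist Aru Lee toured in May.", "Aru Lee",
            "a fictional example entity", ["musician", "person"], kept))
```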
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP (AESOP) metric; a simplified stand-in is sketched after this list.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Utilising a Large Language Model to Annotate Subject Metadata: A Case Study in an Australian National Research Data Catalogue [18.325675189960833]
In support of open and reproducible research, there has been a rapidly increasing number of datasets made available for research.
As the availability of datasets increases, it becomes more important to have quality metadata for discovering and reusing them.
This paper proposes to leverage large language models (LLMs) for cost-effective annotation of subject metadata through LLM-based in-context learning; a prompt-construction sketch appears after this list.
arXiv Detail & Related papers (2023-10-17T14:52:33Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Multi-Modal Fusion by Meta-Initialization [0.0]
We propose an extension to the Model-Agnostic Meta-Learning algorithm (MAML) that allows the model to adapt using auxiliary information as well as task experience.
FuMI significantly outperforms uni-modal baselines such as MAML in the few-shot regime.
arXiv Detail & Related papers (2022-10-10T17:00:58Z)
- A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck [68.61583160269664]
Event argument extraction (EAE) aims to extract arguments with given roles from texts.
We propose a multi-format transfer learning model with variational information bottleneck.
We conduct extensive experiments on three benchmark datasets, and obtain new state-of-the-art performance on EAE.
arXiv Detail & Related papers (2022-08-27T13:52:01Z)
- Entity Cloze By Date: What LMs Know About Unseen Entities [79.34707800653597]
Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated.
We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained.
We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity.
arXiv Detail & Related papers (2022-05-05T17:59:31Z)
- Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification [59.326456778057384]
We propose the Memory-based Multi-Source Meta-Learning framework to train a generalizable model for unseen domains.
We also present a meta batch normalization layer (MetaBN) to diversify meta-test features.
Experiments demonstrate that our M$^3$L can effectively enhance the generalization ability of the model for unseen domains.
arXiv Detail & Related papers (2020-12-01T11:38:16Z)
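For the structured entity extraction entry above, a worked example may help convey what an entity-set overlap metric measures. The sketch below is a simplified stand-in (greedy one-to-one matching by property-level F1, averaged over the larger set), not the paper's exact AESOP definition.
```python
def property_f1(pred: dict, gold: dict) -> float:
    """F1 over the (key, value) property pairs of two entities."""
    p, g = set(pred.items()), set(gold.items())
    if not p or not g:
        return 0.0
    overlap = len(p & g)
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(p), overlap / len(g)
    return 2 * prec * rec / (prec + rec)

def entity_set_overlap(preds, golds):
    """Greedily match each predicted entity to its best remaining gold
    entity, then average over the larger set so spurious or missing
    entities are penalized. A simplified illustration, not AESOP itself."""
    remaining = list(golds)
    total = 0.0
    for pred in preds:
        if not remaining:
            break
        best = max(remaining, key=lambda gold: property_f1(pred, gold))
        total += property_f1(pred, best)
        remaining.remove(best)
    return total / max(len(preds), len(golds), 1)

pred = [{"name": "Ada Lovelace", "field": "mathematics"}]
gold = [{"name": "Ada Lovelace", "field": "mathematics", "born": "1815"}]
print(round(entity_set_overlap(pred, gold), 3))  # 0.8
```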
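Likewise, for the subject-metadata annotation entry, LLM-based in-context learning concretely means prompting with a few labeled records. The following is a minimal, provider-agnostic sketch; `complete_fn` and the demonstration records are placeholders, not any specific client API.
```python
from typing import Callable, List, Tuple

def build_annotation_prompt(demos: List[Tuple[str, str]], record: str) -> str:
    """Few-shot prompt: labeled (description, subjects) demonstrations,
    followed by the unlabeled record to annotate."""
    parts = ["Assign subject metadata terms to each dataset description."]
    for description, subjects in demos:
        parts.append(f"Description: {description}\nSubjects: {subjects}")
    parts.append(f"Description: {record}\nSubjects:")
    return "\n\n".join(parts)

def annotate(record: str, demos: List[Tuple[str, str]],
             complete_fn: Callable[[str], str]) -> str:
    # complete_fn is any text-completion callable (e.g., a wrapper around
    # an LLM client); it is deliberately left abstract here.
    return complete_fn(build_annotation_prompt(demos, record)).strip()
```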