Metadata Shaping: Natural Language Annotations for the Tail
- URL: http://arxiv.org/abs/2110.08430v1
- Date: Sat, 16 Oct 2021 01:00:47 GMT
- Title: Metadata Shaping: Natural Language Annotations for the Tail
- Authors: Simran Arora, Sen Wu, Enci Liu, Christopher Re
- Abstract summary: Language models (LMs) have made remarkable progress, but still struggle to generalize beyond the training data to rare linguistic patterns.
We propose metadata shaping, a method in which readily available metadata, such as entity descriptions and categorical tags, are appended to examples based on information theoretic metrics.
With no changes to the LM whatsoever, metadata shaping exceeds the BERT baseline by up to 5.3 F1 points and achieves or competes with state-of-the-art results.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) have made remarkable progress, but still struggle to
generalize beyond the training data to rare linguistic patterns. Since rare
entities and facts are prevalent in the queries users submit to popular
applications such as search and personal assistant systems, improving the
ability of LMs to reliably capture knowledge over rare entities is a pressing
challenge studied in significant prior work. Noticing that existing approaches
primarily modify the LM architecture or introduce auxiliary objectives to
inject useful entity knowledge, we ask to what extent we can match the
quality of these architectures with a base LM architecture, changing only
the data. We propose metadata shaping, a method in which readily available
metadata, such as entity descriptions and categorical tags, are appended to
examples based on information theoretic metrics. Intuitively, if metadata
corresponding to popular entities overlap with metadata for rare entities, the
LM may be able to better reason about the rare entities using patterns learned
from similar popular entities. On standard entity-rich tasks (TACRED, FewRel,
OpenEntity), with no changes to the LM whatsoever, metadata shaping exceeds the
BERT baseline by up to 5.3 F1 points and achieves or competes with
state-of-the-art results. We further show the improvements are up to 10x larger
on examples containing tail versus popular entities.
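The shaping step described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the example fields (`text`, `tags`, `label`, `description`), the `[SEP]` joining convention, and the use of pointwise mutual information as the information-theoretic metric are all assumptions made for the sketch.

```python
# Sketch of metadata shaping: append entity metadata (category tags and a
# description) to each input example, keeping only the tags that are most
# informative about the task label under a PMI score. Toy data and field
# names are hypothetical.
import math

def tag_label_pmi(examples, tag, label):
    """Pointwise mutual information between a metadata tag and a task label."""
    n = len(examples)
    p_tag = sum(tag in ex["tags"] for ex in examples) / n
    p_label = sum(ex["label"] == label for ex in examples) / n
    p_joint = sum(tag in ex["tags"] and ex["label"] == label for ex in examples) / n
    if p_joint == 0 or p_tag == 0 or p_label == 0:
        return float("-inf")
    return math.log(p_joint / (p_tag * p_label))

def shape(example, examples, top_k=2):
    """Return the example text with its top-k most informative tags appended."""
    scored = sorted(
        example["tags"],
        key=lambda t: tag_label_pmi(examples, t, example["label"]),
        reverse=True,
    )
    shaped = example["text"] + " [SEP] " + " ; ".join(scored[:top_k])
    if example.get("description"):
        shaped += " [SEP] " + example["description"]
    return shaped
```

The shaped string is then fed to the unmodified LM in place of the raw text; if a rare entity shares tags with popular training entities, the model can reuse patterns learned for those tags.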
Related papers
- GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph Encoding [39.67113788660731]
  We introduce a framework for developing Graph-aligned LAnguage Models (GLaM).
  We demonstrate that grounding the models in specific graph-based knowledge expands the models' capacity for structure-based reasoning.
  arXiv: 2024-02-09T19:53:29Z
- Utilising a Large Language Model to Annotate Subject Metadata: A Case Study in an Australian National Research Data Catalogue [18.325675189960833]
  In support of open and reproducible research, a rapidly increasing number of datasets have been made available for research.
  As the availability of datasets increases, it becomes more important to have quality metadata for discovering and reusing them.
  This paper proposes to leverage large language models (LLMs) for cost-effective annotation of subject metadata through LLM-based in-context learning.
  arXiv: 2023-10-17T14:52:33Z
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
  We propose an alternative approach to precisely and robustly extract key information from document images.
  We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
  The proposed method achieves significantly enhanced performance on entity labeling and linking compared with previous state-of-the-art models.
  arXiv: 2023-03-23T08:21:16Z
- Multi-Modal Fusion by Meta-Initialization [0.0]
  We propose an extension to the Model-Agnostic Meta-Learning (MAML) algorithm.
  This allows the model to adapt using auxiliary information as well as task experience.
  FuMI significantly outperforms uni-modal baselines such as MAML in the few-shot regime.
  arXiv: 2022-10-10T17:00:58Z
- Meta Knowledge Condensation for Federated Learning [65.20774786251683]
  Existing federated learning paradigms usually extensively exchange distributed models at a central solver to achieve a more powerful model.
  This incurs a severe communication burden between the server and multiple clients, especially when data distributions are heterogeneous.
  Unlike existing paradigms, we introduce an alternative perspective to significantly decrease the communication cost in federated learning.
  arXiv: 2022-09-29T15:07:37Z
- A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck [68.61583160269664]
  Event argument extraction (EAE) aims to extract arguments with given roles from texts.
  We propose a multi-format transfer learning model with a variational information bottleneck.
  We conduct extensive experiments on three benchmark datasets and obtain new state-of-the-art performance on EAE.
  arXiv: 2022-08-27T13:52:01Z
- Entity Cloze By Date: What LMs Know About Unseen Entities [79.34707800653597]
  Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated.
  We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained.
  We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity.
  arXiv: 2022-05-05T17:59:31Z
- Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation [63.24594955429465]
  Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
  AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
  Our framework achieves state-of-the-art results with an 8.21% improvement on average over methods based on supervised learning.
  arXiv: 2021-10-27T15:20:41Z
- Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification [59.326456778057384]
  We propose the Memory-based Multi-Source Meta-Learning framework to train a generalizable model for unseen domains.
  We also present a meta batch normalization layer (MetaBN) to diversify meta-test features.
  Experiments demonstrate that our M$3$L can effectively enhance the generalization ability of the model for unseen domains.
  arXiv: 2020-12-01T11:38:16Z
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.