Efficient and Interpretable Neural Models for Entity Tracking
- URL: http://arxiv.org/abs/2208.14252v1
- Date: Tue, 30 Aug 2022 13:25:27 GMT
- Title: Efficient and Interpretable Neural Models for Entity Tracking
- Authors: Shubham Toshniwal
- Abstract summary: This thesis focuses on two key problems in relation to facilitating the use of entity tracking models.
We argue that computationally efficient entity tracking models can be developed by representing entities with rich, fixed-dimensional vector representations.
We also argue for the integration of entity tracking into language models, as it will allow for: (i) wider application, given the current ubiquitous use of pretrained language models in NLP applications, and (ii) easier adoption, since it is much easier to swap in a new pretrained language model than to integrate a separate standalone entity tracking model.
- Score: 3.1985066117432934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What would it take for a natural language model to understand a novel, such as The Lord of the Rings? Among other things, such a model must be able to: (a) identify and record new characters (entities) and their attributes as they are introduced in the text, and (b) identify subsequent references to the characters previously introduced and update their attributes. This problem of entity tracking is essential for language understanding, and thus useful for a wide array of downstream NLP applications such as question answering and summarization.
In this thesis, we focus on two key problems in relation to facilitating the use of entity tracking models: (i) scaling entity tracking models to long documents, such as a novel, and (ii) integrating entity tracking into language models. Applying language technologies to long documents has garnered interest recently, but computational constraints are a significant bottleneck in scaling up current methods. In this thesis, we argue that computationally efficient entity tracking models can be developed by representing entities with rich, fixed-dimensional vector representations derived from pretrained language models, and by exploiting the ephemeral nature of entities. We also argue for the integration of entity tracking into language models as it will allow for: (i) wider application given the current ubiquitous use of pretrained language models in NLP applications, and (ii) easier adoption since it is much easier to swap in a new pretrained language model than to integrate a separate standalone entity tracking model.
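As a rough illustration of these two ideas, fixed-dimensional entity vectors and the ephemeral nature of entities, the sketch below keeps one vector per tracked entity and evicts entries that go unmentioned for too long. The class name, the cosine-similarity linking rule, and the staleness-based eviction are illustrative assumptions, not the thesis's actual models.

```python
import torch

class EntityMemory:
    """Toy entity memory: one fixed-dimensional vector per tracked entity.

    Hedged sketch only: the thesis models derive mention vectors from a
    pretrained encoder and learn their update and linking decisions; here the
    update is a running average and eviction is a simple staleness rule.
    """

    def __init__(self, max_entities=20, max_stale=50):
        self.max_entities = max_entities  # cap on simultaneously tracked entities
        self.max_stale = max_stale        # evict entities unseen for this many mentions
        self.vectors = {}                 # entity id -> fixed-dimensional vector
        self.last_seen = {}               # entity id -> step of last mention
        self.step = 0

    def update(self, mention_vec, sim_threshold=0.7):
        """Link a mention vector to an existing entity, or create a new entity."""
        self.step += 1
        best_id, best_sim = None, -1.0
        for eid, vec in self.vectors.items():
            sim = torch.cosine_similarity(mention_vec, vec, dim=0).item()
            if sim > best_sim:
                best_id, best_sim = eid, sim
        if best_id is not None and best_sim >= sim_threshold:
            # Fold the new mention into the existing entity's representation.
            self.vectors[best_id] = 0.5 * (self.vectors[best_id] + mention_vec)
            eid = best_id
        else:
            eid = max(self.vectors, default=-1) + 1
            self.vectors[eid] = mention_vec
        self.last_seen[eid] = self.step
        self._evict()
        return eid

    def _evict(self):
        """Exploit the ephemeral nature of entities: drop stale or excess ones."""
        for eid in [e for e, t in self.last_seen.items() if self.step - t > self.max_stale]:
            del self.vectors[eid], self.last_seen[eid]
        while len(self.vectors) > self.max_entities:
            oldest = min(self.last_seen, key=self.last_seen.get)
            del self.vectors[oldest], self.last_seen[oldest]
```

Because the memory never grows beyond `max_entities` vectors, the per-mention cost stays constant regardless of document length, which is the property that matters for novel-length inputs.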
Related papers
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
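A hedged sketch of the general recipe described in this entry: a compact model proposes a gloss, passages from a descriptive grammar are retrieved, and an LLM is asked to revise the draft. The bag-of-words retriever, the prompt wording, and the callable `llm` are placeholders, not the paper's pipeline.

```python
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    """Crude bag-of-words cosine similarity, standing in for a real retriever."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def correct_gloss(sentence: str, draft_gloss: str, grammar_passages: list,
                  llm, top_k: int = 3) -> str:
    """Retrieve grammar passages relevant to the sentence and ask an LLM
    (any callable that maps a prompt string to a completion) to revise the
    compact model's draft gloss."""
    ranked = sorted(grammar_passages, key=lambda p: -bow_cosine(sentence, p))
    context = "\n".join(ranked[:top_k])
    prompt = (
        "Grammar notes:\n" + context +
        f"\n\nSentence: {sentence}\nDraft gloss: {draft_gloss}\n"
        "Revise the gloss so it is consistent with the grammar notes:"
    )
    return llm(prompt)
```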
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
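One common form of model merging is simple weight interpolation between two checkpoints of the same architecture; the sketch below shows only that baseline form, and the paper's merging scheme may differ.

```python
def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate two compatible checkpoints (parameter averaging)."""
    return {name: alpha * sd_a[name] + (1.0 - alpha) * sd_b[name] for name in sd_a}

# Usage sketch: combine a model continually pre-trained on a low-resource
# language with a task-capable model of the same architecture (e.g. Llama-2-7B),
# then load the merged weights back into either model instance:
# merged = merge_state_dicts(lang_model.state_dict(), task_model.state_dict())
# task_model.load_state_dict(merged)
```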
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z)
- Representing Knowledge by Spans: A Knowledge-Enhanced Model for Information Extraction [7.077412533545456]
We propose a new pre-trained model that learns representations of both entities and relationships simultaneously.
By encoding spans efficiently with span modules, our model can represent both entities and their relationships but requires fewer parameters than existing models.
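A minimal sketch of span-based encoding, assuming a boundary-plus-mean-pool span representation and a concatenation-based relation vector; the paper's span modules are not necessarily built this way.

```python
import torch

def span_representation(token_embeddings: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """Encode a span from its boundary token embeddings plus a mean-pooled
    content vector (token_embeddings has shape (seq_len, hidden))."""
    content = token_embeddings[start:end + 1].mean(dim=0)
    return torch.cat([token_embeddings[start], token_embeddings[end], content], dim=-1)

def relation_representation(span_a: torch.Tensor, span_b: torch.Tensor) -> torch.Tensor:
    """Represent a candidate relation from its two argument spans."""
    return torch.cat([span_a, span_b, span_a * span_b], dim=-1)

# Usage sketch with dummy encoder outputs:
# hidden = torch.randn(12, 768)
# e1 = span_representation(hidden, 2, 4)
# e2 = span_representation(hidden, 7, 7)
# rel = relation_representation(e1, e2)
```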
arXiv Detail & Related papers (2022-08-20T07:32:25Z)
- Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction [80.61458287741131]
We present a study on leveraging multilingual pre-trained generative language models for zero-shot cross-lingual event argument extraction (EAE).
By formulating EAE as a language generation task, our method effectively encodes event structures and captures the dependencies between arguments.
Our proposed model finetunes multilingual pre-trained generative language models to generate sentences that fill in the language-agnostic template with arguments extracted from the input passage.
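A hypothetical sketch of the template-filling formulation described in this entry: the prompt asks a generative model to copy arguments from the passage into a language-agnostic template, and arguments are then read back out by aligning the generated sentence with the template. The prompt wording and slot syntax are assumptions, not the paper's exact format.

```python
import re

# Hypothetical template with numbered, language-agnostic argument slots.
TEMPLATE = "<arg1> attacked <arg2> using <arg3>"

def build_eae_prompt(passage: str, trigger: str, template: str = TEMPLATE) -> str:
    """Prompt a generative LM to fill the template with arguments from the passage."""
    return (
        f"Passage: {passage}\n"
        f"Event trigger: {trigger}\n"
        f"Fill in the template with arguments copied from the passage: {template}"
    )

def extract_arguments(generated: str, template: str = TEMPLATE) -> dict:
    """Recover arguments by aligning the generated sentence with the template
    slots (naive regex alignment, for illustration only)."""
    pattern = re.escape(template)
    for slot in re.findall(r"<arg\d+>", template):
        pattern = pattern.replace(re.escape(slot), f"(?P<{slot[1:-1]}>.+?)")
    match = re.fullmatch(pattern, generated)
    return match.groupdict() if match else {}

# Usage sketch (the filled sentence would normally come from the LM):
# extract_arguments("the rebels attacked the convoy using rockets")
# -> {"arg1": "the rebels", "arg2": "the convoy", "arg3": "rockets"}
```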
arXiv Detail & Related papers (2022-03-15T23:00:32Z)
- mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models [15.873069955407406]
We train a multilingual language model on 24 languages with entity representations.
We show the model consistently outperforms word-based pretrained models in various cross-lingual transfer tasks.
We also evaluate the model with a multilingual cloze prompt task with the mLAMA dataset.
arXiv Detail & Related papers (2021-10-15T15:28:38Z)
- Towards Continual Entity Learning in Language Models for Conversational Agents [0.5330240017302621]
We introduce entity-aware language models (EALM), where we integrate entity models trained on catalogues of entities into pre-trained LMs.
Our combined language model adaptively adds information from the entity models into the pre-trained LM depending on the sentence context.
We show significant perplexity improvements on task-oriented dialogue datasets, especially on long-tailed utterances.
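A toy sketch of the "adaptively adds information from the entity models" idea as a gated interpolation of next-token distributions; EALM's actual integration mechanism may differ.

```python
import torch

def entity_aware_next_token_probs(lm_probs, entity_probs, gate):
    """Interpolate a pretrained LM's next-token distribution with an entity
    model's distribution, weighted by a context-dependent gate in [0, 1].

    lm_probs, entity_probs: tensors of shape (vocab_size,)
    gate: scalar tensor, e.g. a sigmoid of a learned function of the context
    """
    return gate * entity_probs + (1.0 - gate) * lm_probs

# Usage sketch:
# vocab = 32000
# lm_probs = torch.softmax(torch.randn(vocab), dim=-1)
# entity_probs = torch.softmax(torch.randn(vocab), dim=-1)
# gate = torch.sigmoid(torch.tensor(0.3))   # stands in for a learned context gate
# mixed = entity_aware_next_token_probs(lm_probs, entity_probs, gate)
```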
arXiv Detail & Related papers (2021-07-30T21:10:09Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
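A minimal sketch of the idea in this last entry, assuming a linear typing head with independent sigmoid outputs so that every dimension of the entity vector is a posterior probability for a named type; the type inventory and untrained head below are purely illustrative.

```python
import torch

def entity_type_vector(entity_embedding: torch.Tensor,
                       typing_head: torch.nn.Linear) -> torch.Tensor:
    """Map an entity (mention) embedding to independent posterior probabilities
    over fine-grained types; each dimension is human readable because it is
    tied to a named type."""
    return torch.sigmoid(typing_head(entity_embedding))

# Usage sketch with a hypothetical type inventory and an untrained head:
# types = ["person", "musician", "location", "city", "organization"]
# head = torch.nn.Linear(768, len(types))
# vec = entity_type_vector(torch.randn(768), head)
# readable = dict(zip(types, vec.tolist()))   # type name -> probability
```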