Representing Numbers in NLP: a Survey and a Vision
- URL: http://arxiv.org/abs/2103.13136v1
- Date: Wed, 24 Mar 2021 12:28:22 GMT
- Title: Representing Numbers in NLP: a Survey and a Vision
- Authors: Avijit Thawani, Jay Pujara, Pedro A. Szekely, Filip Ilievski
- Abstract summary: We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods.
We analyze the myriad representational choices made by 18 previously published number encoders and decoders.
We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP.
- Score: 15.035458171592191
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: NLP systems rarely give special consideration to numbers found in text. This
starkly contrasts with the consensus in neuroscience that, in the brain,
numbers are represented differently from words. We arrange recent NLP work on
numeracy into a comprehensive taxonomy of tasks and methods. We break down the
subjective notion of numeracy into 7 subtasks, arranged along two dimensions:
granularity (exact vs approximate) and units (abstract vs grounded). We analyze
the myriad representational choices made by 18 previously published number
encoders and decoders. We synthesize best practices for representing numbers in
text and articulate a vision for holistic numeracy in NLP, comprising design
trade-offs and a unified evaluation.
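To make the taxonomy concrete, here is a minimal Python sketch of the granularity × units grid. The two dimensions and their values come from the abstract; the example subtasks are illustrative placeholders, not the paper's exact list of 7.

```python
# Minimal sketch of the survey's two-dimensional numeracy taxonomy.
# Dimension names follow the abstract; the example subtasks are
# illustrative placeholders, not the paper's exact list of 7.
from dataclasses import dataclass

@dataclass(frozen=True)
class NumeracySubtask:
    name: str
    granularity: str  # "exact" or "approximate"
    units: str        # "abstract" or "grounded"

SUBTASKS = [
    NumeracySubtask("simple arithmetic", "exact", "abstract"),
    NumeracySubtask("measurement estimation", "approximate", "grounded"),
]

def cell(granularity: str, units: str) -> list[str]:
    """Subtasks falling in one cell of the 2x2 grid."""
    return [t.name for t in SUBTASKS
            if t.granularity == granularity and t.units == units]

print(cell("exact", "abstract"))  # -> ['simple arithmetic']
```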
Related papers
- Number Cookbook: Number Understanding of Language Models and How to Improve It [63.9542740221096]
Large language models (LLMs) can solve an increasing number of complex reasoning tasks while making surprising mistakes in basic numerical understanding and processing.
This paper comprehensively investigates the numerical understanding and processing ability (NUPA) of LLMs.
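As an illustration of the elementary skills such evaluations probe, the toy harness below scores a model on three basic checks (addition, comparison, digit counting). The cases are assumed examples in the spirit of NUPA, not the paper's benchmark; `model` is any prompt-to-answer callable.

```python
# Toy harness for elementary numerical-understanding checks, in the spirit
# of NUPA-style evaluation. The three cases are assumed examples, not the
# paper's benchmark. `model` is any callable mapping a prompt to an answer.
def evaluate_basic_numeracy(model) -> float:
    cases = [
        ("What is 37 + 58?", "95"),
        ("Which is larger, 0.9 or 0.45?", "0.9"),
        ("How many digits does 10203 have?", "5"),
    ]
    correct = sum(model(q).strip() == a for q, a in cases)
    return correct / len(cases)

# Usage with a trivial stand-in for a real LLM client:
stub = {"What is 37 + 58?": "95"}
print(evaluate_basic_numeracy(lambda q: stub.get(q, "")))  # -> 0.33...
```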
arXiv Detail & Related papers (2024-11-06T08:59:44Z)
- Laying Anchors: Semantically Priming Numerals in Language Modeling [11.831883526217942]
We introduce strategies to semantically prime numerals in any corpus by generating anchors governed by the distribution of numerals in said corpus.
We demonstrate significant improvements in the mathematical grounding of our learned embeddings.
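One plausible reading of anchors "governed by the distribution of numerals" is a set of reference values derived from the corpus's empirical numeral distribution; the sketch below uses deciles under that assumption, and is not the paper's exact procedure.

```python
# Hypothetical sketch: derive anchor values from the empirical distribution
# of numerals in a corpus (here, deciles), then map each numeral to its
# nearest anchor. This only illustrates corpus-governed anchoring; the
# paper's actual procedure may differ.
import re
import statistics

def extract_numerals(corpus: str) -> list[float]:
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", corpus)]

def anchors_from_corpus(corpus: str, n: int = 10) -> list[float]:
    values = extract_numerals(corpus)
    return statistics.quantiles(values, n=n)  # n-1 cut points

def nearest_anchor(x: float, anchors: list[float]) -> float:
    return min(anchors, key=lambda a: abs(a - x))

corpus = "Prices ranged from 3 to 250, with a median near 40 and peaks at 12."
anchors = anchors_from_corpus(corpus)
print(nearest_anchor(37.0, anchors))
```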
arXiv Detail & Related papers (2024-04-02T00:02:00Z)
- A Taxonomy of Ambiguity Types for NLP [53.10379645698917]
We propose a taxonomy of ambiguity types as seen in English to facilitate NLP analysis.
Our taxonomy can help make meaningful splits in language ambiguity data, allowing for more fine-grained assessments of both datasets and model performance.
arXiv Detail & Related papers (2024-03-21T01:47:22Z)
- GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation [15.886585212606787]
We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens.
GENTLE is manually annotated for a variety of popular NLP tasks.
We evaluate state-of-the-art NLP systems on GENTLE and find that, on every task, performance degrades severely for at least some genres.
arXiv Detail & Related papers (2023-06-03T00:20:15Z)
- Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information [11.227630261409706]
We study whether the text of a task and annotators' demographic background information can be used to estimate the level of disagreement among annotators.
Our results show that knowing annotators' demographic information, like gender, ethnicity, and education level, helps predict disagreements.
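A minimal sketch of the modeling setup described here: predict annotator disagreement from demographic-mismatch features. The feature set and the logistic model are illustrative assumptions, not the paper's design.

```python
# Minimal sketch: predict annotator-pair disagreement from demographic
# mismatch features. Features and model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
# Toy binary features: [gender differs, education differs, ethnicity differs]
X = rng.integers(0, 2, size=(200, 3)).astype(float)
y = (X.sum(axis=1) + rng.normal(0, 0.5, 200) > 1.5).astype(float)

w = np.zeros(3)
b = 0.0
for _ in range(500):                      # plain gradient descent
    p = 1 / (1 + np.exp(-(X @ w + b)))    # logistic prediction
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * (p - y).mean()

print(np.round(w, 2))  # larger weight = feature more predictive of disagreement
```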
arXiv Detail & Related papers (2023-01-12T14:04:53Z)
- Number Entity Recognition [65.80137628972312]
Numbers are essential components of text, like any other word tokens, from which natural language processing (NLP) models are built and deployed.
In this work, we attempt to tap this latent potential of state-of-the-art NLP models and transfer their ability to boost performance on related tasks.
Our proposed classification of numbers into entities helps NLP models perform well on several tasks, including a handcrafted Fill-In-The-Blank (FITB) task and question answering using joint embeddings.
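As a rough illustration of classifying numbers into entity types, the sketch below tags numeric tokens with heuristic rules; the category inventory (YEAR, PERCENT, MONEY, CARDINAL) is assumed for illustration and is not the paper's exact scheme.

```python
# Illustrative sketch of tagging number tokens with entity types.
# The category set (YEAR, PERCENT, MONEY, CARDINAL) is an assumed example,
# not the paper's inventory; real systems would use learned models.
import re

def classify_number(token: str, next_token: str = "") -> str:
    if re.fullmatch(r"(19|20)\d{2}", token):
        return "YEAR"
    if next_token == "%" or token.endswith("%"):
        return "PERCENT"
    if token.startswith("$"):
        return "MONEY"
    if re.fullmatch(r"-?\d+(?:\.\d+)?", token):
        return "CARDINAL"
    return "O"

print(classify_number("1997"))  # YEAR
print(classify_number("42"))    # CARDINAL
print(classify_number("$5"))    # MONEY
```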
arXiv Detail & Related papers (2022-05-07T05:22:43Z)
- NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.931394234642816]
We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number.
A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
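A minimal NumPy sketch of the mantissa/exponent decomposition described above, assuming RBF soft-assignment to mantissa prototypes plus an exponent lookup table; the dimensions, prototype placement, and weighting are illustrative choices, not NumGPT's exact parameterization.

```python
# Sketch of a mantissa/exponent numeral embedding: soft-assign the mantissa
# to prototypes (RBF weights) and look up the exponent. All sizes and
# prototype placements are illustrative assumptions.
import math
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
PROTOTYPES = np.linspace(1.0, 9.9, 10)   # mantissa prototypes in [1, 10)
PROTO_EMB = rng.normal(size=(10, DIM))   # one vector per prototype
EXP_EMB = rng.normal(size=(21, DIM))     # exponents -10..10

def embed_number(x: float, sigma: float = 0.5) -> np.ndarray:
    """Embed a nonzero number via its mantissa and exponent."""
    e = math.floor(math.log10(abs(x)))   # exponent
    m = abs(x) / 10 ** e                 # mantissa in [1, 10)
    w = np.exp(-((PROTOTYPES - m) ** 2) / (2 * sigma ** 2))
    w /= w.sum()                         # RBF soft-assignment weights
    return w @ PROTO_EMB + EXP_EMB[e + 10]

print(embed_number(3.14e3).shape)  # -> (8,)
```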
arXiv Detail & Related papers (2021-09-07T15:06:12Z)
- Graph Neural Networks for Natural Language Processing: A Survey [64.36633422999905]
We present a comprehensive overview of Graph Neural Networks (GNNs) for Natural Language Processing.
We propose a new taxonomy of GNNs for NLP, which organizes existing research along three axes: graph construction, graph representation learning, and graph-based encoder-decoder models.
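To illustrate the first axis, graph construction, here is a toy sketch that turns a sentence into a word co-occurrence graph; real NLP-GNN pipelines typically build graphs from dependency parses or other linguistic structure, so this is a simplified stand-in.

```python
# Toy illustration of graph construction for NLP: a word co-occurrence
# graph with a sliding window. A simplified stand-in for the dependency-
# or structure-based graphs used in practice.
from collections import defaultdict

def cooccurrence_graph(tokens: list[str], window: int = 2) -> dict:
    edges = defaultdict(set)
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            edges[tok].add(tokens[j])
            edges[tokens[j]].add(tok)
    return dict(edges)

print(cooccurrence_graph("numbers deserve special treatment".split()))
```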
arXiv Detail & Related papers (2021-06-10T23:59:26Z)
- A Cross-Task Analysis of Text Span Representations [52.28565379517174]
We find that the optimal span representation varies by task, and can also vary within different facets of individual tasks.
We also find that the choice of span representation has a bigger impact with a fixed pretrained encoder than with a fine-tuned encoder.
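Typical span representations compared in work like this include endpoint concatenation and pooled variants; the sketch below shows three standard options over a matrix of token vectors and is not necessarily the paper's full set.

```python
# Three standard ways to represent a text span from token vectors:
# endpoint concatenation, mean pooling, and max pooling. Standard examples,
# not necessarily the paper's complete comparison set.
import numpy as np

def span_representations(token_vecs: np.ndarray, start: int, end: int) -> dict:
    """token_vecs: (seq_len, dim); the span covers tokens [start, end)."""
    span = token_vecs[start:end]
    return {
        "endpoints": np.concatenate([span[0], span[-1]]),  # 2*dim
        "mean_pool": span.mean(axis=0),                    # dim
        "max_pool": span.max(axis=0),                      # dim
    }

vecs = np.random.default_rng(0).normal(size=(6, 4))
reps = span_representations(vecs, 1, 4)
print({k: v.shape for k, v in reps.items()})
```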
arXiv Detail & Related papers (2020-06-06T13:37:51Z)
- Learning Numeral Embeddings [20.951228068643946]
Existing word embedding methods do not learn numeral embeddings well because the set of possible numerals is infinite.
We propose two novel numeral embedding methods that can handle the out-of-vocabulary (OOV) problem for numerals.
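In that spirit, one way to sidestep numeral OOV is to soft-assign every numeral, seen or unseen, to a small fixed set of log-scale prototypes; the prototype placement and temperature below are illustrative assumptions, not the paper's two methods.

```python
# Sketch of a prototype-based numeral embedding with no OOV problem:
# every numeral is soft-assigned to fixed log-scale prototypes. Placement
# and temperature are illustrative choices, not the paper's exact methods.
import numpy as np

rng = np.random.default_rng(1)
DIM = 8
LOG_PROTOS = np.linspace(-3, 9, 13)   # prototypes on the log10 scale
PROTO_EMB = rng.normal(size=(13, DIM))

def embed_numeral(x: float, temp: float = 1.0) -> np.ndarray:
    logx = np.log10(abs(x) + 1e-12)
    scores = -np.abs(LOG_PROTOS - logx) / temp
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over prototypes
    return weights @ PROTO_EMB            # any numeral gets a vector

print(embed_numeral(123456789.0).shape)   # works for unseen numerals too
```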
arXiv Detail & Related papers (2019-12-28T03:15:43Z)