Related papers: Probing for Multilingual Numerical Understanding in Transformer-Based Language Models

Probing for Multilingual Numerical Understanding in Transformer-Based Language Models

URL: http://arxiv.org/abs/2010.06666v1
Date: Tue, 13 Oct 2020 19:56:02 GMT
Title: Probing for Multilingual Numerical Understanding in Transformer-Based Language Models
Authors: Devin Johnson, Denise Mak, Drew Barker, Lexi Loessberg-Zahl
Abstract summary: We propose novel probing tasks tested on DistilBERT, XLM, and BERT to investigate for evidence of compositional reasoning over numerical data in various natural language number systems. By using both grammaticality judgment and value comparison classification tasks in English, Japanese, Danish, and French, we find evidence that the information encoded in these pretrained models' embeddings is sufficient for grammaticality judgments but generally not for value comparisons.
Score: 0.0
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Natural language numbers are an example of compositional structures, where larger numbers are composed of operations on smaller numbers. Given that compositional reasoning is a key to natural language understanding, we propose novel multilingual probing tasks tested on DistilBERT, XLM, and BERT to investigate for evidence of compositional reasoning over numerical data in various natural language number systems. By using both grammaticality judgment and value comparison classification tasks in English, Japanese, Danish, and French, we find evidence that the information encoded in these pretrained models' embeddings is sufficient for grammaticality judgments but generally not for value comparisons. We analyze possible reasons for this and discuss how our tasks could be extended in further studies.

Related papers

Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles [8.820095911041637]
Large language models (LLMs) struggle with linguistic-mathematical puzzles involving cross-linguistic numeral systems.<n>We investigate why this task is difficult for LLMs through a series of experiments that untangle the linguistic and mathematical aspects of numbers in language.<n>We conclude that the ability to flexibly infer compositional rules from implicit patterns in human-scale data remains an open challenge for current reasoning models.
arXiv Detail & Related papers (2025-06-16T18:09:38Z)
Can Language Models Learn Typologically Implausible Languages? [62.823015163987996]
Grammatical features across human languages show intriguing correlations often attributed to learning biases in humans. We discuss how language models (LMs) allow us to better determine the role of domain-general learning biases in language universals. We test LMs on an array of highly naturalistic but counterfactual versions of the English (head-initial) and Japanese (head-final) languages.
arXiv Detail & Related papers (2025-02-17T20:40:01Z)
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models [19.47343987998194]
Large Language Models (LLMs) have demonstrated impressive capabilities in natural language processing tasks. Their performance on numerical reasoning tasks, such as basic arithmetic, numerical, and magnitude comparison, remains surprisingly poor. Existing benchmarks primarily focus on linguistic competence or structured mathematical problem-solving.
arXiv Detail & Related papers (2025-02-16T10:48:28Z)
Infusing Prompts with Syntax and Semantics [0.0]
We analyze the effect of directly infusing various kinds of syntactic and semantic information into large language models. We show that linguistic analysis can significantly boost language models, to the point that we have surpassed previous best systems.
arXiv Detail & Related papers (2024-12-08T23:49:38Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer [81.5984433881309]
We introduce BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer.
arXiv Detail & Related papers (2023-05-24T08:06:33Z)
Cross-Lingual Transfer of Cognitive Processing Complexity [11.939409227407769]
We use sentence-level eye-tracking patterns as a cognitive indicator for structural complexity. We show that the multilingual model XLM-RoBERTa can successfully predict varied patterns for 13 typologically diverse languages.
arXiv Detail & Related papers (2023-02-24T15:48:23Z)
Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems [42.782260686177395]
We propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models. We first leverage simple numbers as anchors to probe the implicitly inferred arithmetic expressions from language models. We transform and formulate the task as an analytically solvable linear system.
arXiv Detail & Related papers (2022-10-11T00:57:19Z)
Compositional Evaluation on Japanese Textual Entailment and Similarity [20.864082353441685]
Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. There are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English.
arXiv Detail & Related papers (2022-08-09T15:10:56Z)
A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space. We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance. We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict. This work shows a comparison of a neural model and character language models with varying amounts on target language data. Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
Do Language Embeddings Capture Scales? [54.1633257459927]
We show that pretrained language models capture a significant amount of information about the scalar magnitudes of objects. We identify contextual information in pre-training and numeracy as two key factors affecting their performance.
arXiv Detail & Related papers (2020-10-11T21:11:09Z)
Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models [27.91397366776451]
Training LSTMs on latent structure (MIDI music or Java code) improves test performance on natural language. Experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological similarity to the training language.
arXiv Detail & Related papers (2020-04-30T06:24:03Z)
Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.