Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language
Conversion for Language Models
- URL: http://arxiv.org/abs/2401.11725v2
- Date: Tue, 12 Mar 2024 15:48:17 GMT
- Title: Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language
Conversion for Language Models
- Authors: Yile Wang, Sijie Cheng, Zixin Sun, Peng Li, Yang Liu
- Abstract summary: Symbols play important roles in various tasks such as abstract reasoning, chemical property prediction, and table question answering.
Despite impressive natural language comprehension capabilities, large language models' reasoning abilities for symbols remain inadequate.
We propose symbol-to-language (S2L), a tuning-free method that enables large language models to solve symbol-related problems with information expressed in natural language.
- Score: 16.265409100706584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbols (or more broadly, non-natural language textual representations) such
as numerical sequences, molecular formulas, and table delimiters widely exist,
playing important roles in various tasks such as abstract reasoning, chemical
property prediction, and table question answering. Despite the impressive
natural language comprehension capabilities of large language models (LLMs),
their reasoning abilities for symbols remain inadequate, which could be attributed
to the difference between symbol representations and general natural languages.
We propose symbol-to-language (S2L), a tuning-free method that enables large
language models to solve symbol-related problems with information expressed in
natural language. Specifically, S2L first converts the symbols involved to
language-based representations, which can be implemented by prompting LLMs or
leveraging external tools, then these language-based representations are
integrated into the original problem via direct substitution or concatenation,
serving as useful input information for LLMs. We evaluate the S2L method using
both API-based (GPT-4, ChatGPT) and open-source (OpenChat) models over eight
symbol-related tasks, ranging from symbol-only abstract reasoning to sentiment
analysis in social media. Experimental results show that S2L consistently leads
to superior performance. For example, employing S2L with GPT-4 yields
significant average improvements of +21.9% and +9.5% on subtasks in 1D-ARC and
the Dyck language, respectively. Code and data are available at
https://github.com/THUNLP-MT/symbol2language.
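As a rough illustration of the two-step pipeline described in the abstract, the following Python sketch shows one way the S2L idea could be wired up. It is not the authors' released implementation (see the repository above): `call_llm` is a hypothetical wrapper around any chat-style LLM (GPT-4, ChatGPT, OpenChat, etc.), and the `mode` argument simply mirrors the substitution/concatenation options mentioned in the abstract.

```python
# Minimal sketch of the S2L idea (not the official code from
# https://github.com/THUNLP-MT/symbol2language).

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to an LLM and return its reply."""
    raise NotImplementedError("Plug in an actual LLM API or local model here.")

def symbols_to_language(symbols: str) -> str:
    """Step 1: convert a symbolic input (e.g. a numerical sequence or a
    molecular formula) into a natural-language description. Here the LLM
    itself is prompted; the abstract notes external tools can also be used."""
    prompt = (
        "Describe the following symbolic input in plain natural language:\n"
        f"{symbols}"
    )
    return call_llm(prompt)

def solve_with_s2l(question: str, symbols: str, mode: str = "concatenate") -> str:
    """Step 2: integrate the language-based representation into the original
    problem, via direct substitution or concatenation, then ask the LLM to
    solve the augmented problem."""
    description = symbols_to_language(symbols)
    if mode == "substitute":
        augmented = question.replace(symbols, description)
    else:  # "concatenate"
        augmented = (
            f"{question}\n\n"
            f"Natural-language description of the symbols:\n{description}"
        )
    return call_llm(augmented)
```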
Related papers
- Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages [15.203789021094982]
In large language models (LLMs), how are multiple languages learned and encoded?
We train sparse autoencoders on Llama-3-8B and Aya-23-8B, and demonstrate that abstract grammatical concepts are often encoded in feature directions shared across many languages.
arXiv Detail & Related papers (2025-01-10T21:18:21Z)
- MLLM-SR: Conversational Symbolic Regression base Multi-Modal Large Language Models [13.136507215114722]
MLLM-SR is a conversational symbolic regression method that can generate expressions meeting stated requirements directly from natural language instructions.
We experimentally demonstrate that MLLM-SR can well understand the prior knowledge we add to the natural language instructions.
arXiv Detail & Related papers (2024-06-08T09:17:54Z)
- SignLLM: Sign Language Production Large Language Models [31.036549195000667]
We propose SignLLM, a multilingual Sign Language Production (SLP) large language model.
It includes two novel multilingual SLP modes, MLSF and Prompt2LangGloss, that enable sign language gesture generation.
For SignLLM's training, we introduce Prompt2Sign, a comprehensive multilingual sign language dataset.
arXiv Detail & Related papers (2024-05-17T12:01:43Z)
- MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling [70.34758460372629]
We introduce a new paradigm that encodes the same information with segments of consistent size across diverse languages.
MYTE produces shorter encodings for all 99 analyzed languages.
This, in turn, improves multilingual LM performance and diminishes the perplexity gap throughout diverse languages.
arXiv Detail & Related papers (2024-03-15T21:21:11Z)
- Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning [58.5857133154749]
We propose a new symbolic system with broad-coverage symbols and rational rules.
We leverage the recent advancement of LLMs as an approximation of the two ideal properties.
Our method shows superiority in extensive activity understanding tasks.
arXiv Detail & Related papers (2023-11-29T05:27:14Z)
- Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models [41.91490484827197]
Injecting a collection of symbolic data directly into the training of Large Language Models can be problematic.
In this work, we tackle these challenges from both a data and framework perspective and introduce Symbol-LLM series models.
Extensive experiments on both symbol- and NL-centric tasks demonstrate the balanced and superior performances of Symbol-LLM series models.
arXiv Detail & Related papers (2023-11-15T18:59:56Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Adapters for Enhanced Modeling of Multilingual Knowledge and Text [54.02078328453149]
Language models have been extended to multilingual language models (MLLMs).
Knowledge graphs contain facts in an explicit triple format, which require careful curation and are only available in a few high-resource languages.
We propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages.
arXiv Detail & Related papers (2022-10-24T21:33:42Z)
- The Geometry of Multilingual Language Model Representations [25.880639246639323]
We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language.
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
We visualize representations projected onto language-sensitive and language-neutral axes, identifying language family and part-of-speech clusters, along with spirals, toruses, and curves representing token position information.
arXiv Detail & Related papers (2022-05-22T23:58:24Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into language-specific semantic space, and then the projected embeddings will be fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)